Assign6 (Fall-2025.Specifications.Assign6)

Programming (90%)

This assignment is a miniature of the (mini-)projects you'll be doing is the second half of the course. We'll be building a tiny programming language. As we've said, a programming language has three components: syntax, typing, and semantics. These three parts correspond to parsing, type checking and evaluation, respectively.

Syntax

Here is the syntax of the language you'll be implementing.

<expr> ::= True | False | <int> | <var>
         | ( + <expr> <expr> )
         | ( - <expr> <expr> )
         | ( * <expr> <expr> )
         | ( / <expr> <expr> )
         | ( <= <expr> <expr> )
         | ( If <expr> <expr> <expr> )
         | ( Let <var> <expr> <expr> )
  <ty> ::= int | bool

These syntax rules (formally called a BNF Grammar) tell us the following.

True is a well-formed expression.
False is a well-formed expression.
Integer literals are well-formed expressions. We will take an integer literal be any string s such that int_of_string_opt s is not None.
A variable is a well-formed expression. We require that variables are made up of all lower case letters. See the function is_valid_var in the starter code.
If e1 and e2 are well-formed expression then so are (+ e1 e2), (- e1 e2), ( * e1 e2), (/ e1 e2), and (<= e1 e2).
If e1, e2, and e3 are well-formed expressions the so is (If e1 e2 e3).
If x is a variable and e1 and e2 are well-formed expressions then so is (Let x e1 e2).

Well-formed expressions are S-expressions with a subset of atoms that are meaningful in our language. Parsing an expression in this language should be nearly identical to what we did in Assignment 4. The result of parsing a well-formed expression should be an expr as defined below (formally, values of type expr are abstract syntax trees for expression in our language).

type expr =

| True
| False
| Int of int
| Var of string
| Add of expr * expr
| Sub of expr * expr
| Mul of expr * expr
| Div of expr * expr
| Lte of expr * expr
| If of expr * expr * expr
| Let of string * expr * expr

val parse : string -> expr option

Implement the function parse so that parse s is Some e if s is a well-formed expression represented by e, and None otherwise. For example, parsing the expression

  (Let x 2
  (Let y (+ x x)
  (Let z (- ( * y x) 1)
  (Let q (If (<= z 0) (- 0 z) z)
  (/ q 3)))))

should result in Some e where e is

  Let ("x", Int 2,
  Let ("y", Add (Var "x", Var "x"),
  Let ("z", Sub (Mul (Var "y", Var "x"), Int 1),
  Let ("q", If (Lte (Var "z", Int 0), Sub (Int 0, Var "z"), Var "z"),
  Div ("q", Int 3)))))

Note that this expression can be written in OCaml syntax as

  let x = 2 in
  let y = x + x in
  let z = x * y - 1 in
  let q = if z <= 0 then 0 - z else z in
  q / 3

Typing

Here are the typing rules of our language. In them we introduce two types: int and bool.

\def\code#1{\textcolor{purple}{\texttt{#1}}} \def\side#1{\textcolor{green}{#1}} \frac {} {\Gamma \vdash \code{True} : \code{bool}} \text{(true)} \qquad \frac {} {\Gamma \vdash \code{False} : \code{bool}} \text{(false)} \qquad \frac {\side{n {\text{ is an int literal}}}} {\Gamma \vdash n : \code{int}} \text{(int)} \qquad \frac {\side{(x : t) \in \Gamma}} {\Gamma \vdash x : t} \text{(var)}

\def\code#1{\textcolor{purple}{\texttt{#1}}} \def\side#1{\textcolor{green}{#1}} \frac {\Gamma \vdash e_1 : \code{int} \quad \Gamma \vdash e_2 : \code{int}} {\Gamma \vdash \code{(} \ \code{+} \ e_1 \ e_2 \ \code{)} : \code{int}} \text{(add)} \qquad \frac {\Gamma \vdash e_1 : \code{int} \quad \Gamma \vdash e_2 : \code{int}} {\Gamma \vdash \code{(} \ \code{-} \ e_1 \ e_2 \ \code{)} : \code{int}} \text{(sub)} \qquad \frac {\Gamma \vdash e_1 : \code{int} \quad \Gamma \vdash e_2 : \code{int}} {\Gamma \vdash \code{(} \ \code{*} \ e_1 \ e_2 \ \code{)} : \code{int}} \text{(mul)} \qquad \frac {\Gamma \vdash e_1 : \code{int} \quad \Gamma \vdash e_2 : \code{int}} {\Gamma \vdash \code{(} \ \code{/} \ e_1 \ e_2 \ \code{)} : \code{int}} \text{(div)}

\def\code#1{\textcolor{purple}{\texttt{#1}}} \def\side#1{\textcolor{green}{#1}} \frac {\Gamma \vdash e_1 : \code{int} \quad \Gamma \vdash e_2 : \code{int}} {\Gamma \vdash \code{(} \ \code{<=} \ e_1 \ e_2 \ \code{)} : \code{bool}} \text{(lte)} \qquad \frac {\Gamma \vdash e_1 : \code{bool} \quad \Gamma \vdash e_2 : t \quad \Gamma \vdash e_3 : t} {\Gamma \vdash \code{(} \ \code{If} \ e_1 \ e_2 \ e_3 \ \code{)} : t} \text{(if)} \qquad \frac {\Gamma \vdash e_1 : t_1 \quad \Gamma, x : t_1 \vdash e_2 : t_2} {\Gamma \vdash \code{(} \ \code{Let} \ x \ e_1 \ e_2 \ \code{)} : t_2} \text{(let)}

type ty =

| IntTy
| BoolTy

We represent types in our interpreter using the ty ADT.

type ctxt = ty Stdlib.Map.Make(Stdlib.String).t

We represent contexts as maps (a.k.a. a dictionaries) defined using module functors (i.e., scary stuff we won't talk about in this course). The point is that a value of type ctxt is a map from variable names of type string to types of type ty. We provide a convenient interface for working with contexts below.

val empty_ctxt : ctxt

empty_ctxt is the empty context $\varnothing$ .

val add_binding : string -> ty -> ctxt -> ctxt

add_binding $x \ \tau \ \Gamma$ is the result of adding the binding $x : \tau$ to $\Gamma$ . This will be useful for implementing rules which have the context $\Gamma, x : \tau$ in their premises.

val check_binding : string -> ctxt -> ty option

check_binding $x \ \Gamma$ is Some $\tau$ if $(x : \tau) \in \Gamma$ and is None otherwise. This will be useful for implementing the (var) rule.

val type_of : ctxt -> expr -> ty option

Implement the function type_of so that type_of $\Gamma \ e$ is Some $\tau$ if $e$ is well-typed in $\Gamma$ (i.e., if $\Gamma \vdash e : \tau$ is derivable according to the above typing rules) and None otherwise. For example, the expression e

(Let x 2
(Let y (+ x x)
( * y y)))

is well-typed in the empty context, and type_of empty_ctxt e is Some IntTy, where as the expression e'

(Let x 2
(Let y (If x 1 -1)
( * y y)))

is not well-typed in the empty context (because x is being used as the condition of an If-expression) so type_of empty_ctxt e' is None.

Semantics

Here are the semantic rules of our language. In them we introduce two kinds of values: integer values and Boolean value ( $\top$ for truth and $\bot$ for falsity).

\def\code#1{\textcolor{purple}{\texttt{#1}}} \def\side#1{\textcolor{green}{#1}} \frac {} {\code{True} \Downarrow \top} \text{(trueEval)} \qquad \frac {} {\code{False} \Downarrow \bot} \text{(falseEval)} \qquad \frac {\side{n {\text{ is an int literal}}}} {n \Downarrow n} \text{(intEval)}

\def\code#1{\textcolor{purple}{\texttt{#1}}} \def\side#1{\textcolor{green}{#1}} \frac {e_1 \Downarrow v_1 \quad e_2 \Downarrow v_2 \quad \side{v = v_1 + v_2}} {\code{(} \ \code{+} \ e_1 \ e_2 \ \code{)} \Downarrow v} \text{(addEval)} \qquad \frac {e_1 \Downarrow v_1 \quad e_2 \Downarrow v_2 \quad \side{v = v_1 - v_2}} {\code{(} \ \code{-} \ e_1 \ e_2 \ \code{)} \Downarrow v} \text{(subEval)}

\def\code#1{\textcolor{purple}{\texttt{#1}}} \def\side#1{\textcolor{green}{#1}} \frac {e_1 \Downarrow v_1 \quad e_2 \Downarrow v_2 \quad \side{v = v_1v_2}} {\code{(} \ \code{*} \ e_1 \ e_2 \ \code{)} \Downarrow v} \text{(mulEval)} \qquad \frac {e_1 \Downarrow v_1 \quad e_2 \Downarrow v_2 \quad \side{v = v_1 / v_2}} {\code{(} \ \code{/} \ e_1 \ e_2 \ \code{)} \Downarrow v} \text{(divEval)}

\def\code#1{\textcolor{purple}{\texttt{#1}}} \def\side#1{\textcolor{green}{#1}} \frac {e_1 \Downarrow v_1 \quad e_2 \Downarrow v_2 \quad \side{v_1 \leq v_2}} {\code{(} \ \code{<=} \ e_1 \ e_2 \ \code{)} \Downarrow \top} \text{(lteTrue)} \qquad \frac {e_1 \Downarrow v_1 \quad e_2 \Downarrow v_2 \quad \side{v_1 > v_2}} {\code{(} \ \code{<=} \ e_1 \ e_2 \ \code{)} \Downarrow \bot} \text{(lteFalse)}

\def\code#1{\textcolor{purple}{\texttt{#1}}} \def\side#1{\textcolor{green}{#1}} \frac {e_1 \Downarrow \top \quad e_2 \Downarrow v_2} {\code{(} \ \code{If} \ e_1 \ e_2 \ e_3 \ \code{)} \Downarrow v_2} \text{(ifTrue)} \qquad \frac {e_1 \Downarrow \bot \quad e_3 \Downarrow v_3} {\code{(} \ \code{If} \ e_1 \ e_2 \ e_3 \ \code{)} \Downarrow v_3} \text{(ifFalse)} \qquad \frac {e_1 \Downarrow v_1 \quad \side{e = [v_1/x]e_2} \quad e \Downarrow v} {\code{(} \ \code{Let} \ x \ e_1 \ e_2 \ \code{)} \Downarrow v} \text{(let)}

type value =

| IntV of int
| BoolV of bool

We represent values with the value ADT.

val eval : expr -> value

Implement the function eval so that eval $e$ is the result of evaluating $e$ , i.e., it should be the value $v$ such that $e \Downarrow v$ is derivable. The whole point, of all of this, is that $e \Downarrow v$ is guaranteed to be derivable if $e$ is well-typed in the empty context, so we don't need to do any error handling here. These rules correspond to the rules for analogous OCaml syntax. For example, the expression e

(Let x 2
(Let y (+ x x)
( * y y)))

is equivalent to the OCaml expression

  let x = 2 in
  let y = x + x in
  y * y

So we expected eval e to be IntV 16. Note: When you write an evaluator, you will find that some branches of your match statements are not reachable. For example, when you evaluate an addition expression, you don't need to worry about the case that one of the operands evaluates to a Boolean value. Formally speaking, eval is undefined on these inputs, you can put whatever you want in these branches. Practically speaking, the "correct" thing to do according OCaml's spec is to use the expression assert false to indicate undefined branches.

val expr_of_value : value -> expr

One last thing. Our semantics depends on substitution which, as we've mentioned is a bit complicated. We'll be nice this time around and give you the implementation of substitution, but make sure you understand it, we'll be expecting you to implement it yourself later on (and we'll talk more about substitution in the coming weeks). One subtlety of substitution is that we define $[e_1 / x] e_2$ (i.e., substituting $e_1$ for $x$ in $e_2$ ) so that $e_1$ is an expression. But in the above rules, we substitute a value into an expression. Fortunately, values very easily reify to expressions.

Implement the function expr_of_value so that expr_of_value v is the expression representing e. You should use this function along with subst (given in the starter code) in your implementation of eval.

val interp : ?print:bool -> string -> value option

interp puts together parsing, type checking, and evaluation. This has been implemented for you.

Written (10%)

Partial Derivations

Not all judgments are derivable, but it's often possible to write partial derivations of underivable judgments. These partial derivations can give us insight into why we get certain type errors when type-checking.

Formally, a typing derivation is partial if its leaves are not necessarily empty. A partial derivation is maximal if every leaf is empty (i.e., there are no judgments to derive) or is an axiom that is not derivable according to the rules of the system. For example:

                    ───────────(int)
    ∅ ⊢ 1 : bool    ∅ ⊢ 2 : int    ∅ ⊢ false : int
    ──────────────────────────────────────────────(if)
          ∅ ⊢ if 1 then 2 else false : int

The expression if 1 then 2 else false is not well-typed in the empty context, but it can be given a maximal partial derivation, in which the second premise is an axiom which is derivable, where as the first and third premises are axioms which cannot be derived

Give maximal partial derivations of the following typing judgments.

\varnothing \vdash \texttt{if true then 1 else false} : \texttt{int}

\varnothing \vdash \texttt{if true then false else 1} : \texttt{bool}

\varnothing \vdash \texttt{if true then false else x || 2} : \texttt{bool}

\varnothing \vdash \texttt{if true then false else 2 || x} : \texttt{bool}

In addition, write down the type error that OCaml gives you when trying to type-check each expression (and think about what the type-error tells you about the order in which OCaml type-checks sub-expressions).