Final Review

Table of Contents

This page contains an outline of the topics covered in CAS CS 320: Principles of Programming Languages during the Spring 2023 semester at Boston University. It is not exhaustive, though it is biased towards those topics which are pertinent for the final exam of the course. Included are exercises associated with each topic.

1. OCaml

During the first half of the course, we learned the function programming language OCaml. Our primary goal was to learn how to program in the functional style, not thinking of a function as defining a list of commands, but instead as specifying the shape of its output.

1.1. Basics

OCaml has many types that are standard among programming languages, e.g., integers, Boolean values, floating-point numbers, tuples and lists, all with basic operators. It also has standard functional language constructs like let-bindings, anonymous functions, and if-then-else expressions.

1.1.1. Exercises

  1. Euclid's algorithm for determining the greatest common divisor of two integers takes advantage of the fact that \(\mathsf{gcd}(m, n) = \mathsf{gcd}(n, m \bmod n)\). Implement the function

    let gcd (m : int) (n : int) : int =
      assert false (* TODO *)
    

    which, given two integers m and n, returns their greatest common divisor.

  2. We can represent a rational number as a pair of integers where the first integer represents the numerator and the second represents the denominator. We maintain the invariant that the second number is positive and the pair of numbers are relatively prime. Implement the function add which adds two rational numbers, making sure to maintain this invariant.

    type rat = int * int
    
    let add (m : rat) (n : rat) : rat =
      assert false (* TODO *)
    
  3. Implement the function

    let gen_fib : (l : int list) (n : int) : int =
      if n < 0
      then assert false
      else
        assert false (* TODO *)
    

    which, given a list of integers l of length \(k\) and a nonnegative integer n, returns the \(n^\text{th}\) element of the following sequence:

    \begin{equation*} F_n = \begin{cases} l[n] & n < k \\ \sum_{i = 1}^k F_{n - i} & n \geq k \end{cases} \end{equation*}

1.2. Algebraic Data Types

One of the most important features of modern functional programming languages is pattern matching and algebraic data types. An algebraic data type (ADT) is defined by giving a collection of constructors, which themselves can carry other data. We can think of algebraic data types as defining unions of existing types. Here, for example, is an ADT whose values represent either a Boolean value or an integer.

type bool_or_int
  = Bool of bool
  | Int of int

We work with values of an ADT by pattern matching, providing different output values depending on the shape of the value we are matching with.

let num_val (x : bool_or_int) : int =
  match x with
  | Bool b -> if b then 1 else 0
  | Int i -> i

ADTs can be recursive (the type being defined can be referred to in its own definition) and parametric (the type being defined can depend parametrically on another type). These two features are encompassed in the list type.

type 'a mylist
  = Nil
  | Cons of 'a * 'a mylist

'a option and 'a result two other parametric types which are important to be familiar with.

1.2.1. Exercises

  1. Implement the function

    let matrix_of_list
      (l : 'a list)
      (num_cols : int) : ('a list list) option =
      assert false (* TODO *)
    

    which converts a list l into a matrix with num_cols columns, returing None in the case that num_cols is not positive or the resulting matrix is not rectangular (i.e., the length of l is not a multiple of num_cols).

  2. Implement the function

    let drop_nones (l : 'a option list) : 'a list =
      assert false (* TODO *)
    

    which returns an 'a list consisting of those elements of l which are not None (in order).

  3. A Red-Black tree is an ordered binary tree in which every node is labeled either red or black. It is further required that

    • no red node has a red child
    • every path from the root node to the leaf has the same number of black nodes

    Fill in the function is_valid below which returns true if t is a valid red-black tree, and false otherwise.

    type color
      = Red
      | Black
    
    type 'a rbtree
      = Leaf
      | Node of (color * 'a * 'a rbtree * 'a rbtree)
    
    let is_valid (t : 'a rbtree) : bool =
      assert false (* TODO *)
    

1.3. Record Types

Record types are essentially tuples with named fields. Here is an example of a representation of rational numbers using record types.

type rat =
  { numer : int
  ; denom : int
  ; is_positive : bool
  }

let two_thirds =
  { numer = 2
  ; denom = 3
  ; is_positive = true
  }

If we think of algebraic data types as unions of existing types, then we may think of record types as products of existing types. Beyond this, they have a couple conveniences that are good to remember.

  • Accessing fields in a record can be done via dot notation, i.e., if r is a rat then r.is_positive is a bool.
  • Updating a fields records can be done using with-notation. This is useful if a record has many fields but only a few need to be updated.

    let negate (r : rat) : rat =
      { r with is_positive = not r.is_positive }
    
    let recip (r : rat) : rat =
      if r.numer = 0
      then assert false
      else { r with numer = r.denom; denom = r.numer }
    

1.3.1. Exercises

  1. Fill in the record types below so that the given function type-checks.

    type rectangle1 = (* TODO *)
    
    type rectangle2 = (* TODO *)
    
    let transform (r : rectangle1) : rectangle2 =
      let (x, y) = r.center in
      { bottom_left = x -. r.width /. 2., y -. r.height /. 2.
      ; top_right = x +. r.width /. 2., y +. r.height /.  2.
      }
    
  2. Write a function which adds a binding to the list of captured bindings of a closure, ensuring that it is shadowed by any bindings already in the collection of captured bindings.

    type value = unit (* DUMMY TYPE *)
    type program = unit (* DUMMY TYPE *)
    type closure =
      { name : string
      ; body : program
      ; captured : (string * value) list
      }
    
    let add_binding (c : closure) (x : string) (v : value) =
      assert false (* TODO *)
    
  3. Suppose you are given a list of tools for converting sound files from one format to another. Each converter has a name, a list of input formats which it can convert from, and a list of output formats which it can convert to. Implement the function convert_options which, given a list of converters cs and an input format f, collects the possible output formats, keeping track of the names of the converter tools which can be used.

    type converter =
      { name: string
      ; input_formats : string list
      ; output_formats : string list
      }
    
    type convert_out =
      { converters : string list
      ; output_format : string
      }
    
    let convert_options
      (cs : converter list)
      (f : string) : convert_out list =
      assert false (* TODO *)
    

1.4. Higher-Order Programming

Higher-order programming is the use of functions as first-class values to write general, reusable code. There are three patterns in particular for higher-order programming with lists which we looked at in depth.

  • The function map, defined as

    let rec map (f : 'a -> 'b) (l : 'a list) : 'b list =
      match l with
      | [] -> []
      | x :: l -> let x = f x in x :: map f l
    

    replaces each element in l with f applied to that element, in order from left to right.

  • The function filter, defined as

    let rec filter (p : 'a -> bool) (l : 'a list) : 'a list =
      match l with
      | [] -> []
      | x :: l -> if p x then x :: filter p l else filter p l
    

    collects all element of l which satisfy the predicate p, in order from left to right.

  • The function fold_right, defined as

    let rec fold_right (f : 'a -> 'b -> 'b) (l : 'a list) (accu : 'b) : 'b =
      match l with
      | [] -> accu
      | x :: l -> f x (fold_right f l accu)
    

    applies the binary operation f between every element of l @ [accu] right-associatively:

     [x₁;   x₂;   x₃; ...   xₙ]
      ↓↓    ↓↓    ↓↓        ↓↓
    f x₁ (f x₂ (f x₃ (...(f xₙ accu)...)))
    

    and the function fold_left:

    let rec fold_left (f : 'b -> 'a -> 'b) (accu : 'b) (l : 'a list) : 'b =
      match l with
      | [] -> accu
      | x :: l -> fold_left f (f accu x) l
    

    does the same but to accu :: l left-associatively:

                       [x₁; x₂; x₃; ... xₙ]
                        ↓↓  ↓↓  ↓↓      ↓↓
    f (...(f (f (f accu x₁) x₂) x₃)...) xₙ
    

    Note that fold_left is tail-recursive whereas fold_right is not.

1.4.1. Exercises

  1. Implement the functions

    let andp (p1 : 'a -> bool) (p2 : 'a -> bool) : 'a -> bool =
      assert false (* TODO *)
    
    let orp (p1 : 'a -> bool) (p2 : 'a -> bool) : 'a -> bool =
      assert false (* TODO *)
    

    with the following properties:

    • given two predicates p1 and p2, the predicate andp p1 p2 is the predicate which expresses that both p1 and p2 hold.
    • given two predicates p1 and p2, the predicate orp p1 p2 is the predicate which expresses that p1 or p2 hold.
  2. We can represent a polynomial as a list of float's as follows.

          [a₀;  a₁;    a₂;     ...;  aₙ]
           ↓↓   ↓↓     ↓↓            ↓↓
    p(x) = a₀ + a₁ x + a₂ x² + ... + aₙ xⁿ
    

    Implement the function

    let derivative (p : float list) : float list =
      assert false (* todo *)
    

    which computes the list representing the polynomial \(p'(x)\), the derivative of \(p(x)\).

  3. When implementing radix sort on integers, it is necessary to partition a list of integers based on their last digits. Fill in the following function

    let bucket (l : int list) : int list list =
      let op accu next =
        assert false (* TODO *)
      in
      let base =
        List.init 10 (fun _ -> [])
      in
      List.fold_left op base l
    

    which, given a list l of integers, return 10 lists of integers which partition l by the last digit the members of l. That is, the \(i^\text{th}\) element of bucket l should contain exactly the elements of l whose last digit is \(i\).

1.5. Tail Recursion

Roughly speaking, a recursive call in the body of a function definition is in tail position if no evaluation is required after the recursive call. The following implementation of the factorial function is not tail recursive because it requires evaluating the product of the result of its recursive call with the input n.

let rec factorial (n : int) : int =
  if n < 0
  then assert false
  else if n = 0
  then 1
  else n * factorial (n - 1)

We can make functions tail recursive by adding an accumulator argument to the function.1

let factorial_tail (n : int) : int =
  let rec go (n : int) (accu : int) =
    if n = 0
    then accu
    else go (n - 1) (n * accu)
  in
  if n < 0
  then assert false
  else go n 1

1.5.1. Exercises

  1. Consider the following function which prints a (half) hourglass in asterisks. Implement a function which does the same thing but is tail recursive.

    let rec hourglass (n : int) : unit =
      if n > 0
      then
        let _ = print_endline (String.make n '*') in
        let _ = hourglass (n - 1) in
        let _ = print_endline (String.make n '*') in
        ()
      else ()
    
  2. (Challenge) Implement a tail recursive evaluator for Boolean expressions as represented by the following ADT.

    type bool_expr
      = Bool of bool
      | Not of bool_expr
      | And of bool_expr * bool_expr
      | Or of bool_expr * bool_expr
    
    let eval_tr (e : bool_expr) : bool =
      assert false (* TODO *)
    

1.6. Type Checking

OCaml is strongly typed, and programs are statically checked for adherence to typing rules. We did not look at the type rules in full detail, but we did see some examples of formal typing rules in OCaml, e.g.,

\begin{equation*} \frac {b \in \mathsf{bool} \qquad e_1 \in \mathsf{t} \qquad e_2 \in \mathsf{t}} {(\textsf{if } b \textsf{ then } e_1 \textsf{ else } e_2) \in \mathsf{t}} \end{equation*}

This rule expresses that an if-then-else expression has the same type as its if-case and else-case if its individual sub-expressions are well-typed.2

OCaml also has type inference, which means we often do not have to specify the types of expressions in OCaml programs (though it can be useful for documentation purposes).

1.6.1. Exercises

  1. Does this program type-check? If so, what are the types of bar and baz?

    type 'a foo = Foo of ('a foo -> 'a)
    let bar (Foo f) = f
    let baz x = bar x x
    
  2. Does this program type-check? If so, what is the type of foo?

    let rec foo x y =
      if x > 0 then
        foo (x - 1) (y +. 1.)
      else if x < 0 then
        foo y x
      else
        0
    

2. Formal Grammar

Grammar is the study of the form and structure of language. We use concepts from the formal grammar to represent and reason about the syntax of programming languages. These concepts also inform the design of parsers.

2.1. BNF Specifications

We start with a collection of symbols, separated into two disjoint groups, the nonterminal symbols and the terminal symbols. In a Backus-Naur form (BNF) specification we use angle brackets (e.g., <nonterm>) to denote a nonterminal symbol. We typically don't specify the symbols in advance, but instead glean them from the specification itself.

A sentential form is a sequences of symbols and a sentence is a sequence of terminal symbols.

A production rule is made up of a nonterminal symbol and a sentential form, and is written

<nonterm> ::= SENTFORM

We interpret a production rule as indicating that <nonterm> stands for SENTFORM in a sentential from.

A BNF specification is given by a collection of production rules and a starting symbol. We typically take the nonterminal symbol in the first rule of the specification to be the starting symbol. We also call a BNF specification a grammar.3

In the case that a BNF specification has multiples rules for the same nonterminal symbol, we use the notation

<nonterm> ::= SENTFORM₁ | SENTFORM₂ | ... | SENTFORMₖ

as shorthand for

<nonterm> ::= SENTFORM₁
<nonterm> ::= SENTFORM₂
 ...
<nonterm> ::= SENTFORMₖ

A derivation of a sentential form \(S\) in a grammar \(\mathcal G\) with start symbol <start> is a sequence of sentential forms, starting with the start symbol <start> and ending in \(S\), in which each form in the sequence (except for <start>) is the results of replacing one nonterminal symbol in the previous form with a sentential from given by a production rule of \(\mathcal G\).

A derivation is leftmost if the nonterminal symbol replaced at each step of the derivation is the leftmost nonterminal symbol in the sentential form.

A grammar \(\mathcal G\) accepts or recognizes a sentence \(S\) if has a derivation in \(\mathcal G\).

A parse tree, informally, is a derivation represented as a tree, in which

  • every node is labeled with a symbol
  • the root is labeled with the starting symbol
  • the children of each node are the symbols of the sentential form (in order) which replace the symbol labeling the node in the derivation.

The frontier of a parse tree (i.e., the leaf nodes in order from left to right) form the sentence derived in the derivation represented by the parse tree.

A parse tree may correspond to multiple derivations, but every derivation has a unique parse tree representation. Likewise, every parse tree corresponds to a unique leftmost derivation.

2.1.1. Examples

Toy grammar for English sentences:

<sentence>    ::= <noun-phrase> <verb-phrase>
<verb-phrase> ::= <verb> | <verb> <prep-phrase>
<prep-phrase> ::= <prep> <noun-phrase>
<noun-phrase> ::= <article> <noun>
<article>     ::= the
<noun>        ::= cow | moon
<verb>        ::= jumped
<prep>        ::= over

Derivation of a sentence recognized by the above grammar:4

<sentence>!
<noun-phrase>!     <verb-phrase>
<noun-phrase>      <verb>  <prep-phrase>!
<noun-phrase>!     <verb>  <prep>  <noun-phrase>
<article>  <noun>  <verb>  <prep>  <noun-phrase>!
<article>! <noun>  <verb>  <prep>  <article>  <noun>
the        <noun>! <verb>  <prep>  <article>  <noun>
the        cow     <verb>! <prep>  <article>  <noun>
the        cow     jumped  <prep>! <article>  <noun>
the        cow     jumped  over    <article>! <noun>
the        cow     jumped  over    the        <noun>!
the        cow     jumped  over    the        moon

Toy grammar for an imperative programming language:

<program> ::= <stmts>
<stmts>   ::= <stmt> | <stmt> ; <stmts>
<stmt>    ::= <var> = <expr>
<var>     ::= a | b | c | d
<expr>    ::= <term> | <term> + <term> | <term> - <term>
<term>    ::= <var> | const

A (leftmost) derivation of a program recognized by the above grammar:

<program>
<stmts>
<stmt> ; <stmts>
<var> = <expr> ; <stmts>
a = <expr> ; <stmts>
a = <term> ; <stmts>
a = const ; <stmts>
a = const ; <stmt> ; <stmts>
a = const ; <var> = <expr> ; <stmts>
a = const ; a = <expr> ; <stmts>
a = const ; a = <term> + <term> ; <stmts>
a = const ; a = <var> + <term> ; <stmts>
a = const ; a = a + <term> ; <stmts>
a = const ; a = a + const ; <stmts>
a = const ; a = a + const ; <var> = <expr>
a = const ; a = a + const ; b = <expr>
a = const ; a = a + const ; b = <term>
a = const ; a = a + const ; b = <var>
a = const ; a = a + const ; b = a

2.1.2. Extended BNF

We extend the notation of BNF specifications to make it more convenient to use.5

  • [ SENTFORM₁ | SENTFORM₂ | ... | SENTFORMₖ ] refers to an optional collection of alternatives of a sentential form. For example, we can represent an integer by the following specification:

    <int>    ::= [ - ] <digits>
    <digits> ::= <digit> | <digit> <digits>
    <digit>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    
  • ( SENTFORM₁ | SENTFORM₂ | ... | SENTFORMₖ ) refers to a collection of alternatives within a sentential form. For example, we can represent arithmetic expressions by the following specification:

    <expr>   ::= <expr> ( + | - | * | / ) <expr> | <digits>
    <digits> ::= <digit> | <digit> <digits>
    <digit>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    
  • { SENTFORM₁ | SENTFORM₂ | ... | SENTFORMₖ } refers to zero or more occurrences of the sentential forms in a collection of alternatives. For example, we can simplify the specification for integers (and enforce that the first digit must be nonzero):

    <int> ::= [ - ] ( 1 | ... | 9) { 0 | ... | 9 }
    

2.1.3. Exercises

  1. List the symbols (both terminal and nonterminal) implicit in the following specification.

    <a> ::= a <b> | a <a> b
    <b> ::= c <a> | d
    
  2. Give a leftmost derivation of a a a c a d b b in the above grammar. Draw its associated parse tree.
  3. Verify that a = a + a ; b = b is recognized by the grammar for the simple imperative language above. Give a derivation that is not leftmost.
  4. Implement the function

    type 'a tree
      = Leaf of 'a
      | Node of 'a tree list
    
    let frontier (t : 'a tree) : 'a list =
      assert false (* TODO *)
    

    which returns a list of the members of t in order from left to right.

2.2. Ambiguity

A BNF specification is ambiguous if there is a sentence with multiple parse trees. Equivalently, a specification is ambiguous if there is a sentence with multiple leftmost derivations. We try to avoid ambiguous specifications for programming languages because we ultimately don't want a program to be interpretable in multiple ways.

2.2.1. Fixity

The fixity of an operator refers to where the operator is written with respect to its arguments.

  • prefix operators appear before their argument
    • the negation operator: -5
  • postfix operators appear after their argument
    • type constructors: int list
  • infix (binary) operators appear between their arguments
    • arithmetic operators: (1 + 2) * (3 + 4))
  • mixfix operators are a combination of these
    • if-else-expressions: if not b the f x else g x

If a language's syntactic constructs are all prefix (Polish notation) or all postfix (reverse Polish notation) then the specification is unambiguous. We can make infix binary operators unambiguous by specifying their associativity and precedence.

2.2.2. Associativity

An operator \(\square\) is declared left associative if we interpret \(a \square b \square c\) as \((a \square b) \square c\).

  • For arithmetic expressions, we take subtraction to be left-associative, so the expression 1 - 2 - 3 evaluates to -4 as opposed to 2.

An operator \(\square\) is declared right associative if we interpret \(a \square b \square c\) as \(a \square (b \square c)\).

  • For arithmetic expressions, we take exponentiation to be right-associative, so the expression 2 ^ 1 ^ 3 evaluates to 2 as opposed to 8.

The associativity of an operator should affect the shape of parse tree in a given grammar. For example, in the grammar:

<expr> ::= x | <expr> + <expr>

there are two parse trees for the sentence x + x + x:

<expr>                                  <expr>
|                                       |
|---------------------------\           |------------ \
|                    |      |           |      |      |
<expr>               |      |           |      |      <expr>
|                    |      |           |      |      |
|-------------\      |      |           |      |      |-------------\
|      |      |      |      |           |      |      |      |      |
<expr> |      <expr> |      <expr>      <expr> |      <expr> |      <expr>
|      |      |      |      |           |      |      |      |      |
x      +      x      +      x           x      +      x      +      x

The left tree groups the first two arguments (left associatively) and the second groups the last two arguments (right associatively).

We can enforce the associativity of an operator in the specification itself by breaking symmetry in the production rules, effectively choosing one of the above two parse trees. Addition is typically understood as left associative, so we should require its right argument to be a variable:

<expr> ::= x | <expr> + x

2.2.3. Precedence

Given two binary operators \(\square\) and \(\triangle\), the operator \(\square\) has higher precedence than \(\triangle\) if we interpret \(a \square b \triangle c\) as \((a \square b) \triangle c\) and \(a \triangle b \square c\) as \(a \triangle (b \square c)\).

  • For arithmetic expressions, we take multiplication to have higher precedence than addition, so the expression 2 * 2 + 3 evaluates to 7 as opposed to 10.

As with associativity, we can enforce precedence within the specification itself. In the grammar:

<expr> ::= x | <expr> + x | <expr> * x

the sentences x + x * x has only one parse tree, but is not the correct parse tree with respect to the rules of arithmetic (addition has been given higher precedence).

<expr>
|
|---------------------------\
|                    |      |
<expr>               |      |
|                    |      |
|-------------\      |      |
|      |      |      |      |
<expr> |      |      |      |
|      |      |      |      |
x      +      x      *      x

To fix this, we can break symmetry again by ensuring that arguments to multiplication can only be variables or other expressions with only multiplication.

<expr>  ::= <expr2> | <expr> + <expr2>
<expr2> ::= x | <expr2> * x

In this grammar, the parse tree of x + x * x groups the arguments of multiplication (giving multiplication higher precedence).

<expr>
|
|------------- \
|       |      |
<expr>  |      <expr2>
|       |      |
|       |      |--------------\
|       |      |       |      |
<expr2> |      <expr2> |      |
|       |      |       |      |
x       +      x       *      x

Note that it is now impossible to group the argument of the addition. In order to make this possible, parentheses need to be introduced into the grammar:

<expr>  ::= <expr2> | <expr> + <expr2>
<expr2> ::= x | <expr2> * x | ( <expr> )

2.2.4. Exercises

  1. Draw the parse tree for the sentence (x + x) * x in the grammar at the end of the section.
  2. Draw the parse tree for the sentence x * x + x * x in the grammar at the end of the section.
  3. Update the grammar at the end of the section to include subtraction, multiplication, and (right-associative) exponentiation.
  4. Consider the following ambiguous grammar.

    <s>  ::= <np> <vp>
    <vp> ::= <v> | <v> <np> | <v> <np> <pp>
    <pp> ::= <p> <np>
    <np> ::= <n> | <d> <n> | <np> <pp>
    <n>  ::= John | man | mountain | telescope
    <v>  ::= saw
    <d>  ::= the
    <p>  ::= on | with
    

    Give two leftmost derivations of the sentence John saw the man on the mountain with the telescope.

  5. Consider the following grammar for function types over int.

    <fun-type> ::= <int-type> | <fun-type> -> <fun-type>
    <int-type> ::= int
    

    Show that this grammar is ambiguous and then rewrite this grammar so that -> is a right associative operator.

2.3. Regular Grammars

A (right linear) regular grammar is a grammar such that every rule is in one of the following forms:

<nonterm> ::= term
<nonterm> ::= term <nonterm>
<nonterm> ::= ϵ

where \(\epsilon\) refers to the empty sentence.

Regular expressions are a compact syntax for regular grammars.6 They are defined inductively as follows:

  • a terminal symbol is a regular expression
  • \(\epsilon\) is a regular expression
  • if \(r_1, r_2, \dots, r_k\) are regular expressions then so are \(r_1r_2 \dots r_k\) and \(r_1 \ | \ r_2 \ | \ \dots \ | \ r_k\)
  • if \(r\) is a regular expression, then so is \((r)\) and \(r^*\)

A regular expression accepts or recognizes a sentence according to the following rules:

  • a terminal symbol \(t\) accepts \(t\)
  • if each \(r_i\) excepts \(s_i\), then \(r_1r_2\dots r_k\) accepts \(s_1s_2\dots s_k\)
  • if \(r_i\) accepts \(s_i\) then, \(r_1 \ | \ r_2 \ | \ \dots \ | \ r_k\) accepts \(s_i\)
  • if \(r\) accepts \(s\) the \((r)\) accepts \(s\)
  • if \(r\) accepts \(s_1, s_2, \dots, s_k\), then \(r^*\) accepts \(s_1s_2 \dots s_k\)

2.3.1. Exercises

  1. Find a regular expression for all binary strings in which every 1 is adjacent to exactly one other 1.
  2. Write a right-linear regular grammar for all binary strings in which every 1 must be followed by at least two 0's. Give a derivation in this grammar of the sentence 100100010000.
  3. Determine which of the following sentences are accepted by the regular expression a(bc|cb)*d.
    • abc
    • abcd
    • abccbd
    • abad
    • abbcd

2.4. Chomsky Normal Form

A grammar is in Chomsky normal form if its rule are all of the following forms:

<start>   ::= ϵ
<nonterm> ::= <nonterm> <nonterm>
<nonterm> ::= term

Every grammar we have considered (i.e., every grammar with a BNF specification) can be converted into an equivalent grammar (i.e., one which accepts the same sentences) which is in Chomsky normal form.

2.4.1. Exercises

  1. Rewrite the following grammar in Chomsky normal form.

    <a> ::= a <b> b
    <b> ::= <b> c | <c> <c>
    <c> ::= <c> d | d <c>
    

3. Parsing

The general parsing problem is to find a derivation of a sentence in a given grammar, if one exists.7

In the context of this course, we are primarily interested in the specific problem of converting a string (or char list) into an ADT representing the syntax of the program.

There are many ways to accomplish this, we saw two: recursive-descent and parser-combinators.

Recursive-descent parsing is an ad-hoc parsing method in which mutually recursive functions are defined to parse each part of a given specification. It will not appear on the final exam in any significant way.

3.1. Combinators

We can think of a parser for 'a's as a function of type

type 'a parser = char list -> ('a * char list) option

which

  1. consumes the prefix of the input stream corresponding to an 'a,
  2. converts that prefix to an 'a, and finally,
  3. returns that 'a and the remainder of the stream, failing if no initial part of the stream corresponds to an 'a.

One of the simplest examples is the char parser:

let char (d : char) (cs : char list) : (char * char list) option =
  match cs with
  | c :: cs when c = d -> Some (d, cs)
  | _ -> None

which consumes the first character of cs given that it is equal to d and returns it, along with the remainder of cs. This parser fails (returns None) in the case that the first character of cs is not d.

When we want to use a parser, we apply it to a character list and verify that it consumed it's the entire input:

let parse (p : 'a parser) (s : string) =
  match p (explode s) with
  | (a, []) -> Some a
  | _ -> None

A parser combinator is a higher-order function which can be used to compose parsers. There is a small subset of parser combinators which are of particular importance because they correspond to the constructs in EBNF specifications and regular expressions.

  • Alternatives. p1 <|> p2 is the parser which tries running the parser p1, returning its output if it succeeds, and runs p2 otherwise.

    If p1 is a parser for the forms of a nonterminal symbol <p1> and p2 a parser for forms of a nonterminal symbol <p2>, then p1 <|> p2 is a parser for forms of the nonterminal symbol

    <alt> ::= <p1> | <p2>
    
  • Sequencing. seq p1 p2 is the parser which runs both p1 and p2 and returns both of their outputs if both parsers succeed. It fails if either p1 or p2 fails.

    If p1 is a parser for the forms of a nonterminal symbol <p1> and p2 a parser for forms of a nonterminal symbol <p2>, then seq p1 p2 is a parser for forms of the nonterminal symbol

    <seq> ::= <p1> <p2>
    
  • Repetition. many p is the parser which runs p repeatedly until it fails, collecting all its outputs in a list.

    If p is a parser for the forms of a nonterminal symbol <p>, then many p is a parser for forms of the nonterminal symbol

    <many> ::= { <p> }
    

The last important combinator is map, which can be used to manipulate the output of a parser without affecting how it consumes its input. If p is an 'a parser, and f is a function of type 'a -> 'b, then map f p is a 'b parser which runs 'a and then applies f to its output (if it succeeds).

You should also be familiar with how to use the more convenient parser combinators throughout the course. For the final exam, you will not be required to memorize their definitions.

  • str, token, ws
  • (>>), (<<), seq3, seq4
  • (>|), map2, map3, map4
  • optional
  • pure, fail, (we will not test on bind (>>=) but it is good to know…)

3.1.1. Exercises

  1. Implement the parser

    let parse_bool : bool parser =
      assert false (* TODO *)
    

    which can consume "True" and return the value tre or "False" and return the value false. In particular, it should not consume whitespace before or "True" or "False"

  2. Implement the parser

    let parse_bool_list : bool list parser =
      assert false (* TODO *)
    

    which can parse a list of booleans in Python syntax, i.e., square brackets, comma separators, white-space agnostic.

  3. Implement a parser

    let peak : char parser =
      assert false (* TODO *)
    

    which returns the first character of its input but does not consume any part of the input. This should only fail if the input is empty.

  4. Implement a parser combinator

    let check (p : 'a parser) (pred : 'a -> bool) : 'a parser =
      assert false (* TODO *)
    

    which runs p and then fails if pred is false on the output of p. Hint. It is easier to implement directly, rather than in terms of other combinators.

4. Formal Semantics

Semantics is the study of the meaning of language. We use concepts from formal semantics to model the meaning of programs.

We discussed two forms of semantics: denotational and operational.

Giving a denotational semantics for a programming language means assigning to each program a mathematical function which has the same input/output behavior as the program.

We focused primarily on operational semantics. Giving an operational semantics for a programming language means describing how a program is evaluated. This typically means defining a reduction relation on programs, which describes how a program is reduced until it reaches a state which cannot be further reduced.

4.1. Derivations

Suppose we have a programming language defined by a BNF specification \(\mathcal P\).

A configuration is a pair consistent of a program \(P\) (a sentence accepted by \(\mathcal P\)) and a state \(S\) which may be manipulated by programs. The nature of \(S\) depends on the intentions of the programming language. The state may be empty, as in the case of functional languages.

Defining a (small-step) operational semantics means defining a reduction relation for configurations:

\begin{equation*} ( \ S \ , \ P \ ) \longrightarrow ( \ S' \ , \ P' \ ) \end{equation*}

A reduction relation is defined via reduction rules, which consist of a shape of a reduction together with a collection of premises, which may be other shapes of reductions or trivial premises (also called axioms). The general form of a reduction rules is:

\begin{prooftree} \AxiomC{$P_1$} \AxiomC{$P_2$} \AxiomC{$\dots$} \AxiomC{$P_k$} \RightLabel{(name)} \QuaternaryInfC{$C \longrightarrow C'$} \end{prooftree}

Shape here refers to the fact that the configurations in a reduction rule contains meta-variables that describe the kind of reductions that can be derived, or that can be used a premises. For example, the rules for evaluating an arithmetic expression might include

\begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addLeft)} \UnaryInfC{$e_1 + e_2 \longrightarrow e_1' + e_2$} \end{prooftree}

which expresses that if \(e_1\) reduces to \(e_1'\) in a single step then \(e_1 + e_2\) reduces to \(e_1' + e_2\) in a single step no matter the expressions \(e_1\), \(e_1'\) and \(e_2\). This rule can be used to show that

\begin{equation*} (1 + 2) + 3 \longrightarrow 3 + 3 \end{equation*}

but also that

\begin{equation*} (1 + (4 - 2)) + 3 \longrightarrow (1 + 2) + 3 \end{equation*}

Both reductions invoke that fact that if the left argument can be reduced by a single step, then the sum can be reduced by a single step.

A derivation is, informally, a tree in which each node is a reduction and the children of a node are the premises required to derive that reduction according to a reduction rule. The leaves of a derivation are trivial premises. A derivation of a reduction \(C \longrightarrow C'\) is a derivation whose root is \(C \longrightarrow C'\).

What we have been describing is a single-step reduction relation. Any single-step reduction relation can be extended to a multi-step reduction relation by including the following rules and a multi-step reduction relation symbol '\(\longrightarrow^{\star}\)'.

\begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(refl)} \UnaryInfC{$C \longrightarrow^\star C$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$C \longrightarrow^\star C'$} \AxiomC{$C' \longrightarrow C''$} \RightLabel{(trans)} \BinaryInfC{$C \longrightarrow^\star C''$} \end{prooftree} \end{equation*}

Formally, evaluation is the process of determining a configuration \(C'\) for a given configuration \(C\) such that \(C \longrightarrow^\star C'\) and \(C'\) cannot be further reduced (i.e., there is no other configuration \(C''\) such that \(C' \longrightarrow C''\)).

We can express an evaluation as a derivation of a multi-step reduction, but it is common (and done below) to simply write a sequence of configurations, with the understanding that each configuration follows from the previous configuration by a derivable single-step reduction.

4.2. Examples

There are many examples of operational semantics and reduction rules in this document, we start with the two most basic.

4.2.1. Arithmetic Expressions

In the operational semantics of arithmetic expressions, we take a configuration to be an arithmetic expression (a program with no state). Each rule describes how to reduce an arithmetic expression by a single step.


Grammar for a subset of arithmetic expressions in Polish notation:

<expr>  ::= <num> | (add | sub) <expr> <expr>
<num>   ::= [-] (0 | ... | 9) | { 0 | ... | 9 }

Operational semantics for a subset of arithmetic expression in Polish notation:

\begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(addNum)} \BinaryInfC{$\mathsf{add} \ m \ n \longrightarrow m + n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addLeft)} \UnaryInfC{$\textsf{add} \ e_1 \ e_2 \longrightarrow \mathsf{add} \ e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(addRight)} \UnaryInfC{$\mathsf{add} \ e_1 \ e_2 \longrightarrow \mathsf{add} \ e_1 \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(subNum)} \BinaryInfC{$\mathsf{sub} \ m \ n \longrightarrow m - n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(subLeft)} \UnaryInfC{$\textsf{sub} \ e_1 \ e_2 \longrightarrow \mathsf{sub} \ e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(subRight)} \UnaryInfC{$\mathsf{sub} \ e_1 \ e_2 \longrightarrow \mathsf{sub} \ e_1 \ e_2'$} \end{prooftree} \end{equation*}

It is generally preferable that any derivable reduction has a unique derivation. This makes defining an evaluation procedure easier, and amounts to fixing an evaluation order. We can enforce an evaluation order via the structure of our reduction rules.


Operational semantics for a subset of arithmetic expression in Polish Notation with a left-to-right evaluation order:

\begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(addNum)} \BinaryInfC{$\mathsf{add} \ m \ n \longrightarrow m + n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addLeft)} \UnaryInfC{$\textsf{add} \ e_1 \ e_2 \longrightarrow \mathsf{add} \ e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(addRight)} \BinaryInfC{$\mathsf{add} \ m \ e_2 \longrightarrow \mathsf{add} \ m \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(subNum)} \BinaryInfC{$\mathsf{sub} \ m \ n \longrightarrow m - n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(subLeft)} \UnaryInfC{$\textsf{sub} \ e_1 \ e_2 \longrightarrow \mathsf{sub} \ e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(subRight)} \BinaryInfC{$\mathsf{sub} \ m \ e_2 \longrightarrow \mathsf{sub} \ m \ e_2'$} \end{prooftree} \end{equation*}

Example single-step derivation:

\begin{equation}\label{d1} \begin{prooftree} \AxiomC{$2 \in \mathbb Z$} \AxiomC{$3 \in \mathbb Z$} \BinaryInfC{$\mathsf{sub} \ 2 \ 3 \longrightarrow \text{-} 1$} \UnaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow \mathsf{add} \ \text{-}1 \ \mathsf{add} \ 1 \ 1$} \end{prooftree} \end{equation}

Example multi-step derivation:

\begin{equation}\label{d2} \begin{prooftree} \AxiomC{$\text - 1 \in \mathbb Z$} \AxiomC{$1 \in \mathbb Z$} \AxiomC{$1 \in \mathbb Z$} \BinaryInfC{$\mathsf{add} \ 1 \ 1 \longrightarrow 2$} \BinaryInfC{$\mathsf{add} \ \text - 1 \ \mathsf{add} \ 1 \ 1 \longrightarrow \mathsf{add} \ \text- 1 \ 2$} \end{prooftree} \end{equation} \begin{equation}\label{d3} \begin{prooftree} \AxiomC{$\text- 1 \in \mathbb Z$} \AxiomC{$2 \in \mathbb Z$} \BinaryInfC{$\mathsf{add} \ \text- 1 \ 2 \longrightarrow 1$} \end{prooftree} \end{equation} \begin{prooftree} \AxiomC{} \UnaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow^\star \mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1$} \AxiomC{(\ref{d1})} \BinaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow^\star \mathsf{add} \ \text{-}1 \ \mathsf{add} \ 1 \ 1$} \AxiomC{(\ref{d2})} \BinaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow^\star \mathsf{add} \ \text{-}1 \ 2$}\ \AxiomC{(\ref{d3})} \BinaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow^\star 1$}\ \end{prooftree}

Example evaluation, written as a sequence of configurations:

add sub 2 3 add 1 1 ⟶
add -1 add 1 1 ⟶
add -1 2 ⟶
1 ✓

4.2.2. A Stack-Oriented Language

The primary examples we used in this course for understanding operational semantics were variants of stack-oriented languages.8

In the simplest case, we take a configuration to be a program (\(P\)) together with a stack of integers (\(S\)), written

\begin{equation*} ( \ S \ , \ P \ ) \end{equation*}

We also include a special irreducible configuration \(\mathsf{ERROR}\) for representing a failed evaluation.


Grammar for a simple stack-oriented language:

<prog>  ::= { <com> }
<com>   ::= push <num> | dup | add | sub
          | then <prog> else <prog> end
<num>   ::= (0 | ... | 9) { 0 | ... | 9 }

Example program which puts 14 + 15 - 16 on the stack:

push 16 push 15 push 14
add sub

Operational semantics for a simple stack-oriented language:

\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ ,\ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ P \ )$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}

Example multi-step derivation:

\begin{equation}\label{e1} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow ( \ 16 :: \varnothing \ , \ \textsf{push 15 push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{e2} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ 16 :: \varnothing \ , \ \textsf{push 15 push 14 add sub} \ ) \longrightarrow ( \ 15 :: 16 :: \varnothing \ , \ \textsf{push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{e3} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ 15 :: 16 :: \varnothing \ , \ \textsf{push 14 add sub} \ ) \longrightarrow ( \ 14 :: 15 :: 16 :: \varnothing \ , \ \textsf{add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{e4} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ 14 :: 15 :: 16 :: \varnothing \ , \ \textsf{add sub} \ ) \longrightarrow ( \ 29 :: 16 :: \varnothing \ , \ \textsf{sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{e5} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ 29 :: 16 :: \varnothing \ , \ \textsf{sub} \ ) \longrightarrow ( \ 13 :: \varnothing \ , \ \epsilon \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f1} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f2} \begin{prooftree} \AxiomC{(\ref{f1})} \AxiomC{(\ref{e1})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 16 :: \varnothing \ , \ \textsf{push 15 push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f3} \begin{prooftree} \AxiomC{(\ref{f2})} \AxiomC{(\ref{e2})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 15 :: 16 :: \varnothing \ , \ \textsf{push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f4} \begin{prooftree} \AxiomC{(\ref{f3})} \AxiomC{(\ref{e3})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 14 :: 15 :: 16 :: \varnothing \ , \ \textsf{add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f5} \begin{prooftree} \AxiomC{(\ref{f4})} \AxiomC{(\ref{e4})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 29 :: 16 :: \varnothing \ , \ \textsf{sub} \ ) $} \end{prooftree} \end{equation} \begin{prooftree} \AxiomC{(\ref{f5})} \AxiomC{(\ref{e5})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 13 :: \varnothing \ , \ \epsilon \ ) $} \end{prooftree}

Example evaluation:

( ∅                   , push 16 push 15 push 14 add sub ) ⟶
( 16 :: ∅             , push 15 push 14 add sub ) ⟶
( 15 :: 16 :: ∅       , push 14 add sub ) ⟶
( 14 :: 15 :: 16 :: ∅ , add sub ) ⟶
( 29 :: 16 :: ∅       , sub ) ⟶
( 13 :: ∅             , ϵ ) ✓

4.2.3. Exercises

  1. Add multiplication (mul) and division (div) to the operational semantics of arithmetic expressions (make sure to disallow division by \(0\)).
  2. Derive the single-step reduction

    sub add add 1 2 mul 3 4 2 ⟶ sub add 3 mul 3 4 2
    

    in the operational semantics for arithmetic expressions.

  3. Derive the multi-step reduction

    sub add add 1 2 mul 3 4 2 ⟶⋆ ??
    

    where ?? is the value of the given expression according to the given semantics. You may write it as a sequence of configurations.

  4. Write an operational semantics for Boolean expressions with left-to-right evaluation and short-circuiting (like in project 3).
  5. Given an evaluation of the program

    push 10 dup mul dup
    

    in the given semantics for the above stack-oriented language.

5. Variables

When introducing variables into a programming language, we need to introduce an environment to the configuration which maintains variable bindings.

The structure of the environment depends on the scoping rules used in the language, but in the simplest case, and environment is a mapping of variable names to values. This can be implemented as an association list in OCaml.

No matter the implementation, an environment requires two operations: fetch and update. The fetch operation should get the value to which a variable is bound in the environment, failing if the variable is not bound in the environment. The update operation should return a new environment with the additional binding.


OCaml implementation of an environment represented as an association list:

type value = unit (* A DUMMY TYPE *)
type 'a env = ('a * value) list

let empty_env = []

let fetch (e : 'a env) (x : 'a) : value option =
  List.assoc_opt x e

let update
  (e : 'a env)
  (x : 'a)
  (v : value) : 'a env =
  (x, v) :: e

Again, different environment representations need different implementations of fetch and update, but regardless of how these functions are implemented, they should satisfy the following equations:

  • fetch emtpy_env x = None for any x
  • fetch (update e x v) x = Some v for any e, x, and v
  • fetch (update e x v) y = fetch e y for any e, x, y, and v given y is not equal to x

5.1. Stack-Oriented Language with Variables

In the operational semantics of the following stack-oriented language, we take a configuration to be a program (\(P\)) together with a stack of integers (\(S\)) and an environment \(E\) of variable bindings.

\begin{equation*} ( \ S \ , \ E \ , \ P \ ) \end{equation*}

As above, we also include the special irreducible configuration \(\mathsf{ERROR}\).


Grammar for a stack-oriented language with variables:

<prog>  ::= { <com> }
<com>   ::= push <num> | dup | add | sub
          | then <prog> else <prog> end
          | lookup <ident> | assign <ident>
<num>   ::= ( 0 | ... | 9 ) { 0 | ... | 9 }
<ident> ::= ( A | ... | Z ) { A | ... | Z }

Example program:

push 14 assign X
push 15 assign Y
lookup X lookup Y add

Operational semantics for a stack-oriented language with variables:

\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ E \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ , \ E \ ,\ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \in \mathbb Z$} \RightLabel{(lookup)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow ( \ \mathsf{fetch}(E, x) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \not \in \mathbb Z$} \RightLabel{(lookupErr)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assign)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, x, n) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assignErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}

Example evaluation:

( ∅             , []               , push 14 assign X push 15 assign Y lookup X lookup Y add ) ⟶
( 14 :: ∅       , []               , assign X push 15 assign Y lookup X lookup Y add ) ⟶
( ∅             , [X ↦ 14]         , push 15 assign Y lookup X lookup Y add ) ⟶
( 15 :: ∅       , [X ↦ 14]         , assign Y lookup X lookup Y add ) ⟶
( ∅             , [Y ↦ 15; X ↦ 14] , lookup X lookup Y add ) ⟶
( 14 :: ∅       , [Y ↦ 15; X ↦ 14] , lookup Y add ) ⟶
( 15 :: 14 :: ∅ , [Y ↦ 15; X ↦ 14] , add ) ⟶
( 29 :: ∅       , [Y ↦ 15; X ↦ 14] , ϵ ) ✓

The other primary concern when it comes to variables is scoping. This will come in a later section after discussing subroutines.

5.1.1. Exercises

  1. There is currently no way to unassign variables in the environment. Redefine the function

    let update (e : env) (x : string) (v : value option) : env =
      assert false (* TODO *)
    

    which takes an optional value so that update e x (Some v) will bind x to v in e and update e x None will remove any bindings of x in the environment.

  2. Extend the command set and the operational semantics to include a command unassign X which can be used to remove bindings in the environment within a program.

6. Subroutines

A subroutine is an encapsulated piece of code that can be reused and executed in different contexts within a program.

Implementing subroutines in a programming language requires dealing with several concerns, including parameter passing and return values.

6.1. Stack-Oriented Language with Subroutines

For stack-oriented languages, we don't need to deal with parameters or return values of subroutines; all of these is handled on the stack. This makes adding subroutines to our stack-oriented languages fairly simple.

In the operational semantics of the following stack-oriented language, we take a configuration to be a program (\(P\)) together with a stack of integers (\(S\)) and an environment (\(E\)) of bindings which now includes bindings to programs, or the special \(\mathsf{ERROR}\) configuration.


Grammar for a stack-oriented language with subroutines:

<prog>  ::= { <com> }
<com>   ::= push <num> | dup | add | sub
          | then <prog> else <prog> end
          | lookup <ident> | assign <ident>
          | def <ident> begin <prog> end | call <ident>
<num>   ::= ( 0 | ... | 9 ) { 0 | ... | 9 }
<ident> ::= ( A | ... | Z ) { A | ... | Z }

Example program:

def F begin
  dup add
end

push 6 assign X
push 5 call F
lookup X call F

Operational Semantics for a stack-oriented language with subroutines (and dynamic scoping):

\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ E \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ , \ E \ ,\ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \in \mathbb Z$} \RightLabel{(lookup)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow ( \ \mathsf{fetch}(E, x) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \not \in \mathbb Z$} \RightLabel{(lookupErr)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assign)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, x, n) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assignErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(funDef)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{def} \ F \ \mathsf{begin} \ Q \ \mathsf{end} \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, F, Q) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = Q$} \RightLabel{(call)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = \bot \quad$ or $\quad \mathsf{fetch}(E, F) \in \mathbb Z$} \RightLabel{(callErr)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}

Example execution:

( ∅                 , []                   , def F begin dup add end push 6 assign X push 5 call F lookup X call F ) ⟶
( ∅                 , [F ↦ dup add]        , push 6 assign X push 5 call F lookup X call F) ⟶
( 6 :: ∅            , [F ↦ dup add]        , assign X push 5 call F lookup X callF) ⟶
( ∅                 , [X ↦ 6, F ↦ dup add] , push 5 call F lookup X call F ) ⟶
( 5 :: ∅            , [X ↦ 6, F ↦ dup add] , call F lookup X call F ) ⟶
( 5 :: ∅            , [X ↦ 6, F ↦ dup add] , dup add lookup X call F ) ⟶
( 5 :: 5 :: ∅       , [X ↦ 6, F ↦ dup add] , add lookup X call F ) ⟶
( 10 :: ∅           , [X ↦ 6, F ↦ dup add] , lookup X call F ) ⟶
( 6 :: 10 :: ∅      , [X ↦ 6, F ↦ dup add] , call F ) ⟶
( 6 :: 10 :: ∅      , [X ↦ 6, F ↦ dup add] , dup add ) ⟶
( 6 :: 6 :: 10 :: ∅ , [X ↦ 6, F ↦ dup add] , add ) ⟶
( 12 :: 10 :: ∅     , [X ↦ 6, F ↦ dup add] , ϵ ) ✓

6.1.1. Exercises

  1. Implement the function SWAP in the stack-oriented language with subroutines which swaps the top two values on the stack, assuming there are two values on the top of the stack. Also implement the function ROT which rotates to the top three elements of the stack, i.e., x :: y :: z :: S becomes z :: x :: y :: S.
  2. Implement MUL which multiplies the top two elements of the stack.

6.2. Parameter Passing

Parameter passing is part of the evaluation strategy of a programming language. It refers to how parameters are passed to a function when it is called. There are two forms of parameter passing we primarily considered in the course:

  • call-by-value. When a function is called, its arguments are evaluated and the function is given the values of its arguments. The important feature of the call-by-value strategy is that the arguments of a function are evaluated exactly once.
  • call-by-name. when a function is called, it is given the names of its arguments, to which a function can refer. The important feature of the call-by-name strategy is that, arguments are only evaluated if they are used; if a function never uses an argument, then that argument is never evaluated.

The details of the call-by-name and call-by-value strategies are more subtle in the context of imperative languages with mutable variable bindings. These strategies are more easily understood in the context of functional languages, as in the following examples.

Note that both of these example use the substitution model for their operational semantics.

6.2.1. A Functional Language with Call-by-Name Parameter Passing

Grammar for a simple functional language:

<expr>  ::= <value>
          | <expr> <expr>
          | <expr> (+ | -) <expr>
          | '(' <expr> ')'
<value> ::= <num> | <ident>
          | fun <ident> -> <expr>
<num>   ::= ( 0 | ... | 9 ) { 0 | ... | 9 }
<ident> ::= ( a | ... | z ) { a | ... | z }

Example program:

(fun x -> fun y -> x)
  2
  ((fun z -> z + 5) 4)

Operational semantics for the lambda calculus with call-by-name parameter passing and left-to-right evaluation:

\begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(appRed)} \UnaryInfC{$e_1 \ e_2 \longrightarrow e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(cbnBeta)} \UnaryInfC{$(\texttt{fun} \ x \ \texttt{->} \ e_1) \ e_2 \longrightarrow e_1[e_2 / x]$} \end{prooftree} \end{equation*} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addRedLeft)} \UnaryInfC{$e_1 \ \texttt{+} \ e_2 \longrightarrow e_1' \ \texttt{+} \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$v \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(addRedRight)} \BinaryInfC{$v \ \texttt{+} \ e_2 \longrightarrow v \ \texttt{+} \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(addNum)} \BinaryInfC{$m \ \texttt{+} \ n \longrightarrow m + n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(subRedLeft)} \UnaryInfC{$e_1 \ \texttt{-} \ e_2 \longrightarrow e_1' \ \texttt{-} \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$v \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(subRedRight)} \BinaryInfC{$v \ \texttt{-} \ e_2 \longrightarrow v \ \texttt{-} \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(subNum)} \BinaryInfC{$m \ \texttt{-} \ n \longrightarrow m - n$} \end{prooftree}

Example evaluation: Note that the "second" argument is never evaluated.

(fun x -> fun y -> x) 2  ((fun z -> z + 5) 4) ⟶

(fun y -> x) [2 / x]     ((fun z -> z + 5) 4) ≡

(fun y -> 2)             ((fun z -> z + 5) 4) ⟶

2 [(fun z -> z + 5) 4 / y]                    ≡

2

6.2.2. Functional language with Call-by-Value Parameter Passing

Operational semantics for a simple functional language with call-by-value parameter passing and left-to-right evaluation:

\begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(appRedLeft)} \UnaryInfC{$e_1 \ e_2 \longrightarrow e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(appRedRight)} \UnaryInfC{$(\texttt{fun} \ x \ \texttt{->} e_1) \ e_2 \longrightarrow (\texttt{fun} \ x \ \texttt{->} e_1) \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$v$ is a value} \RightLabel{(cbvBeta)} \UnaryInfC{$(\texttt{fun} \ x \ \texttt{->} \ e) \ v \longrightarrow e[v / x]$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addRedLeft)} \UnaryInfC{$e_1 \ \texttt{+} \ e_2 \longrightarrow e_1' \ \texttt{+} \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$v \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(addRedRight)} \BinaryInfC{$v \ \texttt{+} \ e_2 \longrightarrow v \ \texttt{+} \ e_2'$} \end{prooftree} \end{equation*}

p

\begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(addNum)} \BinaryInfC{$m \ \texttt{+} \ n \longrightarrow m + n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(subRedLeft)} \UnaryInfC{$e_1 \ \texttt{-} \ e_2 \longrightarrow e_1' \ \texttt{-} \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$v \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(subRedRight)} \BinaryInfC{$v \ \texttt{-} \ e_2 \longrightarrow v \ \texttt{-} \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(subNum)} \BinaryInfC{$m \ \texttt{-} \ n \longrightarrow m - n$} \end{prooftree}

Example evaluation: Note that the "second" argument is evaluated before it passed to the function (i.e., before the function is applied to it).

(fun x -> fun y -> x) 2  ((fun z -> z + 5) 4) ⟶

(fun y -> x) [2 / x]     ((fun z -> z + 5) 4) ≡

(fun y -> 2)             ((fun z -> z + 5) 4) ⟶

(fun y -> 2)             ((z + 5) [4 / z])    ≡

(fun y -> 2)             (4 + 5)              ⟶

(fun y -> 2)             9                    ⟶

2 [ 9 / y ]                                   ≡

2

6.2.3. Exercises

  1. Give an example of a program which does not terminate using call-by-value parameter passing, but does terminate using call-by-name parameter passing.
  2. Give an example of a program which will take ~4 times as many reductions to terminate using call-by-name parameter passing, as compared to call-by-value parameter passing.
  3. Write a program that does not terminate using call-by-name parameter passing.
  4. Determine what the following program evaluates to.

         (fun f -> fun x -> f x x)
           (fun y -> fun z -> fun a -> y (z a))
    p       (fun x -> x + 1) 3
    
  5. We can represents expressions in the language as an ADT. Implement following function

    type expr
      = App of expr * expr
      | Add of expr * expr
      | Sub of expr * expr
      | Num of int
      | Var of string
      | Fun of string * expr
    
    let is_value (e : expr) : bool =
      assert false (* TODO *)
    

    which returns true if e is a value (i.e., it is a number, a variable, or a function), and false otherwise.

  6. You may notice that our language has no error configuration. We say that an expression is stuck if it is not a value but it cannot be further reduced. Implement the function

    let is_stuck (e : expr) : bool =
      assert false (* TODO *)
    

    which returns true if e is stuck and false otherwise.

7. Dynamic Scoping

Dynamic scoping refers to the use of computational (temporal) context to determine when a binding is available. In its simplest form, we may think of all bindings as globally available as soon as they have been instantiated in the process of executing a program.

In a language with dynamic scoping, when it comes to determining what bindings are available to a subroutine, it doesn't matter where a subroutine is defined, but rather where it is called.

Bash is a widely used language with dynamic scoping, but in general, dynamic scoping is not common in modern programming languages. It is, however, much easier to implement than lexical scoping.

The stack-oriented language with subroutines given above implements dynamic scoping by fiat.

7.1. Examples

Example bash program highlighting dynamic scoping:

function f () {
  X=2
}
function g () {
  echo $X
}

X=1
g
f
g

# prints:
# 1
# 2

Example program in the stack-oriented language highlighting dynamic scoping:

def F begin
  push 2 assign X
end

def G begin
  lookup X
end

push 1 assign X
call G
call F
call G

Example evaluation:

( ∅           , []                                         , def F begin push 2 assign x end def G begin lookup X end push 1 assign X call G call F call G) ⟶
( ∅           , [F ↦ push 2 assign X]                      , def G begin lookup X push 1 assign X call G call F call G) ⟶
( ∅           , [G ↦ lookup X; F ↦ push 2 assign X]        , push 1 assign X call G call F call G) ⟶
( 1 :: ∅      , [G ↦ lookup X; F ↦ push 2 assign X]        , assign X call G call F call G) ⟶
( ∅           , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , call G call F call G) ⟶
( ∅           , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , lookup X call F call G) ⟶
( 1 :: ∅      , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , call F call G) ⟶
( 1 :: ∅      , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , push 2 assign X call G) ⟶
( 2 :: 1 :: ∅ , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , assign X call G) ⟶
( 1 :: ∅      , [X ↦ 2; G ↦ lookup X; F ↦ push 2 assign X] , call G) ⟶
( 1 :: ∅      , [X ↦ 2; G ↦ lookup X; F ↦ push 2 assign X] , lookup X) ⟶
( 2 :: 1 :: ∅ , [X ↦ 2; G ↦ lookup X; F ↦ push 2 assign X] , ϵ ) ✓

7.1.1. Exercises

  1. Give an evaluation of the following program.

    def F begin
      deg G begin
        lookup X
      end
    end
    
    push 1 assign X
    call F
    call G
    

8. Lexical Scoping

Lexical scoping refers to the use of properties of the source code text to deliniate the availability of a binding. In Python, the body of a function, determined by indentation, deliniates function scope. In OCaml, The body of a let-expression is exactly the scope of its associated binding (it is not possible to refer to that binding outside of the body of the let-expression). Lexical scoping is common among modern programming languages (including both Python and OCaml).

In a language with lexical scoping, when it comes to determining what bindings are available to a subroutine, it doesn't matter where a subroutine was called, but rather where it was defined (this is the opposite of dynamic scoping).

The implementation lexical scoping depends on a couple considerations:

  • Are variables mutable?
  • Are functions higher-order?

8.1. Activation Records

Lexical scoping with mutable variables requires maintaining a call stack. The call stack keeps track of the function calls made during the evaluating a program, along with local variables to function calls.

Each function call (transferring control from the caller to the callee) creates an activation record (or frame) which keeps track of:

  • the input parameters of the function
  • a pointer to the activation record in which the function was defined
  • a pointer to where to return after completing the function call

Returning from a subroutine (transferring control from the callee to the caller) removes its associated frame from the stack.

At a low level, we can think of an activation record as having the following rough structure.

TOP OF THE STACK
+--------------+
| localK       |
| ...          |
| local2       |
| local1       |
| return_addr  |
| def_addr     |
| argN         |
| ...          |
| arg2         |
| arg1         |
+--------------+
↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓

Conceptually, we represent the call stack within the environment. That is, rather than maintaining a collection of bindings in the environment, we maintain a stack of activation records. In OCaml, we might implement this as follows.9 Note that this implementation has our stack-oriented languages in mind so there are no parameters in a record.

IMPORTANT. This works slightly differently than in project 2. Note that the fetch operation returns the identifier of the record in which it found a binding.

type command = unit (* DUMMY TYPE *)
type program = command list
type value
  = Num of int
  | Prog of program
type record =
  { id : int
  ; locals : (string * value) list
  ; ret_prog : program
  ; def_pointer : int
  }

type env = record list

let empty_env =
  [ { id = 0
    ; locals = []
    ; def_pointer = -1
    ; ret_prog = []
    }
  ]

let fetch (e : env) (x : string) : (value * int) option =
  let rec go e x i =
    match e with
    | r :: e ->
      if r.id = i
      then
        match List.assoc_opt x r.locals with
        | Some v -> Some (v, i)
        | None -> go e x r.def_pointer
      else go e x i
    | _ -> assert false (* impossible *)
  in
  match e with
  | r :: _ -> go e x r.id
  | _ -> assert false (* impossible *)

let update (e : env) (x : string) (v : value) : env =
  match e with
  | r :: e -> { r with locals = (x, v) :: r.locals } :: e
  | [] -> assert false (* impossible *)

8.1.1. Stack-Oriented Language with Mutable Variables and Lexical Scoping

Lexical scoping becomes more subtle to implement if functions are higher-order. In this example, we do not allow higher-order functions, but we do allow nested function definitions.

In the operational semantics for the following stack-oriented language, we take a configuration to be a program (\(P\)) together with a stack of integers (\(S\)) and an call stack (\(E\)) which is a stack of records as describe above. We use

\begin{equation*} \langle i , L, R, j \rangle \end{equation*}

to denote an activation record with identifier \(i\), local bindings \(L\), return program \(R\), and pointer \(j\) to the defining activation record.

This is very similar to what was done in project 2.


Operational Semantics for a stack-oriented language with lexical scoping and mutable variables:

\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ E \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ , \ E \ ,\ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x)$ is a number} \RightLabel{(lookup)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow ( \ \mathsf{fetch}(E, x) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x)$ is not a number} \RightLabel{(lookupErr)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assign)} \UnaryInfC{$( \ m :: S \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, x, m) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assignErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(funDef)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{def} \ F \ \mathsf{begin} \ Q \ \mathsf{end} \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, F, Q) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = (Q, i)$ and $Q$ is a program} \RightLabel{(call)} \UnaryInfC{$( \ S \ , E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow ( \ S \ , \ \langle \mathsf{len}(E), \varnothing, P, i \rangle :: E \ , \ Q \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(return)} \UnaryInfC{$( \ S \ , \ \langle i, L, R, j \rangle :: E \ , \ \epsilon \ ) \longrightarrow ( \ S \ , \ E \ , \ R \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F)$ is not a program} \RightLabel{(callErr)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}

Example program:

push 2 assign X

def F begin
  lookup X
  push 4 assign X
end

call F
push 3 assign X
call F

Example evaluation:

( ∅
, ⟨0 , [] , ϵ , -1 ⟩ ::
  ∅
, push 2 assign X def F begin lookup X push 4 assign X end call F push 3 assign X call F
) ⟶

( 2 :: ∅
, ⟨0 , [] , ϵ , -1 ⟩ ::
  ∅
, assign X def F begin lookup X push 4 assign X end call F push 3 assign X call F
) ⟶

( ∅
, ⟨ 0 , [X ↦ 2] , ϵ , -1 ⟩ ::
  ∅
, def F begin lookup X push 4 assign X end call F push 3 assign X call F
) ⟶

( ∅
, ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ ::
  ∅
, call F push 3 assign X call F
) ⟶

( ∅
, ⟨ 1 , [] , push 3 assign X call F , 0 ⟩ ::
  ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ ::
  ∅
, lookup X push 4 assign X
) ⟶

( 2 :: ∅
, ⟨ 1 , [] , push 3 assign X call F , 0 ⟩ ::
  ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ ::
  ∅
, push 4 assign X
) ⟶

( 4 :: 2 :: ∅
, ⟨ 1 , [] , push 3 assign X call F , 0 ⟩ ::
  ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ ::
  ∅
, assign X
) ⟶

( 2 :: ∅
, ⟨ 1 , [X ↦ 4] , push 3 assign X call F , 0 ⟩ ::
  ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ ::
  ∅
, ϵ
) ⟶

( 2 :: ∅
, ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ ::
  ∅
, push 3 assign X call F
) ⟶

( 3 :: 2 :: ∅
, ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ ::
  ∅
, assign X call F
) ⟶

( 2 :: ∅
, ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ ::
  ∅
, call F
) ⟶

( 2 :: ∅
, ⟨ 1 , [] , ϵ , 0 ⟩ ::
  ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ ::
  ∅
, lookup X push 4 assign X
) ⟶

( 3 :: 2 :: ∅
, ⟨ 1 , [] , ϵ , 0 ⟩ ::
  ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ ::
  ∅
, push 4 assign X
) ⟶

( 4 :: 3 :: 2 :: ∅
, ⟨ 1 , [] , ϵ , 0 ⟩ ::
  ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ ::
  ∅
, assign X
) ⟶

( 3 :: 2 :: ∅
, ⟨ 1 , [X ↦ 4] , ϵ , 0 ⟩ ::
  ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ ::
  ∅
, ϵ
) ⟶

( 3 :: 2 :: ∅
, ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ ::
  ∅
, ϵ
) ⟶

( 3 :: 2 :: ∅
, ∅
, ϵ
) ✓

8.1.2. Exercises

  1. Give an evaluation of the following program.

    push 2 assign X
    def F begin
      push 3 assign X
      call G
    end
    def G begin
      lookup X
    end
    call F
    
  2. Implement the function

    let global_update (e : env) (x : string) (v : value) =
      assert false (* TODO *)
    

    which always updates the binding in the global frame (as opposed to the topmost frame).

  3. Extend the language and the operational semantics to include a commands gassign which assigns variables in the global frame, even within a subroutine.

8.2. Closures

Closures are subroutines together with data they need to be executed correctly. At a minimum, this data consists of a collection of captured bindings, i.e., bindings which may no longer be in scope or may be shadowed when the function is called. It often also includes the name given to the subroutine, as this is important for implementing recursive functions.

Closures can be used to implement higher-order functions and lexically scoped immutable variable bindings. If variable bindings are immutable, a subroutine can be defined as a closure which stores all the variable bindings it that exist when it is defined.

Closure are also used in continuation-passing style implementations of subroutines. We can save a closure containing the current environment and (a pointer to) the return program. This closure is often called the current continuation.

To simplify things, we can keep track of two stacks, one for data and the other for holding continuations for returning from a function.10

type closure =
  { name: string
  ; captured: (string * value) list
  ; body : program
  }

and value
  = Closure of closure
  | Num of int

type env = (string * value) list
type ccs = closure list

8.2.1. Stack-Oriented Language with Lexical Scoping and Higher-Order Functions

In the operational semantics of the following stack oriented language, we take a configuration to be a program (\(P\)) together with a stack of values (\(S\)) which includes integers and closures, a stack continuations (\(C\)) used for returning from a function call, and an environment (\(E\)), represented as a list of bindings (as in the case of our stack-oriented language with dynamic scoping). We use

\begin{equation*} \langle F, L, P \rangle \end{equation*}

to denote a closure with name \(F\), captured bindings \(L\), and body \(P\).

Note that we need to be able to put closures on the stack in order to implement higher-order functions.


Operational Semantics of a stack-oriented language with lexical scoping, immutable variables, and higher-order functions:

\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ C \ , \ E \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ , \ C \ , \ E \ ,\ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ C \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ C \ , \ E \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ C \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ C \ , \ E \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ C \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ C \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ C \ , \ E \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ C \ , \ E \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \not = \bot$} \RightLabel{(lookup)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow ( \ \mathsf{fetch}(E, x) :: S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) = \bot$} \RightLabel{(lookupErr)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assign)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow ( \ S \ , \ C \ , \ \mathsf{update}(E, x, n) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assignErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(funDef)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \ \mathsf{def} \ F \ \mathsf{begin} \ Q \ \mathsf{end} \ P \ ) \longrightarrow ( \ S \ , \ C \ , \ \mathsf{update}(E, F, \langle F, E, Q\rangle) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = \langle G, L, Q\rangle$} \RightLabel{(call)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow ( \ S \ , \ \langle\mathsf{cc}, E, P\rangle :: C \ , \ \mathsf{update}(L, G, Q) \ , \ Q \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(return)} \UnaryInfC{$( \ S \ , \ \langle\mathsf{cc}, E, P \rangle :: C \ , \ L \ , \ \epsilon \ ) \longrightarrow ( \ S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = \bot \quad$ or $\quad \mathsf{fetch}(E, F) \in \mathbb Z$} \RightLabel{(callErr)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}

Example evaluation (of the same program as in the previous section:

( ∅
, ∅
, []
, push 2 assign X def F begin lookup X push 4 assign X end call F push 3 assign X call F
) ⟶

( 2 :: ∅
, ∅
, []
, assign X def F begin lookup X push 4 assign X end call F push 3 assign X call F
) ⟶

( ∅
, ∅
, [ X ↦ 2 ]
, def F begin lookup X push 4 assign X end call F push 3 assign X call F
) ⟶

( ∅
, ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 2
  ]
, call F push 3 assign X call F
) ⟶

( ∅
, ⟨ cc , push 3 assign X call F , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ::
  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 2
  ]
, lookup X push 4 assign X
) ⟶

( 2 :: ∅
, ⟨ cc , push 3 assign X call F , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] ⟩ ::
  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 2
  ]
, push 4 assign X
) ⟶

( 4 :: 2 :: ∅
, ⟨ cc , push 3 assign X call F , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] ⟩ ::
  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 2
  ]
, assign X
) ⟶

( 2 :: ∅
, ⟨ cc , push 3 assign X call F , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] ⟩ ::
  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 4
  ]
, ϵ
) ⟶

( 2 :: ∅
,  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 2
  ]
, push 3 assign X call F
) ⟶

( 3 :: 2 :: ∅
,  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 2
  ]
, assign X call F
) ⟶

( 2 :: ∅
,  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 3
  ]
, call F
) ⟶

( 2 :: ∅
, ⟨ cc , ϵ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] ⟩ ::
  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 2
  ]
, lookup X push 4 assign X
) ⟶

( 2 :: 2 :: ∅
, ⟨ cc , ϵ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] ⟩ ::
  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 2
  ]
, push 4 assign X
) ⟶

( 4 :: 2 :: 2 :: ∅
, ⟨ cc , ϵ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] ⟩ ::
  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 2
  ]
, assign X
) ⟶

( 2 :: 2 :: ∅
, ⟨ cc , ϵ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] ⟩ ::
  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 4
  ]
, ϵ
) ⟶

( 2 :: 2 :: ∅
,  ∅
, [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩
  ; X ↦ 3
  ]
, ϵ
) ✓

Example program with higher-order functions:

def K begin
  assign X
  def F begin
    assign Y
    lookup X
  end
  lookup F
end

push 3 call K assign THREE
push 2 call THREE

Example evaluation with higher-order functions:

( ∅
, ∅
, []
, def K begin assign X def F begin assign Y lookup X end lookup F end push 3 call K assign THREE push 2 call THREE
) ⟶

( ∅
, ∅
, [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ]
, push 3 call K assign THREE push 2 call THREE
) ⟶

( 3 :: ∅
, ∅
, [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ]
, call K assign THREE push 2 call THREE
) ⟶

( 3 :: ∅
, ⟨ cc , assign THREE push 2 call THREE , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] ⟩ ::
  ∅
, [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ]
, assign X def F begin assign Y lookup X end lookup F
) ⟶

( ∅
, ⟨ cc , assign THREE push 2 call THREE , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] ⟩ ::
  ∅
, [ X ↦ 3
  ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩
  ]
, def F begin assign Y lookup X end lookup F
) ⟶

( ∅
, ⟨ cc , assign THREE push 2 call THREE , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] ⟩ ::
  ∅
, [ F ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩
  ; X ↦ 3
  ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩
  ]
, lookup F
) ⟶

( ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ :: ∅
, ⟨ cc , assign THREE push 2 call THREE , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] ⟩ ::
  ∅
, [ F ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩
  ; X ↦ 3
  ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩
  ]
, ϵ
) ⟶

( ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ :: ∅
,  ∅
, [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ]
, assign THREE push 2 call THREE
) ⟶

( ∅
, ∅
, [ THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩
  ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩
  ]
, push 2 call THREE
) ⟶

( 2 :: ∅
, ∅
, [ THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩
  ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩
  ]
, call THREE
) ⟶

( 2 :: ∅
, ⟨ cc , ϵ , [THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩] ⟩ ::
  ∅
, [ THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩
  ; X ↦ 3
  ]
, assign Y lookup X
) ⟶

( ∅
, ⟨ cc , ϵ , [THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩] ⟩ ::
  ∅
, [ Y ↦ 2
  ; THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩
  ; X ↦ 3
  ]
, lookup X
) ⟶

( 3 :: ∅
, ⟨ cc , ϵ , [THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩] ⟩ ::
  ∅
, [ Y ↦ 2
  ; THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩
  ; X ↦ 3
  ]
, ϵ
) ⟶

( 3 :: ∅
,  ∅
, [ THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩
  ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩
  ]
, ϵ
) ✓

8.2.2. Exercises

  1. Evaluate the following program.

    def ADDFIVE begin
      push 5 add
    end
    
    def TWICE begin
      assign F
      call F
      call F
    end
    
    push 0
    lookup ADDFIVE
    call TWICE
    
  2. Write a program which terminates for the stack-oriented languages with

    • dynamic scoping
    • lexical scoping via activation records
    • lexical scoping via closures

    and leaves one number on the stack, but which leaves a different number in each of the three cases.

  3. Redefine the language and semantics to include a return command (as in project 3).
  4. Redefine the language and semantics to include a call command (as in project 3) so that the language has anonymous functions.
  5. (Open ended) Describe the benefit of having two stacks (one for data and one for returns) instead of just one.

Footnotes:

1

This is, of course, not the only way to achieve this. We can also use continuation-passing style, as we saw briefly in lecture.

2

You are not required to write formal typing derivations, but you should be able to use your intuitions about OCaml to determine the type of an expression based on its structure (as the OCaml type-checker does mechanically).

3

In formal language theory, a grammar is a more general than this.

4

The exclamation points are for emphasis, they are not a part of the derivation.

5

Other source may include more constructs in EBNF meta-syntax. See, for example, the Wikipedia page.

6

It is a theorem of formal language theory that regular grammars and regular expressions are equivalent in expressivity.

7

There is an efficient parsing algorithm for grammars in Chomsky normal form. This is the main primary we consider grammars of this form.

8

In the remainder of this document, we will continue this trend. Each following stack-oriented language is a variation on a theme.

9

This is not the most efficient implementation.

10

This is what the language Forth does.