Final Review
Table of Contents
This page contains an outline of the topics covered in CAS CS 320: Principles of Programming Languages during the Spring 2023 semester at Boston University. It is not exhaustive, though it is biased towards those topics which are pertinent for the final exam of the course. Included are exercises associated with each topic.
1. OCaml
During the first half of the course, we learned the function programming language OCaml. Our primary goal was to learn how to program in the functional style, not thinking of a function as defining a list of commands, but instead as specifying the shape of its output.
1.1. Basics
OCaml has many types that are standard among programming languages, e.g., integers, Boolean values, floating-point numbers, tuples and lists, all with basic operators. It also has standard functional language constructs like let-bindings, anonymous functions, and if-then-else expressions.
1.1.1. Exercises
Euclid's algorithm for determining the greatest common divisor of two integers takes advantage of the fact that \(\mathsf{gcd}(m, n) = \mathsf{gcd}(n, m \bmod n)\). Implement the function
let gcd (m : int) (n : int) : int = assert false (* TODO *)
which, given two integers
m
andn
, returns their greatest common divisor.We can represent a rational number as a pair of integers where the first integer represents the numerator and the second represents the denominator. We maintain the invariant that the second number is positive and the pair of numbers are relatively prime. Implement the function
add
which adds two rational numbers, making sure to maintain this invariant.type rat = int * int let add (m : rat) (n : rat) : rat = assert false (* TODO *)
Implement the function
let gen_fib : (l : int list) (n : int) : int = if n < 0 then assert false else assert false (* TODO *)
which, given a list of integers
\begin{equation*} F_n = \begin{cases} l[n] & n < k \\ \sum_{i = 1}^k F_{n - i} & n \geq k \end{cases} \end{equation*}l
of length \(k\) and a nonnegative integern
, returns the \(n^\text{th}\) element of the following sequence:
1.2. Algebraic Data Types
One of the most important features of modern functional programming languages is pattern matching and algebraic data types. An algebraic data type (ADT) is defined by giving a collection of constructors, which themselves can carry other data. We can think of algebraic data types as defining unions of existing types. Here, for example, is an ADT whose values represent either a Boolean value or an integer.
type bool_or_int = Bool of bool | Int of int
We work with values of an ADT by pattern matching, providing different output values depending on the shape of the value we are matching with.
let num_val (x : bool_or_int) : int = match x with | Bool b -> if b then 1 else 0 | Int i -> i
ADTs can be recursive (the type being defined can be referred to in
its own definition) and parametric (the type being defined can
depend parametrically on another type). These two features are
encompassed in the list
type.
type 'a mylist = Nil | Cons of 'a * 'a mylist
'a option
and 'a result
two other parametric types which are
important to be familiar with.
1.2.1. Exercises
Implement the function
let matrix_of_list (l : 'a list) (num_cols : int) : ('a list list) option = assert false (* TODO *)
which converts a list
l
into a matrix withnum_cols
columns, returingNone
in the case thatnum_cols
is not positive or the resulting matrix is not rectangular (i.e., the length ofl
is not a multiple ofnum_cols
).Implement the function
let drop_nones (l : 'a option list) : 'a list = assert false (* TODO *)
which returns an
'a list
consisting of those elements ofl
which are notNone
(in order).A Red-Black tree is an ordered binary tree in which every node is labeled either red or black. It is further required that
- no red node has a red child
- every path from the root node to the leaf has the same number of black nodes
Fill in the function
is_valid
below which returnstrue
ift
is a valid red-black tree, andfalse
otherwise.type color = Red | Black type 'a rbtree = Leaf | Node of (color * 'a * 'a rbtree * 'a rbtree) let is_valid (t : 'a rbtree) : bool = assert false (* TODO *)
1.3. Record Types
Record types are essentially tuples with named fields. Here is an example of a representation of rational numbers using record types.
type rat = { numer : int ; denom : int ; is_positive : bool } let two_thirds = { numer = 2 ; denom = 3 ; is_positive = true }
If we think of algebraic data types as unions of existing types, then we may think of record types as products of existing types. Beyond this, they have a couple conveniences that are good to remember.
- Accessing fields in a record can be done via dot notation, i.e., if
r
is arat
thenr.is_positive
is abool
. Updating a fields records can be done using
with
-notation. This is useful if a record has many fields but only a few need to be updated.let negate (r : rat) : rat = { r with is_positive = not r.is_positive } let recip (r : rat) : rat = if r.numer = 0 then assert false else { r with numer = r.denom; denom = r.numer }
1.3.1. Exercises
Fill in the record types below so that the given function type-checks.
type rectangle1 = (* TODO *) type rectangle2 = (* TODO *) let transform (r : rectangle1) : rectangle2 = let (x, y) = r.center in { bottom_left = x -. r.width /. 2., y -. r.height /. 2. ; top_right = x +. r.width /. 2., y +. r.height /. 2. }
Write a function which adds a binding to the list of captured bindings of a closure, ensuring that it is shadowed by any bindings already in the collection of captured bindings.
type value = unit (* DUMMY TYPE *) type program = unit (* DUMMY TYPE *) type closure = { name : string ; body : program ; captured : (string * value) list } let add_binding (c : closure) (x : string) (v : value) = assert false (* TODO *)
Suppose you are given a list of tools for converting sound files from one format to another. Each converter has a name, a list of input formats which it can convert from, and a list of output formats which it can convert to. Implement the function
convert_options
which, given a list of converterscs
and an input formatf
, collects the possible output formats, keeping track of the names of the converter tools which can be used.type converter = { name: string ; input_formats : string list ; output_formats : string list } type convert_out = { converters : string list ; output_format : string } let convert_options (cs : converter list) (f : string) : convert_out list = assert false (* TODO *)
1.4. Higher-Order Programming
Higher-order programming is the use of functions as first-class values to write general, reusable code. There are three patterns in particular for higher-order programming with lists which we looked at in depth.
The function
map
, defined aslet rec map (f : 'a -> 'b) (l : 'a list) : 'b list = match l with | [] -> [] | x :: l -> let x = f x in x :: map f l
replaces each element in
l
withf
applied to that element, in order from left to right.The function
filter
, defined aslet rec filter (p : 'a -> bool) (l : 'a list) : 'a list = match l with | [] -> [] | x :: l -> if p x then x :: filter p l else filter p l
collects all element of
l
which satisfy the predicatep
, in order from left to right.The function
fold_right
, defined aslet rec fold_right (f : 'a -> 'b -> 'b) (l : 'a list) (accu : 'b) : 'b = match l with | [] -> accu | x :: l -> f x (fold_right f l accu)
applies the binary operation
f
between every element ofl @ [accu]
right-associatively:[x₁; x₂; x₃; ... xₙ] ↓↓ ↓↓ ↓↓ ↓↓ f x₁ (f x₂ (f x₃ (...(f xₙ accu)...)))
and the function
fold_left
:let rec fold_left (f : 'b -> 'a -> 'b) (accu : 'b) (l : 'a list) : 'b = match l with | [] -> accu | x :: l -> fold_left f (f accu x) l
does the same but to
accu :: l
left-associatively:[x₁; x₂; x₃; ... xₙ] ↓↓ ↓↓ ↓↓ ↓↓ f (...(f (f (f accu x₁) x₂) x₃)...) xₙ
Note that
fold_left
is tail-recursive whereasfold_right
is not.
1.4.1. Exercises
Implement the functions
let andp (p1 : 'a -> bool) (p2 : 'a -> bool) : 'a -> bool = assert false (* TODO *) let orp (p1 : 'a -> bool) (p2 : 'a -> bool) : 'a -> bool = assert false (* TODO *)
with the following properties:
- given two predicates
p1
andp2
, the predicateandp p1 p2
is the predicate which expresses that bothp1
andp2
hold. - given two predicates
p1
andp2
, the predicateorp p1 p2
is the predicate which expresses thatp1
orp2
hold.
- given two predicates
We can represent a polynomial as a list of
float
's as follows.[a₀; a₁; a₂; ...; aₙ] ↓↓ ↓↓ ↓↓ ↓↓ p(x) = a₀ + a₁ x + a₂ x² + ... + aₙ xⁿ
Implement the function
let derivative (p : float list) : float list = assert false (* todo *)
which computes the list representing the polynomial \(p'(x)\), the derivative of \(p(x)\).
When implementing radix sort on integers, it is necessary to partition a list of integers based on their last digits. Fill in the following function
let bucket (l : int list) : int list list = let op accu next = assert false (* TODO *) in let base = List.init 10 (fun _ -> []) in List.fold_left op base l
which, given a list
l
of integers, return 10 lists of integers which partitionl
by the last digit the members ofl
. That is, the \(i^\text{th}\) element ofbucket l
should contain exactly the elements ofl
whose last digit is \(i\).
1.5. Tail Recursion
Roughly speaking, a recursive call in the body of a function
definition is in tail position if no evaluation is required after
the recursive call. The following implementation of the factorial
function is not tail recursive because it requires evaluating the
product of the result of its recursive call with the input n
.
let rec factorial (n : int) : int = if n < 0 then assert false else if n = 0 then 1 else n * factorial (n - 1)
We can make functions tail recursive by adding an accumulator argument to the function.1
let factorial_tail (n : int) : int = let rec go (n : int) (accu : int) = if n = 0 then accu else go (n - 1) (n * accu) in if n < 0 then assert false else go n 1
1.5.1. Exercises
Consider the following function which prints a (half) hourglass in asterisks. Implement a function which does the same thing but is tail recursive.
let rec hourglass (n : int) : unit = if n > 0 then let _ = print_endline (String.make n '*') in let _ = hourglass (n - 1) in let _ = print_endline (String.make n '*') in () else ()
(Challenge) Implement a tail recursive evaluator for Boolean expressions as represented by the following ADT.
type bool_expr = Bool of bool | Not of bool_expr | And of bool_expr * bool_expr | Or of bool_expr * bool_expr let eval_tr (e : bool_expr) : bool = assert false (* TODO *)
1.6. Type Checking
OCaml is strongly typed, and programs are statically checked for adherence to typing rules. We did not look at the type rules in full detail, but we did see some examples of formal typing rules in OCaml, e.g.,
\begin{equation*} \frac {b \in \mathsf{bool} \qquad e_1 \in \mathsf{t} \qquad e_2 \in \mathsf{t}} {(\textsf{if } b \textsf{ then } e_1 \textsf{ else } e_2) \in \mathsf{t}} \end{equation*}This rule expresses that an if-then-else expression has the same type as its if-case and else-case if its individual sub-expressions are well-typed.2
OCaml also has type inference, which means we often do not have to specify the types of expressions in OCaml programs (though it can be useful for documentation purposes).
1.6.1. Exercises
Does this program type-check? If so, what are the types of
bar
andbaz
?type 'a foo = Foo of ('a foo -> 'a) let bar (Foo f) = f let baz x = bar x x
Does this program type-check? If so, what is the type of
foo
?let rec foo x y = if x > 0 then foo (x - 1) (y +. 1.) else if x < 0 then foo y x else 0
2. Formal Grammar
Grammar is the study of the form and structure of language. We use concepts from the formal grammar to represent and reason about the syntax of programming languages. These concepts also inform the design of parsers.
2.1. BNF Specifications
We start with a collection of symbols, separated into two disjoint
groups, the nonterminal symbols and the terminal symbols. In a
Backus-Naur form (BNF) specification we use angle brackets (e.g.,
<nonterm>
) to denote a nonterminal symbol. We typically don't
specify the symbols in advance, but instead glean them from the
specification itself.
A sentential form is a sequences of symbols and a sentence is a sequence of terminal symbols.
A production rule is made up of a nonterminal symbol and a sentential form, and is written
<nonterm> ::= SENTFORM
We interpret a production rule as indicating that <nonterm>
stands
for SENTFORM
in a sentential from.
A BNF specification is given by a collection of production rules and a starting symbol. We typically take the nonterminal symbol in the first rule of the specification to be the starting symbol. We also call a BNF specification a grammar.3
In the case that a BNF specification has multiples rules for the same nonterminal symbol, we use the notation
<nonterm> ::= SENTFORM₁ | SENTFORM₂ | ... | SENTFORMₖ
as shorthand for
<nonterm> ::= SENTFORM₁ <nonterm> ::= SENTFORM₂ ... <nonterm> ::= SENTFORMₖ
A derivation of a sentential form \(S\) in a grammar \(\mathcal G\) with
start symbol <start>
is a sequence of sentential forms, starting
with the start symbol <start>
and ending in \(S\), in which each form
in the sequence (except for <start>
) is the results of replacing one
nonterminal symbol in the previous form with a sentential from given
by a production rule of \(\mathcal G\).
A derivation is leftmost if the nonterminal symbol replaced at each step of the derivation is the leftmost nonterminal symbol in the sentential form.
A grammar \(\mathcal G\) accepts or recognizes a sentence \(S\) if has a derivation in \(\mathcal G\).
A parse tree, informally, is a derivation represented as a tree, in which
- every node is labeled with a symbol
- the root is labeled with the starting symbol
- the children of each node are the symbols of the sentential form (in order) which replace the symbol labeling the node in the derivation.
The frontier of a parse tree (i.e., the leaf nodes in order from left to right) form the sentence derived in the derivation represented by the parse tree.
A parse tree may correspond to multiple derivations, but every derivation has a unique parse tree representation. Likewise, every parse tree corresponds to a unique leftmost derivation.
2.1.1. Examples
Toy grammar for English sentences:
<sentence> ::= <noun-phrase> <verb-phrase> <verb-phrase> ::= <verb> | <verb> <prep-phrase> <prep-phrase> ::= <prep> <noun-phrase> <noun-phrase> ::= <article> <noun> <article> ::= the <noun> ::= cow | moon <verb> ::= jumped <prep> ::= over
Derivation of a sentence recognized by the above grammar:4
<sentence>! <noun-phrase>! <verb-phrase> <noun-phrase> <verb> <prep-phrase>! <noun-phrase>! <verb> <prep> <noun-phrase> <article> <noun> <verb> <prep> <noun-phrase>! <article>! <noun> <verb> <prep> <article> <noun> the <noun>! <verb> <prep> <article> <noun> the cow <verb>! <prep> <article> <noun> the cow jumped <prep>! <article> <noun> the cow jumped over <article>! <noun> the cow jumped over the <noun>! the cow jumped over the moon
Toy grammar for an imperative programming language:
<program> ::= <stmts> <stmts> ::= <stmt> | <stmt> ; <stmts> <stmt> ::= <var> = <expr> <var> ::= a | b | c | d <expr> ::= <term> | <term> + <term> | <term> - <term> <term> ::= <var> | const
A (leftmost) derivation of a program recognized by the above grammar:
<program> <stmts> <stmt> ; <stmts> <var> = <expr> ; <stmts> a = <expr> ; <stmts> a = <term> ; <stmts> a = const ; <stmts> a = const ; <stmt> ; <stmts> a = const ; <var> = <expr> ; <stmts> a = const ; a = <expr> ; <stmts> a = const ; a = <term> + <term> ; <stmts> a = const ; a = <var> + <term> ; <stmts> a = const ; a = a + <term> ; <stmts> a = const ; a = a + const ; <stmts> a = const ; a = a + const ; <var> = <expr> a = const ; a = a + const ; b = <expr> a = const ; a = a + const ; b = <term> a = const ; a = a + const ; b = <var> a = const ; a = a + const ; b = a
2.1.2. Extended BNF
We extend the notation of BNF specifications to make it more convenient to use.5
[ SENTFORM₁ | SENTFORM₂ | ... | SENTFORMₖ ]
refers to an optional collection of alternatives of a sentential form. For example, we can represent an integer by the following specification:<int> ::= [ - ] <digits> <digits> ::= <digit> | <digit> <digits> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
( SENTFORM₁ | SENTFORM₂ | ... | SENTFORMₖ )
refers to a collection of alternatives within a sentential form. For example, we can represent arithmetic expressions by the following specification:<expr> ::= <expr> ( + | - | * | / ) <expr> | <digits> <digits> ::= <digit> | <digit> <digits> <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
{ SENTFORM₁ | SENTFORM₂ | ... | SENTFORMₖ }
refers to zero or more occurrences of the sentential forms in a collection of alternatives. For example, we can simplify the specification for integers (and enforce that the first digit must be nonzero):<int> ::= [ - ] ( 1 | ... | 9) { 0 | ... | 9 }
2.1.3. Exercises
List the symbols (both terminal and nonterminal) implicit in the following specification.
<a> ::= a <b> | a <a> b <b> ::= c <a> | d
- Give a leftmost derivation of
a a a c a d b b
in the above grammar. Draw its associated parse tree. - Verify that
a = a + a ; b = b
is recognized by the grammar for the simple imperative language above. Give a derivation that is not leftmost. Implement the function
type 'a tree = Leaf of 'a | Node of 'a tree list let frontier (t : 'a tree) : 'a list = assert false (* TODO *)
which returns a list of the members of
t
in order from left to right.
2.2. Ambiguity
A BNF specification is ambiguous if there is a sentence with multiple parse trees. Equivalently, a specification is ambiguous if there is a sentence with multiple leftmost derivations. We try to avoid ambiguous specifications for programming languages because we ultimately don't want a program to be interpretable in multiple ways.
2.2.1. Fixity
The fixity of an operator refers to where the operator is written with respect to its arguments.
- prefix operators appear before their argument
- the negation operator:
-5
- the negation operator:
- postfix operators appear after their argument
- type constructors:
int list
- type constructors:
- infix (binary) operators appear between their arguments
- arithmetic operators:
(1 + 2) * (3 + 4)
)
- arithmetic operators:
- mixfix operators are a combination of these
- if-else-expressions:
if not b the f x else g x
- if-else-expressions:
If a language's syntactic constructs are all prefix (Polish notation) or all postfix (reverse Polish notation) then the specification is unambiguous. We can make infix binary operators unambiguous by specifying their associativity and precedence.
2.2.2. Associativity
An operator \(\square\) is declared left associative if we interpret \(a \square b \square c\) as \((a \square b) \square c\).
- For arithmetic expressions, we take subtraction to be
left-associative, so the expression
1 - 2 - 3
evaluates to-4
as opposed to2
.
An operator \(\square\) is declared right associative if we interpret \(a \square b \square c\) as \(a \square (b \square c)\).
- For arithmetic expressions, we take exponentiation to be
right-associative, so the expression
2 ^ 1 ^ 3
evaluates to2
as opposed to8
.
The associativity of an operator should affect the shape of parse tree in a given grammar. For example, in the grammar:
<expr> ::= x | <expr> + <expr>
there are two parse trees for the sentence x + x + x
:
<expr> <expr> | | |---------------------------\ |------------ \ | | | | | | <expr> | | | | <expr> | | | | | | |-------------\ | | | | |-------------\ | | | | | | | | | | <expr> | <expr> | <expr> <expr> | <expr> | <expr> | | | | | | | | | | x + x + x x + x + x
The left tree groups the first two arguments (left associatively) and the second groups the last two arguments (right associatively).
We can enforce the associativity of an operator in the specification itself by breaking symmetry in the production rules, effectively choosing one of the above two parse trees. Addition is typically understood as left associative, so we should require its right argument to be a variable:
<expr> ::= x | <expr> + x
2.2.3. Precedence
Given two binary operators \(\square\) and \(\triangle\), the operator \(\square\) has higher precedence than \(\triangle\) if we interpret \(a \square b \triangle c\) as \((a \square b) \triangle c\) and \(a \triangle b \square c\) as \(a \triangle (b \square c)\).
- For arithmetic expressions, we take multiplication to have higher
precedence than addition, so the expression
2 * 2 + 3
evaluates to7
as opposed to10
.
As with associativity, we can enforce precedence within the specification itself. In the grammar:
<expr> ::= x | <expr> + x | <expr> * x
the sentences x + x * x
has only one parse tree, but is not the
correct parse tree with respect to the rules of arithmetic (addition
has been given higher precedence).
<expr> | |---------------------------\ | | | <expr> | | | | | |-------------\ | | | | | | | <expr> | | | | | | | | | x + x * x
To fix this, we can break symmetry again by ensuring that arguments to multiplication can only be variables or other expressions with only multiplication.
<expr> ::= <expr2> | <expr> + <expr2> <expr2> ::= x | <expr2> * x
In this grammar, the parse tree of x + x * x
groups the arguments of
multiplication (giving multiplication higher precedence).
<expr> | |------------- \ | | | <expr> | <expr2> | | | | | |--------------\ | | | | | <expr2> | <expr2> | | | | | | | x + x * x
Note that it is now impossible to group the argument of the addition. In order to make this possible, parentheses need to be introduced into the grammar:
<expr> ::= <expr2> | <expr> + <expr2> <expr2> ::= x | <expr2> * x | ( <expr> )
2.2.4. Exercises
- Draw the parse tree for the sentence
(x + x) * x
in the grammar at the end of the section. - Draw the parse tree for the sentence
x * x + x * x
in the grammar at the end of the section. - Update the grammar at the end of the section to include subtraction, multiplication, and (right-associative) exponentiation.
Consider the following ambiguous grammar.
<s> ::= <np> <vp> <vp> ::= <v> | <v> <np> | <v> <np> <pp> <pp> ::= <p> <np> <np> ::= <n> | <d> <n> | <np> <pp> <n> ::= John | man | mountain | telescope <v> ::= saw <d> ::= the <p> ::= on | with
Give two leftmost derivations of the sentence
John saw the man on the mountain with the telescope
.Consider the following grammar for function types over
int
.<fun-type> ::= <int-type> | <fun-type> -> <fun-type> <int-type> ::= int
Show that this grammar is ambiguous and then rewrite this grammar so that
->
is a right associative operator.
2.3. Regular Grammars
A (right linear) regular grammar is a grammar such that every rule is in one of the following forms:
<nonterm> ::= term <nonterm> ::= term <nonterm> <nonterm> ::= ϵ
where \(\epsilon\) refers to the empty sentence.
Regular expressions are a compact syntax for regular grammars.6 They are defined inductively as follows:
- a terminal symbol is a regular expression
- \(\epsilon\) is a regular expression
- if \(r_1, r_2, \dots, r_k\) are regular expressions then so are \(r_1r_2 \dots r_k\) and \(r_1 \ | \ r_2 \ | \ \dots \ | \ r_k\)
- if \(r\) is a regular expression, then so is \((r)\) and \(r^*\)
A regular expression accepts or recognizes a sentence according to the following rules:
- a terminal symbol \(t\) accepts \(t\)
- if each \(r_i\) excepts \(s_i\), then \(r_1r_2\dots r_k\) accepts \(s_1s_2\dots s_k\)
- if \(r_i\) accepts \(s_i\) then, \(r_1 \ | \ r_2 \ | \ \dots \ | \ r_k\) accepts \(s_i\)
- if \(r\) accepts \(s\) the \((r)\) accepts \(s\)
- if \(r\) accepts \(s_1, s_2, \dots, s_k\), then \(r^*\) accepts \(s_1s_2 \dots s_k\)
2.3.1. Exercises
- Find a regular expression for all binary strings in which every
1
is adjacent to exactly one other1
. - Write a right-linear regular grammar for all binary strings in
which every
1
must be followed by at least two0
's. Give a derivation in this grammar of the sentence100100010000
. - Determine which of the following sentences are accepted by the
regular expression
a(bc|cb)*d
.abc
abcd
abccbd
abad
abbcd
2.4. Chomsky Normal Form
A grammar is in Chomsky normal form if its rule are all of the following forms:
<start> ::= ϵ <nonterm> ::= <nonterm> <nonterm> <nonterm> ::= term
Every grammar we have considered (i.e., every grammar with a BNF specification) can be converted into an equivalent grammar (i.e., one which accepts the same sentences) which is in Chomsky normal form.
2.4.1. Exercises
Rewrite the following grammar in Chomsky normal form.
<a> ::= a <b> b <b> ::= <b> c | <c> <c> <c> ::= <c> d | d <c>
3. Parsing
The general parsing problem is to find a derivation of a sentence in a given grammar, if one exists.7
In the context of this course, we are primarily interested in the
specific problem of converting a string
(or char list
) into an ADT
representing the syntax of the program.
There are many ways to accomplish this, we saw two: recursive-descent and parser-combinators.
Recursive-descent parsing is an ad-hoc parsing method in which mutually recursive functions are defined to parse each part of a given specification. It will not appear on the final exam in any significant way.
3.1. Combinators
We can think of a parser for 'a
's as a function of type
type 'a parser = char list -> ('a * char list) option
which
- consumes the prefix of the input stream corresponding to an
'a
, - converts that prefix to an
'a
, and finally, - returns that
'a
and the remainder of the stream, failing if no initial part of the stream corresponds to an'a
.
One of the simplest examples is the char
parser:
let char (d : char) (cs : char list) : (char * char list) option = match cs with | c :: cs when c = d -> Some (d, cs) | _ -> None
which consumes the first character of cs
given that it is equal to
d
and returns it, along with the remainder of cs
. This parser
fails (returns None
) in the case that the first character of cs
is
not d
.
When we want to use a parser, we apply it to a character list and verify that it consumed it's the entire input:
let parse (p : 'a parser) (s : string) = match p (explode s) with | (a, []) -> Some a | _ -> None
A parser combinator is a higher-order function which can be used to compose parsers. There is a small subset of parser combinators which are of particular importance because they correspond to the constructs in EBNF specifications and regular expressions.
Alternatives.
p1 <|> p2
is the parser which tries running the parserp1
, returning its output if it succeeds, and runsp2
otherwise.If
p1
is a parser for the forms of a nonterminal symbol<p1>
andp2
a parser for forms of a nonterminal symbol<p2>
, thenp1 <|> p2
is a parser for forms of the nonterminal symbol<alt> ::= <p1> | <p2>
Sequencing.
seq p1 p2
is the parser which runs bothp1
andp2
and returns both of their outputs if both parsers succeed. It fails if eitherp1
orp2
fails.If
p1
is a parser for the forms of a nonterminal symbol<p1>
andp2
a parser for forms of a nonterminal symbol<p2>
, thenseq p1 p2
is a parser for forms of the nonterminal symbol<seq> ::= <p1> <p2>
Repetition.
many p
is the parser which runsp
repeatedly until it fails, collecting all its outputs in a list.If
p
is a parser for the forms of a nonterminal symbol<p>
, thenmany p
is a parser for forms of the nonterminal symbol<many> ::= { <p> }
The last important combinator is map
, which can be used to
manipulate the output of a parser without affecting how it consumes
its input. If p
is an 'a parser
, and f
is a function of type
'a -> 'b
, then map f p
is a 'b parser
which runs 'a
and then
applies f
to its output (if it succeeds).
You should also be familiar with how to use the more convenient parser combinators throughout the course. For the final exam, you will not be required to memorize their definitions.
str
,token
,ws
- (
>>
), (<<
),seq3
,seq4
- (
>|
),map2
,map3
,map4
optional
pure
,fail
, (we will not test onbind
(>>=
) but it is good to know…)
3.1.1. Exercises
Implement the parser
let parse_bool : bool parser = assert false (* TODO *)
which can consume
"True"
and return the valuetre
or"False"
and return the value false. In particular, it should not consume whitespace before or"True"
or"False"
Implement the parser
let parse_bool_list : bool list parser = assert false (* TODO *)
which can parse a list of booleans in Python syntax, i.e., square brackets, comma separators, white-space agnostic.
Implement a parser
let peak : char parser = assert false (* TODO *)
which returns the first character of its input but does not consume any part of the input. This should only fail if the input is empty.
Implement a parser combinator
let check (p : 'a parser) (pred : 'a -> bool) : 'a parser = assert false (* TODO *)
which runs
p
and then fails ifpred
is false on the output ofp
. Hint. It is easier to implement directly, rather than in terms of other combinators.
4. Formal Semantics
Semantics is the study of the meaning of language. We use concepts from formal semantics to model the meaning of programs.
We discussed two forms of semantics: denotational and operational.
Giving a denotational semantics for a programming language means assigning to each program a mathematical function which has the same input/output behavior as the program.
We focused primarily on operational semantics. Giving an operational semantics for a programming language means describing how a program is evaluated. This typically means defining a reduction relation on programs, which describes how a program is reduced until it reaches a state which cannot be further reduced.
4.1. Derivations
Suppose we have a programming language defined by a BNF specification \(\mathcal P\).
A configuration is a pair consistent of a program \(P\) (a sentence accepted by \(\mathcal P\)) and a state \(S\) which may be manipulated by programs. The nature of \(S\) depends on the intentions of the programming language. The state may be empty, as in the case of functional languages.
Defining a (small-step) operational semantics means defining a reduction relation for configurations:
\begin{equation*} ( \ S \ , \ P \ ) \longrightarrow ( \ S' \ , \ P' \ ) \end{equation*}A reduction relation is defined via reduction rules, which consist of a shape of a reduction together with a collection of premises, which may be other shapes of reductions or trivial premises (also called axioms). The general form of a reduction rules is:
\begin{prooftree} \AxiomC{$P_1$} \AxiomC{$P_2$} \AxiomC{$\dots$} \AxiomC{$P_k$} \RightLabel{(name)} \QuaternaryInfC{$C \longrightarrow C'$} \end{prooftree}Shape here refers to the fact that the configurations in a reduction rule contains meta-variables that describe the kind of reductions that can be derived, or that can be used a premises. For example, the rules for evaluating an arithmetic expression might include
\begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addLeft)} \UnaryInfC{$e_1 + e_2 \longrightarrow e_1' + e_2$} \end{prooftree}which expresses that if \(e_1\) reduces to \(e_1'\) in a single step then \(e_1 + e_2\) reduces to \(e_1' + e_2\) in a single step no matter the expressions \(e_1\), \(e_1'\) and \(e_2\). This rule can be used to show that
\begin{equation*} (1 + 2) + 3 \longrightarrow 3 + 3 \end{equation*}but also that
\begin{equation*} (1 + (4 - 2)) + 3 \longrightarrow (1 + 2) + 3 \end{equation*}Both reductions invoke that fact that if the left argument can be reduced by a single step, then the sum can be reduced by a single step.
A derivation is, informally, a tree in which each node is a reduction and the children of a node are the premises required to derive that reduction according to a reduction rule. The leaves of a derivation are trivial premises. A derivation of a reduction \(C \longrightarrow C'\) is a derivation whose root is \(C \longrightarrow C'\).
What we have been describing is a single-step reduction relation. Any single-step reduction relation can be extended to a multi-step reduction relation by including the following rules and a multi-step reduction relation symbol '\(\longrightarrow^{\star}\)'.
\begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(refl)} \UnaryInfC{$C \longrightarrow^\star C$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$C \longrightarrow^\star C'$} \AxiomC{$C' \longrightarrow C''$} \RightLabel{(trans)} \BinaryInfC{$C \longrightarrow^\star C''$} \end{prooftree} \end{equation*}Formally, evaluation is the process of determining a configuration \(C'\) for a given configuration \(C\) such that \(C \longrightarrow^\star C'\) and \(C'\) cannot be further reduced (i.e., there is no other configuration \(C''\) such that \(C' \longrightarrow C''\)).
We can express an evaluation as a derivation of a multi-step reduction, but it is common (and done below) to simply write a sequence of configurations, with the understanding that each configuration follows from the previous configuration by a derivable single-step reduction.
4.2. Examples
There are many examples of operational semantics and reduction rules in this document, we start with the two most basic.
4.2.1. Arithmetic Expressions
In the operational semantics of arithmetic expressions, we take a configuration to be an arithmetic expression (a program with no state). Each rule describes how to reduce an arithmetic expression by a single step.
Grammar for a subset of arithmetic expressions in Polish notation:
<expr> ::= <num> | (add | sub) <expr> <expr> <num> ::= [-] (0 | ... | 9) | { 0 | ... | 9 }
Operational semantics for a subset of arithmetic expression in Polish notation:
\begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(addNum)} \BinaryInfC{$\mathsf{add} \ m \ n \longrightarrow m + n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addLeft)} \UnaryInfC{$\textsf{add} \ e_1 \ e_2 \longrightarrow \mathsf{add} \ e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(addRight)} \UnaryInfC{$\mathsf{add} \ e_1 \ e_2 \longrightarrow \mathsf{add} \ e_1 \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(subNum)} \BinaryInfC{$\mathsf{sub} \ m \ n \longrightarrow m - n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(subLeft)} \UnaryInfC{$\textsf{sub} \ e_1 \ e_2 \longrightarrow \mathsf{sub} \ e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(subRight)} \UnaryInfC{$\mathsf{sub} \ e_1 \ e_2 \longrightarrow \mathsf{sub} \ e_1 \ e_2'$} \end{prooftree} \end{equation*}It is generally preferable that any derivable reduction has a unique derivation. This makes defining an evaluation procedure easier, and amounts to fixing an evaluation order. We can enforce an evaluation order via the structure of our reduction rules.
Operational semantics for a subset of arithmetic expression in Polish Notation with a left-to-right evaluation order:
\begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(addNum)} \BinaryInfC{$\mathsf{add} \ m \ n \longrightarrow m + n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addLeft)} \UnaryInfC{$\textsf{add} \ e_1 \ e_2 \longrightarrow \mathsf{add} \ e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(addRight)} \BinaryInfC{$\mathsf{add} \ m \ e_2 \longrightarrow \mathsf{add} \ m \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(subNum)} \BinaryInfC{$\mathsf{sub} \ m \ n \longrightarrow m - n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(subLeft)} \UnaryInfC{$\textsf{sub} \ e_1 \ e_2 \longrightarrow \mathsf{sub} \ e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(subRight)} \BinaryInfC{$\mathsf{sub} \ m \ e_2 \longrightarrow \mathsf{sub} \ m \ e_2'$} \end{prooftree} \end{equation*}Example single-step derivation:
\begin{equation}\label{d1} \begin{prooftree} \AxiomC{$2 \in \mathbb Z$} \AxiomC{$3 \in \mathbb Z$} \BinaryInfC{$\mathsf{sub} \ 2 \ 3 \longrightarrow \text{-} 1$} \UnaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow \mathsf{add} \ \text{-}1 \ \mathsf{add} \ 1 \ 1$} \end{prooftree} \end{equation}Example multi-step derivation:
\begin{equation}\label{d2} \begin{prooftree} \AxiomC{$\text - 1 \in \mathbb Z$} \AxiomC{$1 \in \mathbb Z$} \AxiomC{$1 \in \mathbb Z$} \BinaryInfC{$\mathsf{add} \ 1 \ 1 \longrightarrow 2$} \BinaryInfC{$\mathsf{add} \ \text - 1 \ \mathsf{add} \ 1 \ 1 \longrightarrow \mathsf{add} \ \text- 1 \ 2$} \end{prooftree} \end{equation} \begin{equation}\label{d3} \begin{prooftree} \AxiomC{$\text- 1 \in \mathbb Z$} \AxiomC{$2 \in \mathbb Z$} \BinaryInfC{$\mathsf{add} \ \text- 1 \ 2 \longrightarrow 1$} \end{prooftree} \end{equation} \begin{prooftree} \AxiomC{} \UnaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow^\star \mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1$} \AxiomC{(\ref{d1})} \BinaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow^\star \mathsf{add} \ \text{-}1 \ \mathsf{add} \ 1 \ 1$} \AxiomC{(\ref{d2})} \BinaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow^\star \mathsf{add} \ \text{-}1 \ 2$}\ \AxiomC{(\ref{d3})} \BinaryInfC{$\mathsf{add} \ \mathsf{sub} \ 2 \ 3 \ \mathsf{add} \ 1 \ 1 \longrightarrow^\star 1$}\ \end{prooftree}Example evaluation, written as a sequence of configurations:
add sub 2 3 add 1 1 ⟶ add -1 add 1 1 ⟶ add -1 2 ⟶ 1 ✓
4.2.2. A Stack-Oriented Language
The primary examples we used in this course for understanding operational semantics were variants of stack-oriented languages.8
In the simplest case, we take a configuration to be a program (\(P\)) together with a stack of integers (\(S\)), written
\begin{equation*} ( \ S \ , \ P \ ) \end{equation*}We also include a special irreducible configuration \(\mathsf{ERROR}\) for representing a failed evaluation.
Grammar for a simple stack-oriented language:
<prog> ::= { <com> } <com> ::= push <num> | dup | add | sub | then <prog> else <prog> end <num> ::= (0 | ... | 9) { 0 | ... | 9 }
Example program which puts 14 + 15 - 16 on the stack:
push 16 push 15 push 14 add sub
Operational semantics for a simple stack-oriented language:
\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ ,\ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ P \ )$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}Example multi-step derivation:
\begin{equation}\label{e1} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow ( \ 16 :: \varnothing \ , \ \textsf{push 15 push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{e2} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ 16 :: \varnothing \ , \ \textsf{push 15 push 14 add sub} \ ) \longrightarrow ( \ 15 :: 16 :: \varnothing \ , \ \textsf{push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{e3} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ 15 :: 16 :: \varnothing \ , \ \textsf{push 14 add sub} \ ) \longrightarrow ( \ 14 :: 15 :: 16 :: \varnothing \ , \ \textsf{add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{e4} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ 14 :: 15 :: 16 :: \varnothing \ , \ \textsf{add sub} \ ) \longrightarrow ( \ 29 :: 16 :: \varnothing \ , \ \textsf{sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{e5} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ 29 :: 16 :: \varnothing \ , \ \textsf{sub} \ ) \longrightarrow ( \ 13 :: \varnothing \ , \ \epsilon \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f1} \begin{prooftree} \AxiomC{} \UnaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f2} \begin{prooftree} \AxiomC{(\ref{f1})} \AxiomC{(\ref{e1})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 16 :: \varnothing \ , \ \textsf{push 15 push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f3} \begin{prooftree} \AxiomC{(\ref{f2})} \AxiomC{(\ref{e2})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 15 :: 16 :: \varnothing \ , \ \textsf{push 14 add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f4} \begin{prooftree} \AxiomC{(\ref{f3})} \AxiomC{(\ref{e3})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 14 :: 15 :: 16 :: \varnothing \ , \ \textsf{add sub} \ ) $} \end{prooftree} \end{equation} \begin{equation}\label{f5} \begin{prooftree} \AxiomC{(\ref{f4})} \AxiomC{(\ref{e4})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 29 :: 16 :: \varnothing \ , \ \textsf{sub} \ ) $} \end{prooftree} \end{equation} \begin{prooftree} \AxiomC{(\ref{f5})} \AxiomC{(\ref{e5})} \BinaryInfC{$ ( \ \varnothing \ , \ \textsf{push 16 push 15 push 14 add sub} \ ) \longrightarrow^\star ( \ 13 :: \varnothing \ , \ \epsilon \ ) $} \end{prooftree}Example evaluation:
( ∅ , push 16 push 15 push 14 add sub ) ⟶ ( 16 :: ∅ , push 15 push 14 add sub ) ⟶ ( 15 :: 16 :: ∅ , push 14 add sub ) ⟶ ( 14 :: 15 :: 16 :: ∅ , add sub ) ⟶ ( 29 :: 16 :: ∅ , sub ) ⟶ ( 13 :: ∅ , ϵ ) ✓
4.2.3. Exercises
- Add multiplication (
mul
) and division (div
) to the operational semantics of arithmetic expressions (make sure to disallow division by \(0\)). Derive the single-step reduction
sub add add 1 2 mul 3 4 2 ⟶ sub add 3 mul 3 4 2
in the operational semantics for arithmetic expressions.
Derive the multi-step reduction
sub add add 1 2 mul 3 4 2 ⟶⋆ ??
where
??
is the value of the given expression according to the given semantics. You may write it as a sequence of configurations.- Write an operational semantics for Boolean expressions with left-to-right evaluation and short-circuiting (like in project 3).
Given an evaluation of the program
push 10 dup mul dup
in the given semantics for the above stack-oriented language.
5. Variables
When introducing variables into a programming language, we need to introduce an environment to the configuration which maintains variable bindings.
The structure of the environment depends on the scoping rules used in the language, but in the simplest case, and environment is a mapping of variable names to values. This can be implemented as an association list in OCaml.
No matter the implementation, an environment requires two operations: fetch and update. The fetch operation should get the value to which a variable is bound in the environment, failing if the variable is not bound in the environment. The update operation should return a new environment with the additional binding.
OCaml implementation of an environment represented as an association list:
type value = unit (* A DUMMY TYPE *) type 'a env = ('a * value) list let empty_env = [] let fetch (e : 'a env) (x : 'a) : value option = List.assoc_opt x e let update (e : 'a env) (x : 'a) (v : value) : 'a env = (x, v) :: e
Again, different environment representations need different implementations of fetch and update, but regardless of how these functions are implemented, they should satisfy the following equations:
fetch emtpy_env x = None
for anyx
fetch (update e x v) x = Some v
for anye
,x
, andv
fetch (update e x v) y = fetch e y
for anye
,x
,y
, andv
giveny
is not equal tox
5.1. Stack-Oriented Language with Variables
In the operational semantics of the following stack-oriented language, we take a configuration to be a program (\(P\)) together with a stack of integers (\(S\)) and an environment \(E\) of variable bindings.
\begin{equation*} ( \ S \ , \ E \ , \ P \ ) \end{equation*}As above, we also include the special irreducible configuration \(\mathsf{ERROR}\).
Grammar for a stack-oriented language with variables:
<prog> ::= { <com> } <com> ::= push <num> | dup | add | sub | then <prog> else <prog> end | lookup <ident> | assign <ident> <num> ::= ( 0 | ... | 9 ) { 0 | ... | 9 } <ident> ::= ( A | ... | Z ) { A | ... | Z }
Example program:
push 14 assign X push 15 assign Y lookup X lookup Y add
Operational semantics for a stack-oriented language with variables:
\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ E \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ , \ E \ ,\ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \in \mathbb Z$} \RightLabel{(lookup)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow ( \ \mathsf{fetch}(E, x) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \not \in \mathbb Z$} \RightLabel{(lookupErr)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assign)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, x, n) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assignErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}Example evaluation:
( ∅ , [] , push 14 assign X push 15 assign Y lookup X lookup Y add ) ⟶ ( 14 :: ∅ , [] , assign X push 15 assign Y lookup X lookup Y add ) ⟶ ( ∅ , [X ↦ 14] , push 15 assign Y lookup X lookup Y add ) ⟶ ( 15 :: ∅ , [X ↦ 14] , assign Y lookup X lookup Y add ) ⟶ ( ∅ , [Y ↦ 15; X ↦ 14] , lookup X lookup Y add ) ⟶ ( 14 :: ∅ , [Y ↦ 15; X ↦ 14] , lookup Y add ) ⟶ ( 15 :: 14 :: ∅ , [Y ↦ 15; X ↦ 14] , add ) ⟶ ( 29 :: ∅ , [Y ↦ 15; X ↦ 14] , ϵ ) ✓
The other primary concern when it comes to variables is scoping. This will come in a later section after discussing subroutines.
5.1.1. Exercises
There is currently no way to unassign variables in the environment. Redefine the function
let update (e : env) (x : string) (v : value option) : env = assert false (* TODO *)
which takes an optional value so that
update e x (Some v)
will bindx
tov
ine
andupdate e x None
will remove any bindings ofx
in the environment.- Extend the command set and the operational semantics to include a
command
unassign X
which can be used to remove bindings in the environment within a program.
6. Subroutines
A subroutine is an encapsulated piece of code that can be reused and executed in different contexts within a program.
Implementing subroutines in a programming language requires dealing with several concerns, including parameter passing and return values.
6.1. Stack-Oriented Language with Subroutines
For stack-oriented languages, we don't need to deal with parameters or return values of subroutines; all of these is handled on the stack. This makes adding subroutines to our stack-oriented languages fairly simple.
In the operational semantics of the following stack-oriented language, we take a configuration to be a program (\(P\)) together with a stack of integers (\(S\)) and an environment (\(E\)) of bindings which now includes bindings to programs, or the special \(\mathsf{ERROR}\) configuration.
Grammar for a stack-oriented language with subroutines:
<prog> ::= { <com> } <com> ::= push <num> | dup | add | sub | then <prog> else <prog> end | lookup <ident> | assign <ident> | def <ident> begin <prog> end | call <ident> <num> ::= ( 0 | ... | 9 ) { 0 | ... | 9 } <ident> ::= ( A | ... | Z ) { A | ... | Z }
Example program:
def F begin dup add end push 6 assign X push 5 call F lookup X call F
Operational Semantics for a stack-oriented language with subroutines (and dynamic scoping):
\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ E \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ , \ E \ ,\ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \in \mathbb Z$} \RightLabel{(lookup)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow ( \ \mathsf{fetch}(E, x) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \not \in \mathbb Z$} \RightLabel{(lookupErr)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assign)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, x, n) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assignErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(funDef)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{def} \ F \ \mathsf{begin} \ Q \ \mathsf{end} \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, F, Q) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = Q$} \RightLabel{(call)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = \bot \quad$ or $\quad \mathsf{fetch}(E, F) \in \mathbb Z$} \RightLabel{(callErr)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}Example execution:
( ∅ , [] , def F begin dup add end push 6 assign X push 5 call F lookup X call F ) ⟶ ( ∅ , [F ↦ dup add] , push 6 assign X push 5 call F lookup X call F) ⟶ ( 6 :: ∅ , [F ↦ dup add] , assign X push 5 call F lookup X callF) ⟶ ( ∅ , [X ↦ 6, F ↦ dup add] , push 5 call F lookup X call F ) ⟶ ( 5 :: ∅ , [X ↦ 6, F ↦ dup add] , call F lookup X call F ) ⟶ ( 5 :: ∅ , [X ↦ 6, F ↦ dup add] , dup add lookup X call F ) ⟶ ( 5 :: 5 :: ∅ , [X ↦ 6, F ↦ dup add] , add lookup X call F ) ⟶ ( 10 :: ∅ , [X ↦ 6, F ↦ dup add] , lookup X call F ) ⟶ ( 6 :: 10 :: ∅ , [X ↦ 6, F ↦ dup add] , call F ) ⟶ ( 6 :: 10 :: ∅ , [X ↦ 6, F ↦ dup add] , dup add ) ⟶ ( 6 :: 6 :: 10 :: ∅ , [X ↦ 6, F ↦ dup add] , add ) ⟶ ( 12 :: 10 :: ∅ , [X ↦ 6, F ↦ dup add] , ϵ ) ✓
6.1.1. Exercises
- Implement the function
SWAP
in the stack-oriented language with subroutines which swaps the top two values on the stack, assuming there are two values on the top of the stack. Also implement the functionROT
which rotates to the top three elements of the stack, i.e.,x :: y :: z :: S
becomesz :: x :: y :: S
. - Implement
MUL
which multiplies the top two elements of the stack.
6.2. Parameter Passing
Parameter passing is part of the evaluation strategy of a programming language. It refers to how parameters are passed to a function when it is called. There are two forms of parameter passing we primarily considered in the course:
- call-by-value. When a function is called, its arguments are evaluated and the function is given the values of its arguments. The important feature of the call-by-value strategy is that the arguments of a function are evaluated exactly once.
- call-by-name. when a function is called, it is given the names of its arguments, to which a function can refer. The important feature of the call-by-name strategy is that, arguments are only evaluated if they are used; if a function never uses an argument, then that argument is never evaluated.
The details of the call-by-name and call-by-value strategies are more subtle in the context of imperative languages with mutable variable bindings. These strategies are more easily understood in the context of functional languages, as in the following examples.
Note that both of these example use the substitution model for their operational semantics.
6.2.1. A Functional Language with Call-by-Name Parameter Passing
Grammar for a simple functional language:
<expr> ::= <value> | <expr> <expr> | <expr> (+ | -) <expr> | '(' <expr> ')' <value> ::= <num> | <ident> | fun <ident> -> <expr> <num> ::= ( 0 | ... | 9 ) { 0 | ... | 9 } <ident> ::= ( a | ... | z ) { a | ... | z }
Example program:
(fun x -> fun y -> x) 2 ((fun z -> z + 5) 4)
Operational semantics for the lambda calculus with call-by-name parameter passing and left-to-right evaluation:
\begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(appRed)} \UnaryInfC{$e_1 \ e_2 \longrightarrow e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{} \RightLabel{(cbnBeta)} \UnaryInfC{$(\texttt{fun} \ x \ \texttt{->} \ e_1) \ e_2 \longrightarrow e_1[e_2 / x]$} \end{prooftree} \end{equation*} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addRedLeft)} \UnaryInfC{$e_1 \ \texttt{+} \ e_2 \longrightarrow e_1' \ \texttt{+} \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$v \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(addRedRight)} \BinaryInfC{$v \ \texttt{+} \ e_2 \longrightarrow v \ \texttt{+} \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(addNum)} \BinaryInfC{$m \ \texttt{+} \ n \longrightarrow m + n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(subRedLeft)} \UnaryInfC{$e_1 \ \texttt{-} \ e_2 \longrightarrow e_1' \ \texttt{-} \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$v \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(subRedRight)} \BinaryInfC{$v \ \texttt{-} \ e_2 \longrightarrow v \ \texttt{-} \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(subNum)} \BinaryInfC{$m \ \texttt{-} \ n \longrightarrow m - n$} \end{prooftree}Example evaluation: Note that the "second" argument is never evaluated.
(fun x -> fun y -> x) 2 ((fun z -> z + 5) 4) ⟶ (fun y -> x) [2 / x] ((fun z -> z + 5) 4) ≡ (fun y -> 2) ((fun z -> z + 5) 4) ⟶ 2 [(fun z -> z + 5) 4 / y] ≡ 2
6.2.2. Functional language with Call-by-Value Parameter Passing
Operational semantics for a simple functional language with call-by-value parameter passing and left-to-right evaluation:
\begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(appRedLeft)} \UnaryInfC{$e_1 \ e_2 \longrightarrow e_1' \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(appRedRight)} \UnaryInfC{$(\texttt{fun} \ x \ \texttt{->} e_1) \ e_2 \longrightarrow (\texttt{fun} \ x \ \texttt{->} e_1) \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$v$ is a value} \RightLabel{(cbvBeta)} \UnaryInfC{$(\texttt{fun} \ x \ \texttt{->} \ e) \ v \longrightarrow e[v / x]$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(addRedLeft)} \UnaryInfC{$e_1 \ \texttt{+} \ e_2 \longrightarrow e_1' \ \texttt{+} \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$v \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(addRedRight)} \BinaryInfC{$v \ \texttt{+} \ e_2 \longrightarrow v \ \texttt{+} \ e_2'$} \end{prooftree} \end{equation*}p
\begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(addNum)} \BinaryInfC{$m \ \texttt{+} \ n \longrightarrow m + n$} \end{prooftree} \begin{equation*} \begin{prooftree} \AxiomC{$e_1 \longrightarrow e_1'$} \RightLabel{(subRedLeft)} \UnaryInfC{$e_1 \ \texttt{-} \ e_2 \longrightarrow e_1' \ \texttt{-} \ e_2$} \end{prooftree} \qquad \begin{prooftree} \AxiomC{$v \in \mathbb Z$} \AxiomC{$e_2 \longrightarrow e_2'$} \RightLabel{(subRedRight)} \BinaryInfC{$v \ \texttt{-} \ e_2 \longrightarrow v \ \texttt{-} \ e_2'$} \end{prooftree} \end{equation*} \begin{prooftree} \AxiomC{$m \in \mathbb Z$} \AxiomC{$n \in \mathbb Z$} \RightLabel{(subNum)} \BinaryInfC{$m \ \texttt{-} \ n \longrightarrow m - n$} \end{prooftree}Example evaluation: Note that the "second" argument is evaluated before it passed to the function (i.e., before the function is applied to it).
(fun x -> fun y -> x) 2 ((fun z -> z + 5) 4) ⟶ (fun y -> x) [2 / x] ((fun z -> z + 5) 4) ≡ (fun y -> 2) ((fun z -> z + 5) 4) ⟶ (fun y -> 2) ((z + 5) [4 / z]) ≡ (fun y -> 2) (4 + 5) ⟶ (fun y -> 2) 9 ⟶ 2 [ 9 / y ] ≡ 2
6.2.3. Exercises
- Give an example of a program which does not terminate using call-by-value parameter passing, but does terminate using call-by-name parameter passing.
- Give an example of a program which will take ~4 times as many reductions to terminate using call-by-name parameter passing, as compared to call-by-value parameter passing.
- Write a program that does not terminate using call-by-name parameter passing.
Determine what the following program evaluates to.
(fun f -> fun x -> f x x) (fun y -> fun z -> fun a -> y (z a)) p (fun x -> x + 1) 3
We can represents expressions in the language as an ADT. Implement following function
type expr = App of expr * expr | Add of expr * expr | Sub of expr * expr | Num of int | Var of string | Fun of string * expr let is_value (e : expr) : bool = assert false (* TODO *)
which returns
true
ife
is a value (i.e., it is a number, a variable, or a function), andfalse
otherwise.You may notice that our language has no error configuration. We say that an expression is stuck if it is not a value but it cannot be further reduced. Implement the function
let is_stuck (e : expr) : bool = assert false (* TODO *)
which returns
true
ife
is stuck andfalse
otherwise.
7. Dynamic Scoping
Dynamic scoping refers to the use of computational (temporal) context to determine when a binding is available. In its simplest form, we may think of all bindings as globally available as soon as they have been instantiated in the process of executing a program.
In a language with dynamic scoping, when it comes to determining what bindings are available to a subroutine, it doesn't matter where a subroutine is defined, but rather where it is called.
Bash is a widely used language with dynamic scoping, but in general, dynamic scoping is not common in modern programming languages. It is, however, much easier to implement than lexical scoping.
The stack-oriented language with subroutines given above implements dynamic scoping by fiat.
7.1. Examples
Example bash program highlighting dynamic scoping:
function f () { X=2 } function g () { echo $X } X=1 g f g # prints: # 1 # 2
Example program in the stack-oriented language highlighting dynamic scoping:
def F begin push 2 assign X end def G begin lookup X end push 1 assign X call G call F call G
Example evaluation:
( ∅ , [] , def F begin push 2 assign x end def G begin lookup X end push 1 assign X call G call F call G) ⟶ ( ∅ , [F ↦ push 2 assign X] , def G begin lookup X push 1 assign X call G call F call G) ⟶ ( ∅ , [G ↦ lookup X; F ↦ push 2 assign X] , push 1 assign X call G call F call G) ⟶ ( 1 :: ∅ , [G ↦ lookup X; F ↦ push 2 assign X] , assign X call G call F call G) ⟶ ( ∅ , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , call G call F call G) ⟶ ( ∅ , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , lookup X call F call G) ⟶ ( 1 :: ∅ , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , call F call G) ⟶ ( 1 :: ∅ , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , push 2 assign X call G) ⟶ ( 2 :: 1 :: ∅ , [X ↦ 1; G ↦ lookup X; F ↦ push 2 assign X] , assign X call G) ⟶ ( 1 :: ∅ , [X ↦ 2; G ↦ lookup X; F ↦ push 2 assign X] , call G) ⟶ ( 1 :: ∅ , [X ↦ 2; G ↦ lookup X; F ↦ push 2 assign X] , lookup X) ⟶ ( 2 :: 1 :: ∅ , [X ↦ 2; G ↦ lookup X; F ↦ push 2 assign X] , ϵ ) ✓
7.1.1. Exercises
Give an evaluation of the following program.
def F begin deg G begin lookup X end end push 1 assign X call F call G
8. Lexical Scoping
Lexical scoping refers to the use of properties of the source code text to deliniate the availability of a binding. In Python, the body of a function, determined by indentation, deliniates function scope. In OCaml, The body of a let-expression is exactly the scope of its associated binding (it is not possible to refer to that binding outside of the body of the let-expression). Lexical scoping is common among modern programming languages (including both Python and OCaml).
In a language with lexical scoping, when it comes to determining what bindings are available to a subroutine, it doesn't matter where a subroutine was called, but rather where it was defined (this is the opposite of dynamic scoping).
The implementation lexical scoping depends on a couple considerations:
- Are variables mutable?
- Are functions higher-order?
8.1. Activation Records
Lexical scoping with mutable variables requires maintaining a call stack. The call stack keeps track of the function calls made during the evaluating a program, along with local variables to function calls.
Each function call (transferring control from the caller to the callee) creates an activation record (or frame) which keeps track of:
- the input parameters of the function
- a pointer to the activation record in which the function was defined
- a pointer to where to return after completing the function call
Returning from a subroutine (transferring control from the callee to the caller) removes its associated frame from the stack.
At a low level, we can think of an activation record as having the following rough structure.
TOP OF THE STACK +--------------+ | localK | | ... | | local2 | | local1 | | return_addr | | def_addr | | argN | | ... | | arg2 | | arg1 | +--------------+ ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
Conceptually, we represent the call stack within the environment. That is, rather than maintaining a collection of bindings in the environment, we maintain a stack of activation records. In OCaml, we might implement this as follows.9 Note that this implementation has our stack-oriented languages in mind so there are no parameters in a record.
IMPORTANT. This works slightly differently than in project 2. Note
that the fetch
operation returns the identifier of the record in
which it found a binding.
type command = unit (* DUMMY TYPE *) type program = command list type value = Num of int | Prog of program type record = { id : int ; locals : (string * value) list ; ret_prog : program ; def_pointer : int } type env = record list let empty_env = [ { id = 0 ; locals = [] ; def_pointer = -1 ; ret_prog = [] } ] let fetch (e : env) (x : string) : (value * int) option = let rec go e x i = match e with | r :: e -> if r.id = i then match List.assoc_opt x r.locals with | Some v -> Some (v, i) | None -> go e x r.def_pointer else go e x i | _ -> assert false (* impossible *) in match e with | r :: _ -> go e x r.id | _ -> assert false (* impossible *) let update (e : env) (x : string) (v : value) : env = match e with | r :: e -> { r with locals = (x, v) :: r.locals } :: e | [] -> assert false (* impossible *)
8.1.1. Stack-Oriented Language with Mutable Variables and Lexical Scoping
Lexical scoping becomes more subtle to implement if functions are higher-order. In this example, we do not allow higher-order functions, but we do allow nested function definitions.
In the operational semantics for the following stack-oriented language, we take a configuration to be a program (\(P\)) together with a stack of integers (\(S\)) and an call stack (\(E\)) which is a stack of records as describe above. We use
\begin{equation*} \langle i , L, R, j \rangle \end{equation*}to denote an activation record with identifier \(i\), local bindings \(L\), return program \(R\), and pointer \(j\) to the defining activation record.
This is very similar to what was done in project 2.
Operational Semantics for a stack-oriented language with lexical scoping and mutable variables:
\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ E \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ , \ E \ ,\ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ E \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ E \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x)$ is a number} \RightLabel{(lookup)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow ( \ \mathsf{fetch}(E, x) :: S \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x)$ is not a number} \RightLabel{(lookupErr)} \UnaryInfC{$( \ S \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assign)} \UnaryInfC{$( \ m :: S \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, x, m) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assignErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(funDef)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{def} \ F \ \mathsf{begin} \ Q \ \mathsf{end} \ P \ ) \longrightarrow ( \ S \ , \ \mathsf{update}(E, F, Q) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = (Q, i)$ and $Q$ is a program} \RightLabel{(call)} \UnaryInfC{$( \ S \ , E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow ( \ S \ , \ \langle \mathsf{len}(E), \varnothing, P, i \rangle :: E \ , \ Q \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(return)} \UnaryInfC{$( \ S \ , \ \langle i, L, R, j \rangle :: E \ , \ \epsilon \ ) \longrightarrow ( \ S \ , \ E \ , \ R \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F)$ is not a program} \RightLabel{(callErr)} \UnaryInfC{$( \ S \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}Example program:
push 2 assign X def F begin lookup X push 4 assign X end call F push 3 assign X call F
Example evaluation:
( ∅ , ⟨0 , [] , ϵ , -1 ⟩ :: ∅ , push 2 assign X def F begin lookup X push 4 assign X end call F push 3 assign X call F ) ⟶ ( 2 :: ∅ , ⟨0 , [] , ϵ , -1 ⟩ :: ∅ , assign X def F begin lookup X push 4 assign X end call F push 3 assign X call F ) ⟶ ( ∅ , ⟨ 0 , [X ↦ 2] , ϵ , -1 ⟩ :: ∅ , def F begin lookup X push 4 assign X end call F push 3 assign X call F ) ⟶ ( ∅ , ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ :: ∅ , call F push 3 assign X call F ) ⟶ ( ∅ , ⟨ 1 , [] , push 3 assign X call F , 0 ⟩ :: ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ :: ∅ , lookup X push 4 assign X ) ⟶ ( 2 :: ∅ , ⟨ 1 , [] , push 3 assign X call F , 0 ⟩ :: ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ :: ∅ , push 4 assign X ) ⟶ ( 4 :: 2 :: ∅ , ⟨ 1 , [] , push 3 assign X call F , 0 ⟩ :: ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ :: ∅ , assign X ) ⟶ ( 2 :: ∅ , ⟨ 1 , [X ↦ 4] , push 3 assign X call F , 0 ⟩ :: ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ :: ∅ , ϵ ) ⟶ ( 2 :: ∅ , ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ :: ∅ , push 3 assign X call F ) ⟶ ( 3 :: 2 :: ∅ , ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 2] , ϵ , -1 ⟩ :: ∅ , assign X call F ) ⟶ ( 2 :: ∅ , ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ :: ∅ , call F ) ⟶ ( 2 :: ∅ , ⟨ 1 , [] , ϵ , 0 ⟩ :: ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ :: ∅ , lookup X push 4 assign X ) ⟶ ( 3 :: 2 :: ∅ , ⟨ 1 , [] , ϵ , 0 ⟩ :: ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ :: ∅ , push 4 assign X ) ⟶ ( 4 :: 3 :: 2 :: ∅ , ⟨ 1 , [] , ϵ , 0 ⟩ :: ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ :: ∅ , assign X ) ⟶ ( 3 :: 2 :: ∅ , ⟨ 1 , [X ↦ 4] , ϵ , 0 ⟩ :: ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ :: ∅ , ϵ ) ⟶ ( 3 :: 2 :: ∅ , ⟨ 0 , [F ↦ lookup X push 4 assign X; X ↦ 3] , ϵ , -1 ⟩ :: ∅ , ϵ ) ⟶ ( 3 :: 2 :: ∅ , ∅ , ϵ ) ✓
8.1.2. Exercises
Give an evaluation of the following program.
push 2 assign X def F begin push 3 assign X call G end def G begin lookup X end call F
Implement the function
let global_update (e : env) (x : string) (v : value) = assert false (* TODO *)
which always updates the binding in the global frame (as opposed to the topmost frame).
- Extend the language and the operational semantics to include a
commands
gassign
which assigns variables in the global frame, even within a subroutine.
8.2. Closures
Closures are subroutines together with data they need to be executed correctly. At a minimum, this data consists of a collection of captured bindings, i.e., bindings which may no longer be in scope or may be shadowed when the function is called. It often also includes the name given to the subroutine, as this is important for implementing recursive functions.
Closures can be used to implement higher-order functions and lexically scoped immutable variable bindings. If variable bindings are immutable, a subroutine can be defined as a closure which stores all the variable bindings it that exist when it is defined.
Closure are also used in continuation-passing style implementations of subroutines. We can save a closure containing the current environment and (a pointer to) the return program. This closure is often called the current continuation.
To simplify things, we can keep track of two stacks, one for data and the other for holding continuations for returning from a function.10
type closure = { name: string ; captured: (string * value) list ; body : program } and value = Closure of closure | Num of int type env = (string * value) list type ccs = closure list
8.2.1. Stack-Oriented Language with Lexical Scoping and Higher-Order Functions
In the operational semantics of the following stack oriented language, we take a configuration to be a program (\(P\)) together with a stack of values (\(S\)) which includes integers and closures, a stack continuations (\(C\)) used for returning from a function call, and an environment (\(E\)), represented as a list of bindings (as in the case of our stack-oriented language with dynamic scoping). We use
\begin{equation*} \langle F, L, P \rangle \end{equation*}to denote a closure with name \(F\), captured bindings \(L\), and body \(P\).
Note that we need to be able to put closures on the stack in order to implement higher-order functions.
Operational Semantics of a stack-oriented language with lexical scoping, immutable variables, and higher-order functions:
\begin{prooftree} \AxiomC{} \RightLabel{(push)} \UnaryInfC{$(\ S \ , \ C \ , \ E \ , \ \textsf{push n} \ P \ ) \longrightarrow ( \ n :: S \ , \ C \ , \ E \ ,\ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dup)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow ( \ n :: n :: S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(dupErr)} \UnaryInfC{$( \ \varnothing \ , \ C \ , \ E \ , \ \textsf{dup} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(add)} \UnaryInfC{$( \ m :: n :: S \ , \ C \ , \ E \ , \ \textsf{add} \ P \ ) \longrightarrow ( \ (m + n) :: S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr1)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(addErr0)} \UnaryInfC{$( \ \varnothing \ , \ C \ , \ E \ , \ \mathsf{add} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(sub)} \UnaryInfC{$( \ m :: n :: S \ , \ C \ , \ E \ , \ \textsf{sub} \ P \ ) \longrightarrow ( \ (m - n) :: S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr1)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(subErr0)} \UnaryInfC{$( \ \varnothing \ , \ C \ , \ E \ , \ \mathsf{sub} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifFalse)} \UnaryInfC{$( \ 0 :: S \ , \ C \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ C \ , \ E \ , \ Q_2 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$n \not = 0$} \RightLabel{(ifTrue)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow ( \ S \ , \ C \ , \ E \ , \ Q_1 \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(ifErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \textsf{then} \ Q_1 \ \textsf{else} \ Q_2 \ \textsf{end} \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) \not = \bot$} \RightLabel{(lookup)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow ( \ \mathsf{fetch}(E, x) :: S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, x) = \bot$} \RightLabel{(lookupErr)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \textsf{lookup} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assign)} \UnaryInfC{$( \ n :: S \ , \ C \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow ( \ S \ , \ C \ , \ \mathsf{update}(E, x, n) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(assignErr)} \UnaryInfC{$( \ \varnothing \ , \ E \ , \ \mathsf{assign} \ x \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(funDef)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \ \mathsf{def} \ F \ \mathsf{begin} \ Q \ \mathsf{end} \ P \ ) \longrightarrow ( \ S \ , \ C \ , \ \mathsf{update}(E, F, \langle F, E, Q\rangle) \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = \langle G, L, Q\rangle$} \RightLabel{(call)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow ( \ S \ , \ \langle\mathsf{cc}, E, P\rangle :: C \ , \ \mathsf{update}(L, G, Q) \ , \ Q \ )$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(return)} \UnaryInfC{$( \ S \ , \ \langle\mathsf{cc}, E, P \rangle :: C \ , \ L \ , \ \epsilon \ ) \longrightarrow ( \ S \ , \ C \ , \ E \ , \ P \ )$} \end{prooftree} \begin{prooftree} \AxiomC{$\mathsf{fetch}(E, F) = \bot \quad$ or $\quad \mathsf{fetch}(E, F) \in \mathbb Z$} \RightLabel{(callErr)} \UnaryInfC{$( \ S \ , \ C \ , \ E \ , \ \mathsf{call} \ F \ P \ ) \longrightarrow \mathsf{ERROR}$} \end{prooftree}Example evaluation (of the same program as in the previous section:
( ∅ , ∅ , [] , push 2 assign X def F begin lookup X push 4 assign X end call F push 3 assign X call F ) ⟶ ( 2 :: ∅ , ∅ , [] , assign X def F begin lookup X push 4 assign X end call F push 3 assign X call F ) ⟶ ( ∅ , ∅ , [ X ↦ 2 ] , def F begin lookup X push 4 assign X end call F push 3 assign X call F ) ⟶ ( ∅ , ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] , call F push 3 assign X call F ) ⟶ ( ∅ , ⟨ cc , push 3 assign X call F , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ :: ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] , lookup X push 4 assign X ) ⟶ ( 2 :: ∅ , ⟨ cc , push 3 assign X call F , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] ⟩ :: ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] , push 4 assign X ) ⟶ ( 4 :: 2 :: ∅ , ⟨ cc , push 3 assign X call F , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] ⟩ :: ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] , assign X ) ⟶ ( 2 :: ∅ , ⟨ cc , push 3 assign X call F , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] ⟩ :: ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 4 ] , ϵ ) ⟶ ( 2 :: ∅ , ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] , push 3 assign X call F ) ⟶ ( 3 :: 2 :: ∅ , ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] , assign X call F ) ⟶ ( 2 :: ∅ , ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] , call F ) ⟶ ( 2 :: ∅ , ⟨ cc , ϵ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] ⟩ :: ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] , lookup X push 4 assign X ) ⟶ ( 2 :: 2 :: ∅ , ⟨ cc , ϵ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] ⟩ :: ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] , push 4 assign X ) ⟶ ( 4 :: 2 :: 2 :: ∅ , ⟨ cc , ϵ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] ⟩ :: ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 2 ] , assign X ) ⟶ ( 2 :: 2 :: ∅ , ⟨ cc , ϵ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] ⟩ :: ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 4 ] , ϵ ) ⟶ ( 2 :: 2 :: ∅ , ∅ , [ F ↦ ⟨ F , lookup X push 4 assign X , [X ↦ 2] ⟩ ; X ↦ 3 ] , ϵ ) ✓
Example program with higher-order functions:
def K begin assign X def F begin assign Y lookup X end lookup F end push 3 call K assign THREE push 2 call THREE
Example evaluation with higher-order functions:
( ∅ , ∅ , [] , def K begin assign X def F begin assign Y lookup X end lookup F end push 3 call K assign THREE push 2 call THREE ) ⟶ ( ∅ , ∅ , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , push 3 call K assign THREE push 2 call THREE ) ⟶ ( 3 :: ∅ , ∅ , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , call K assign THREE push 2 call THREE ) ⟶ ( 3 :: ∅ , ⟨ cc , assign THREE push 2 call THREE , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] ⟩ :: ∅ , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , assign X def F begin assign Y lookup X end lookup F ) ⟶ ( ∅ , ⟨ cc , assign THREE push 2 call THREE , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] ⟩ :: ∅ , [ X ↦ 3 ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , def F begin assign Y lookup X end lookup F ) ⟶ ( ∅ , ⟨ cc , assign THREE push 2 call THREE , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] ⟩ :: ∅ , [ F ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ ; X ↦ 3 ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , lookup F ) ⟶ ( ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ :: ∅ , ⟨ cc , assign THREE push 2 call THREE , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] ⟩ :: ∅ , [ F ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ ; X ↦ 3 ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , ϵ ) ⟶ ( ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ :: ∅ , ∅ , [ K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , assign THREE push 2 call THREE ) ⟶ ( ∅ , ∅ , [ THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , push 2 call THREE ) ⟶ ( 2 :: ∅ , ∅ , [ THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , call THREE ) ⟶ ( 2 :: ∅ , ⟨ cc , ϵ , [THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩] ⟩ :: ∅ , [ THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ ; X ↦ 3 ] , assign Y lookup X ) ⟶ ( ∅ , ⟨ cc , ϵ , [THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩] ⟩ :: ∅ , [ Y ↦ 2 ; THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ ; X ↦ 3 ] , lookup X ) ⟶ ( 3 :: ∅ , ⟨ cc , ϵ , [THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩] ⟩ :: ∅ , [ Y ↦ 2 ; THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ ; X ↦ 3 ] , ϵ ) ⟶ ( 3 :: ∅ , ∅ , [ THREE ↦ ⟨ F , assign Y lookup X , [X ↦ 3] ⟩ ; K ↦ ⟨ K , assign X def F begin assign Y lookup X end lookup F , [] ⟩ ] , ϵ ) ✓
8.2.2. Exercises
Evaluate the following program.
def ADDFIVE begin push 5 add end def TWICE begin assign F call F call F end push 0 lookup ADDFIVE call TWICE
Write a program which terminates for the stack-oriented languages with
- dynamic scoping
- lexical scoping via activation records
- lexical scoping via closures
and leaves one number on the stack, but which leaves a different number in each of the three cases.
- Redefine the language and semantics to include a
return
command (as in project 3). - Redefine the language and semantics to include a
call
command (as in project 3) so that the language has anonymous functions. - (Open ended) Describe the benefit of having two stacks (one for data and one for returns) instead of just one.
Footnotes:
This is, of course, not the only way to achieve this. We can also use continuation-passing style, as we saw briefly in lecture.
You are not required to write formal typing derivations, but you should be able to use your intuitions about OCaml to determine the type of an expression based on its structure (as the OCaml type-checker does mechanically).
In formal language theory, a grammar is a more general than this.
The exclamation points are for emphasis, they are not a part of the derivation.
Other source may include more constructs in EBNF meta-syntax. See, for example, the Wikipedia page.
It is a theorem of formal language theory that regular grammars and regular expressions are equivalent in expressivity.
There is an efficient parsing algorithm for grammars in Chomsky normal form. This is the main primary we consider grammars of this form.
In the remainder of this document, we will continue this trend. Each following stack-oriented language is a variation on a theme.
This is not the most efficient implementation.
This is what the language Forth does.