Assignment 5
Table of Contents
The following assignment is due Thursday 8/16 by 8:00 PM. You'll
need to submit a single Cargo project named hw5
on Gradescope. You
can get the setup with:
cargo new hw5
Please make sure to run cargo clean
before submitting, i.e., don't
sumbit a target
directory or a .git
directory.
The purpose of this assignment is to force y'all to work with a slightly larger Rust program. There are many projects we could do to get this kind of practice, but I like PLs, so we're going to build a forth interpreter.
Forth is a low-level stack-oriented PL. The beauty of forth its simplicity. For a quick tutorial on forth, I recommend this very nice blog post.
Remark. I'm going to leave a lot of getting familiar with forth to you. If you need some assistance with understanding forth, please reach out on Piazza or by email. In particular, I'd recommend looking at Conditional Execution and Recursion in the gforth manual to get a sense of how they work.
Building a forth interpreter in Rust is made quite a bit easier by the high-level abstractions Rust provides. For example, we can create a very clear abstract representation of a machine configuration (to be updated while evaluating a program).
enum Command { Push(i32), Drop, Dup, Dup2, Rot, Swap, Print, PrintStack, Add, Sub, Mul, Div, Mod, Lte, If, EndIf, Recurse(Option<Program>), } struct Program(Vec<Command>); struct Stack(Vec<i32>); type Dict = HashMap <String, Program>; enum State { Compile, Interpret, } struct Config { dict: Dict, stack: Stack, state: State, compiled: Program, compiled_word: String, } impl Stack { fn eval_prog...{...} } impl Config { fn eval_word...{...} } fn main() {...}
1. Evaluation
At the core of a forth interpreter is a function that evaluates the program associated with a word, updating the machine configuration as necessary. The interpreter keeps track of a dictionary of words that store programs in a basic command langauge (17 commands total). It's possible to define new words and add them to this dictionary using Colon Definitions. The small-step semantics of our little forth fragment is given as follows. In these rules, a configuration is of the form
\begin{align*} (D, S, T, w, Q, P) \end{align*}where each part of the configuration is given in this following table.
\(D\) | dictionary mapping words to programs |
\(S\) | stack |
\(T\) | state (\(\mathsf{I}\) for interpreting and \(\mathsf{C}\) for compiling) |
\(Q\) | program being compile |
\(w\) | name of program being compiled |
\(P\) | program being evaluated |
A couple notes about these semantics:
- The print command has the side effect of printing the top element of
the stack, and the printStack command has the side effect of
printing the entire stack (note that
Stack
implementsfmt::Format
). - All stuck configurations are considered error states.
- Note that the colon moves changes the state to be compiling, and the semicolon moves back into interpreting.
- The semantics while in the interpreting state roughly speaking
corresponds to
eval_prog
whereas the semantics while in the compiling state corresponds toeval_word
. - A usual, the way in which the semantics is presented does not always correspond to an implementation. For example, in the case of conditionals, there's the added concern of nested conditionals that is not immediately captured by the rules above.
2. Recursion
The trickiest part of this assignment is dealing with
recursion. You'll notice that the recurse
command carries an
Option<Program>
. Roughly speaking the recurse
command just keeps
track of what program it needs to run (see the recurse rule above).
When defining a recursive function, the recurse
command has to keep
track of the program currently being defined. So, to start, the
recurse
command just carries None
(otherwise, we'd have an
infinite data structure).
When you compile a word that uses recursion, the recurse
commands in the
program need to keep track of the program itself, so in this case the
None
should be replace with Some(p)
where p
is the program which
recursing should run.
You should convince yourself that this is necessary, but as a basic example, if I write the program:
: foo 1 + recurse ; : bar 2 + recurse foo ;
The word foo
will be compiled to a program of the form:
vec![Push(1), Add, Recurse(None)]
The word bar
will be compiled to a program of the form:
vec![ Push(2), Add, Recurse(None), Push(1), Add, Recurse(vec![Push(1), Add, Recurse(None)]) ]
The point: we need to be able to distinguish within a program which
recurse
commands correspond uses in the definition of the word, and
which correspond to compiled words.
3. Getting started
To get you started, I'm providing some starter code. Part of the assignment will be reading some Rust code and understanding how it works. In particular, there are several traits used throughout, I'd recommend reading the documentation on them and experimenting with them.
Also, here's a little example program to play around with:
: and * -1 * ; : >= swap <= ; : = 2dup <= rot rot >= and ; : or + -1 <= ; : not 0 = ; : < -1 + <= ; : > swap < ; : fib dup dup 0 = swap 1 = or not if dup -1 + recurse swap -2 + recurse + endif ; 11 fib . : fact dup 1 > if dup -1 + recurse * endif ; 5 fact . : fact_loop_aux dup 0 > if dup rot rot * swap -1 + recurse endif ; : fact_loop 1 swap fact_loop_aux drop ; : fact_loop_aux 0 ; 5 fact_loop .
4. Hard Mode
One unsatisfying feature of how recursion is handled is that every
time you compile a recursive function, you have to clone the entire
program and give it to the recurse
commands. If it's a long program
this can become a lot.
The challenge is to rewrite the interpreter so that programs are
reference counted. That way, a recurse
command can keep a reference
to the program it needs to run. Be careful not to create a reference
cycle!