Assignment 2
This assignment due Thursday 9/18 by 8:00 PM. You'll need to submit
a .zip
file containing a Cargo project named hw2
in
Gradescope. You can get the setup with:
cargo new hw2
You'll can put your solutions into the file hw1/src/main.rs
. Please
make sure to run cargo clean
before submitting, i.e., don't sumbit
the _build
directory.
The specification for this assignment is very simple: implement an S-expressions parser. This is a great way to make sure we understand how to work with the basic constructs of the language.
Easy Mode
You should use the following types in your program:
#[derive(Debug, Clone, PartialEq)] enum Token { Lparen, Rparen, Atom(String), } #[derive(Debug, Clone, PartialEq)] enum Sexpr { Atom(String), List(Vec<Sexpr>), }
In the simplest form of this assignment, you're required to implement the following functions.
fn lex(input: &str) -> Option<Vec<Token>> { todo!() } fn parse(input: Vec<Token>) -> Option<Vec<Sexpr>> { todo!() }
In the interest of simplicity (and a low memory footprint), we're not going to deal with useful error messages. We'll also take the simplest definition of an atom at the getgo: an atom is a contiguous sequence of non-whitespace non-parentheses unicode characters.
And that's all folks. Happy coding.
Hard Mode
It is very easy to make this task very hard. Let's look at a couple options.
- It's not uncommon to allow comments in files containing
S-expressions. Take a look at the
sexplib
for one possible specifications of how comments might work. Implement all or a subset of these into your lexer. - Currently we cannot include whitespace (among other things) in our atoms. This is typically handled by allows string literals to be a part of an S-expression
- I think one of the more interesting challenges in write a good
S-expression parser is to do so with a low memory footprint. Look
into
BufReader
instd
and use this parse an S-expression from a file without storing the entire contents of the file in memory. In particular, see if you can handle the case in which the file contains a very large S-expression without any newlines. For this, you may assume that the file only contains ASCII characters.