Assignment 2

Table of Contents

This assignment due Thursday 9/18 by 8:00 PM. You'll need to submit a .zip file containing a Cargo project named hw2 in Gradescope. You can get the setup with:

cargo new hw2

You'll can put your solutions into the file hw1/src/main.rs. Please make sure to run cargo clean before submitting, i.e., don't sumbit the _build directory.

The specification for this assignment is very simple: implement an S-expressions parser. This is a great way to make sure we understand how to work with the basic constructs of the language.

Easy Mode

You should use the following types in your program:

#[derive(Debug, Clone, PartialEq)]
enum Token {
    Lparen,
    Rparen,
    Atom(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Sexpr {
    Atom(String),
    List(Vec<Sexpr>),
}

In the simplest form of this assignment, you're required to implement the following functions.

fn lex(input: &str) -> Option<Vec<Token>> {
    todo!()
}

fn parse(input: Vec<Token>) -> Option<Vec<Sexpr>> {
    todo!()
}

In the interest of simplicity (and a low memory footprint), we're not going to deal with useful error messages. We'll also take the simplest definition of an atom at the getgo: an atom is a contiguous sequence of non-whitespace non-parentheses unicode characters.

And that's all folks. Happy coding.

Hard Mode

It is very easy to make this task very hard. Let's look at a couple options.

  • It's not uncommon to allow comments in files containing S-expressions. Take a look at the sexplib for one possible specifications of how comments might work. Implement all or a subset of these into your lexer.
  • Currently we cannot include whitespace (among other things) in our atoms. This is typically handled by allows string literals to be a part of an S-expression
  • I think one of the more interesting challenges in write a good S-expression parser is to do so with a low memory footprint. Look into BufReader in std and use this parse an S-expression from a file without storing the entire contents of the file in memory. In particular, see if you can handle the case in which the file contains a very large S-expression without any newlines. For this, you may assume that the file only contains ASCII characters.