Assignment 5
The following assignment is due Thursday 3/20 by 8:00 PM. You'll
need to submit one crate on Gradescope called assign5
.
The specification for this assignment is very simple: implement an S-expressions parser. This is a great way to make sure sure you understand how to use indirection to build and work with recursive data.
You're required to use the following types in your program (you may want to add derived traits):
struct Metadata { line_num: usize, col_num: usize, } enum Sexpr<T> { Atom(T), List(Box<Vec<Sexpr<T>>>), } enum ParseErrorKind { NotClosed, ExtraClosed, } struct ParseError { kind: ParseErrorKind, metadata: Metadata, }
You're required to implement the following method:
impl Sexpr<String> { fn parse(input: &str) -> Result<Vec<Sexpr<String>>, ParseError> { todo!() } }
This function should succeed if str
represents a sequence of
S-expressions. For example, parsing:
(add (mul 2 3) 4) ((one "1") (two "2"))
should give you the S-expression (eliding boxes):
[ List([ Atom("add"), List([Atom("mul"), Atom("2"), Atom("3")]), Atom("4") ]), List([ List([Atom("one"), Atom("\"1\"")]), List([Atom("two"), Atom("\"2\"")]) ]), ]
There are only two ways that a sequence of general S-expression can be ill-formed:
- There is an unclosed opening parenthesis
- There is an extra closing parenthesis
In either of these cases, you should return a ParseError
. Note that
errors include Metadata
which refers to the line and column of the
offending parenthesis. In the first case, you should return an error
with metadata for the first unopened closing parentheis. In the
second case, you should return an error with metadata for the first
extra closing parenthesis.
Parsing the ill-formed S-expression:
((foo bar ) ) ) ( ( ) ) )
should give you:
ParseError { kind: ExtraClosed, metadata: Metadata { line_num: 2, col_num: 12 } }
whereas parsing:
((foo bar )) ( ( ( )
should give you:
ParseError { kind: NotClosed, metadata: Metadata { line_num: 2, col_num: 9 } }
You can do this however you'd like, but I'd recommend the following structure:
enum Lexeme { Lparen, Rparen, Atom(String), } struct Token { lexeme: Lexeme, metadata: Metadata, } struct Lexer<'a> { contents: Lines<'a>, curr_line_num: usize, curr_col_num: usize, curr_line: &'a str, } impl<'a> Lexer<'a> { fn new(input: &'a str) -> Lexer<'a> { Lexer { contents: input.lines(), curr_line_num: 0, curr_col_num: 1, curr_line: "", } } } impl<'a> Iterator for Lexer<'a> { type Item = Token; fn next(&mut self) -> Option<Token> { todo!() } }
- Generally speaking, it's much easier to parse if you've already lexed the input into its individual units. You can build an iterator which holds a slice of the input and grabs one token at a time.
- For parsing there are two natural approaches:
- Recursive descent: recursively parse a sequence of S-expressions every time you see an opening parenthesis.
- Stack: Keep track of a stack of S-expressions, and grab the appropriate chunk of the top of the stack every time you see a closing parenthesis.
Happy coding.