Course Notes
Table of Contents
This is very rough draft set of notes (more of a collection of thoughts) for the course CAS CS 392 at Boston University. It's primarily a supplement to The Rust Programming Language (RPL). It's full of typos and dumb jokes.
1. What is this course?
We usually learn PLs the "wrong" way: throw spaghetti code at the wall and see what sticks.1 This course is an experiment in learning a PL the "right" way: give an airtight specification of a PL using the mathematical framework of PL theory.
Rust is an excellent testbed for this kind of experiment. It's a PL that's gaining in popularity but has some bizzaro features in its type systems2 that turn out to be incredibly useful for addressing some of the more egregious bugs in low-level programming. Add to that the fact the theoretical underpinnings of Rust are simultaneously very old (it's based on linear logic, an innovation of Jean-Yves Girard from the 80s, based on ideas that came long before it) and very young (the formalization of Rust-proper postdates its own conception and popularity, as evidenced by the publication year (2021) of the paper we'll use in the second half of the course). This gives us a wide range of sources to better understand what the heck is going on with this language.
The points is not to learn Rust, at least not entirely.3 The point is to understand the core of Rust. Per the previous paragraph, this means learning a bit of mathematics. It also means dipping into research from this decade, which means learning to read academic papers (more of a skill than you might expect). And since we're learning a PL, we might as well build something with it. The proverbial "they" say that you don't know a PL until you've done a project in it. The less-proverbial "they" might say you don't know a PL until you understand it's formal specification. We'll kill two birds with one stone4 and build an interpreter in Rust for a subset of Rust. So our rough plan:
- learn enough Rust to be dangerous
- learn to read/practice reading research papers in PL, so that we can read a paper which gives a formal specification of a subset of Rust (we'll do this together as a group)
- Test our knowledge by implementing that subset of Rust in Rust
Now, admittedly, I'd like to believe that's all the motivation we'll need. If you're interested in this course, I'd hope you also have some iota of preexisting interest in Rust. But it is my duty to (begrudgingly) motivate why you should learn something interesting.
- For those interested in PL: Linear types, affine types, types for concurrency, these are current research areas. Rust is something of a success story, one of those rare instances of a PL becoming popular for being both interesting and usable. If you took CS320, it's one thing to give a formal specification of a clean functional PL, applying these ideas to Rust and friends will be a formative challenge.
- For those interested in programming: I am of the opinion that it's always valuable to learn a new language paradigm, but maybe that's not enough motivation──Rust is ranked 14th according to the December 2024 TIOBE index with one of the most generally uptrending graphs in the top 20. It's being adopted by large companies like Amazon and Microsoft, but also smaller companies, anyone who needs safe low-level code. Rust is an ahead-of-time compiled language; it's good to have one of those in your back pocket, and I think Rust is one of the more interesting choices (over, say, C).
- For those interested in systems: I can't pretend to really know what's going on in the world of systems, but Rust touts itself as having a lot of potential in this area. It may be worthwhile to understand it, even if just to be able to adequately respond to the hype.
- Everyone else: I dunno, good luck. If you're not one of the above, I honestly don't see any reason why you should learn Rust.
Enough preamble, let's Rust.
2. Set Up
Set up and installation of Rust is comparatively easy. We'll
primarily be using Cargo, a build system for Rust akin to OCaml's Dune
(but quite a bit better). You should follow the rustup
installation
instructions in RPL for your OS.
Note: If you're running Windows, then I'll assume that you're using WSL. You're welcome to try to install directly, but I will not help you troubleshoot. Not so much because I don't want (also, I don't) but because I don't really know how.
If you'd like, you can go through the Hello, World! section of RPL, but we won't be using the rust compiler directly in this course. Instead, we'll follow RPL's recommendations on project hygiene and always assume a Cargo project. You should go through the Hello, Cargo! section of RPL (we'll also go over it in lecture).
3. The Basics
This is a collection of concepts that are common to all programming languages. I expect that you're already familiar with all of these concepts. The goal here is to provide a more careful analysis of these concepts than what is given in the Rust book, primarily with respect to syntax, and a bit of the typing
3.1. Variables and Mutability
Grammar:
<var-decl> ::= let <var-ident> = <expr> | let mut <var-ident> = <expr> <var-ident> ::= ; snake_case ;
Variables are immutable by default. This is generally safer (mutability is the root of much evil). If you attempt to mutate an immutable variable, you'll get a compiler error. You can reuse (i.e., shadow) variable names with declarations.
Example:
let x = 1; assert_eq!(x, 1);
We use the mut
keyword to declare mutable variables, which are
allowed to be reassigned/mutated.
Example:
let mut x = 1; x = 2; assert_eq!(x, 2);
3.2. Constants
Grammar:
<const-decl> ::= const <const-ident> : <ty> = <expr> <const-ident> ::= ; SCREAMING_SNAKE_CASE ;
Constants are like immutable variables except that:
- they are declared with the
const
keyword - by convention, they are named using
SCREAMING_SNAKE_CASE
- they must be type annotated (I don't know why yet)
- they must be assigned to a value that can be computed at compile-time (e.g., no stdin reads)
As in most PLs, constants are primarily useful for code clarity and maintenance.
Example:
const SPEED_OF_LIGHT : u32 = 299792458; assert_eq!(SPEED_OF_LIGHT, 299792458);
3.3. Data Types
Grammar:
<ty> ::= <scalar-ty> | <compound-ty> <scalar-ty> ::= <int-ty> | <float-ty> | bool | char <int-ty> ::= <sint-ty> | <uint-ty> <sint-ty> ::= i16 | i32 | i64 | i128 | isize <uint-ty> ::= u16 | u32 | u64 | u128 | usize <float-ty> ::= f32 | f64 <compound-ty> ::= <tuple-ty> | <array-ty> <tuple-ty> ::= ( <tys> ) <tys> ::= ϵ | <ty> | <ty> , <tys> <array-ty> ::= [<ty>;<usize-lit>]
Rust is statically typed, i.e., the type of every variable is known at compile-time. Rust does some amount of type inference (except for constant, it seems). When in doubt, type annotate variables. It doesn't hurt, and can act as a simple form of documentation.
There are two kinds of data types: scalar and compound. We'll do a quick outline of these types below, most of this should be familiar.
- Scalar:
- Integers:
i8
,i16
,i32
,i64
,i128
, andu8
,u16
,u32
,u64
,u128
- The number indicates the number of bits used to represent the integer. The default for type inference is i32.
- There is also
isize
andusize
used for indexing - There are standard literals for decimal, hex, octal, binary, and bytes. Look them up if you need them.
- Rust's compiler can check for integer overflow in debug mode, but wraps by default for release mode. We're gonna ignore this for now (as if we'll be writing any production code…)
- Floating-point numbers:
f32
andf64
. The default for type inference isf64
.- Represented using IEEE-754 (of course)
- Booleans:
- type:
bool
- two literals:
true
andfalse
- type:
- Characters:
- type:
char
- example:
'q'
,'✗'
- four bytes, represent unicode scalars (characters are weird but I don't care at the moment though)
- type:
- Integers:
- Compound:
- Tuples:
- type:
(t1, t2,..., tk)
- literal:
(e1, e2,..., ek)
accessor:
p.i
Example:
let tup: (i32, bool, u32) = (2, true, 2); assert_eq!(tup.1, true);
- type:
- Arrays:
- type:
[t; n]
wheret
is a type andn
is ausize
for the number of elements. IMPORTANT: Arrays are fixed length. - Arrays are allocated on the stack. Rust panics if an index is out of bounds (in particular, it actually checks at runtime).
indexing:
a[i]
, wherei
is ausize
.Example:
const J : usize = 5; let a : [u32; J] = [1, 2, 3, 4, 5]; let i : usize = 2; assert_eq!(a[i], 3);
- type:
- Tuples:
Check here for details on operators, arithmetic, Boolean or otherwise. We will, of course, see more types as the course goes on (and define our own types).
3.4. Functions
Grammar:
<fun-decl> ::= fn <fun-ident>(<params>) <block> | fn <fun-ident>(<params>) -> <ty> <block> <params> ::= ϵ | <param> | <param> , <params> <param> ::= <var-ident> : <ty> <block> ::= { <stmts> } <fun-ident> ::= ; snake_case ;
Functions in Rust behave much like functions in other languages. A couple key notes:
- parameters must be type-annotated
- the return type must be given if the function returns a value
There is a return
keyword, but we don't often use it; the return
value is the last expression in the function block. If no last
expression is given then the return value is ()
of type ()
(i.e.,
the unit type); see the <block>
case in the given grammar for more
details about how the bodies of functions might look (there's a bit of
trickiness with regards to semicolons but I'm not going to dwell on it
since the compiler is pretty good at catching these things).
Example:
fn sum_of_squares(x : u32, y : u32) -> u32 { let x_squared = x * x; let y_squared = y * y; x_squared + y_squared }
3.5. Statements and Expressions
Grammar:
<stmts> ::= ϵ | <expr> | <fun-decl> <stmts> | <stmt> ;<stmts> | <expr> ;<stmts> | <expr-no-sc> <stmts> <stmt> ::= <var-decl> | <const-decl> <expr-no-sc> ::= <if-expr> | <while-expr> | <for-expr> <expr> ::= <block> | <lit> | <uop-expr> | <bop-expr> | <call-expr> | ( <expr> ) <lit> ::= <int-lit> | <float-lit> | <char-lit> | bool-lit> <int-lit> ::= ; see docs ; <float-lit> ::= ; see docs ; <char-lit> ::= ; see docs ; <bool-lit> ::= true | false <uop-expr> ::= <uop> <expr> <bop-expr> ::= <expr> <bop> <expr> <uop> ::= ; see docs ; <bop> ::= ; see docs ; <call-expr> ::= <fn-ident>(<exprs>); <exprs> ::= ϵ | <expr> | <expr> , <exprs>
Statements perform actions and do not have a value. Expressions have
values, and are evaluated. The only statements we've seen so far are
declarations. Any expression can be assigned to a variable (you
should check this, it works even for things like while
expressions).
3.6. Comments
Use them or don't, I don't really care.
Example:
// a comment // another comment
3.7. Control Flow
Grammar:
<if-expr> ::= if <expr> { <stmts> } <else-if-expr> <else-if-expr> ::= ϵ | else { <stmts> } | else if { <stmts> } <else-if-expr> <while-expr> ::= while <expr> { <stmts> } <for-expr> ::= for <var-ident> in <expr> { <stmts> }
Control flow is also pretty standard. A couple things to keep in mind:
- control flow is defined by expressions. This means they can be used as the values of variables; this is particularly nice for if-expressions
- the then-case and the else-case blocks must be the same type
4. Ownership
Ownership accounts for the first plateau in the learning curve of Rust.
4.1. Stack and Heap
- there are two kinds of memory we need to think about: stack and heap
- We won't care too much about this except up to it's usefulness as an analogy.
- Stack memory is simple. If all we needed to do was work with the
stack, then we wouldn't be having this conversation.
- The restriction: the stack needs to have fixed size.
- I still don't really know why, why not store dynamic sized memory on the stack? (Let's just assume this)
- The restriction: the stack needs to have fixed size.
- The heap is less organized
- You must request space, and it must be allocated, you work with a pointer to the memory.
- In rust we won't often think in terms of pointers, this is part of the point
- pushing to the stack is faster, accessing is faster, but data must be fixed size, and is ephemeral. Why would we want to put anything on the stack? If it's size can change, or if it needs to persist
- The problem: if we put things on the heap, we need to know when we can deallocate to free up memory.
- In C, you have to do this by hand
- In modern programming languages, there is so-called garbage collecting
- In Swift, there is automatic reference counting
- What are the benefits of GC? Not sure. I think it's a bit of magic.
- We also don't want to DUPLICATE
- The main idea: every value has a single owner (variable). When the owner goes out of scope, so does the value.
- This requires carefully understanding scope.
- The scope of a variable is: after it is declared up to the end of the block in which it is declared.
- expression which update must be updating something on the heap
- This requires carefully understanding scope.
4.2. References and Borrowing
- Rust automatically calls
drop
function when a any variable goes out of scope - (We should say something about static memory)
- Values which implement the
Copy
trait are types which can be copied assignment, no moving. - There are three things: copy, move, (mutable) borrow
- Exercise how can we implement the swap function? (We can't really)
Footnotes:
My nature does not allow me to introduce anything without a wannabe-hybrid-Salinger-Wallace-esqe preamble. I wish it didn't have to be this way but at some point, we can't break from our aesthetic inclinations.
An uncommon pairing, most popular languages are beige wallpaper boring.
I don't even like Rust that much, to be honest. It's probably most fascinating to me from a social theoretical perspective than from a PL design perspective.
I had a conversation with some folks about possible alternative idioms. PETA recommends "feed two birds with one scone." Not punchy enough in my opinion. My dad came up with two pretty good ones: "strike out to batters with one pitch" (nonsensical in a pleasant way) and "destroy two planets with one deathstar" (to better align with nerd culture).