The Interpretation Pipeline
Table of Contents
When we write a program in our favorite PL1 we fill a file with a bunch of symbols and text. This is one of the beauties of programming: looking past the bells and whistles provided IDEs, a program is just a stream of characters.
Eventually we want to verify that what we've typed up written actually works, so we run our program. In some IDEs, this means pressing a play button. In many cases it means opening a terminal and typing out a few commands.
In either case, we're running a different program in order to run the program we've written (huh). Our goal is to understand what's going on here. What is this program doing? How does it do it? What sorts of data structures does this program use? Our intuition should tell us this program is doing probably something complicated.2
So, our basic question is: How do we get from that stream of symbols to the output of our program?
Interpretation vs. Compilation
We start by making a loose but important distinction between interpretation and compilation. At the risk of oversimplifying, we'll use the following definitions.
Interpretation is the process of taking a source program and its input, and then running it to get its output.
char stream ────►┌─────────────┐ │ INTERPRETER │────► output input ────►└─────────────┘
Compilation is the process of taking a source program in a high-level language and then translating it into a program in a low-level language which can be given input and run separately.
┌──────────┐ char stream ────►│ COMPILER │─╮ └──────────┘ │ ╭─low-level program─╯ │ ╰─────►┌──────────┐ │ TARGET │───► output input ────►└──────────┘
Again, this is a loose distinction. There may be intermediate translations in the process of interpretation (this is what OCaml does). We'll look briefly at compilation in this course but we'll focus primarily on interpretation.
Interpretation Pipeline
So, what's happening inside an interpreter? We start with a diagram.
We can think of this image as providing a window into the
INTERPRETER
box above.
+----------------------------+ | ┌────────────────────┐ | char stream ────────►│ LEXICAL ANALYSIS │─╮ | | └────────────────────┘ │ | | ╭──────token stream──────╯ | | │ ┌────────────────────┐ | | ╰►│ SYNTACTIC ANALYSIS │─╮ | | └────────────────────┘ │ | | ╭───────parse tree───────╯ | | │ ┌────────────────────┐ | | ╰►│ SEMANTIC ANALYSIS │─╮ | | └────────────────────┘ │ | | ╭─────────AST/IR─────────╯ | | ╰►┌────────────────────┐ | input ────────►│ EVALUATION │────────► output | └────────────────────┘ | +----------------------------+
- The first two boxes turn a stream of characters into something that we can evaluate (i.e., that we can run). This part of interpretation is about the form (or syntax) of the program. We'll use formal grammars to represent this form mathematically and we'll use parser generators to set up the code for these two boxes. Just like natural language, PLs are hierarchical, and making that heirarchical structure explicit makes evaluation easier.
- The next box is where we do type checking and/or type inference. It's during this part of the pipeline that we verify if the program we're planning to evaluate actually makes sense. If it does, then we can pass it onto the evaluator.
- The last box actually runs our program to get its value (i.e., its output). This part of interpetation is about the meaning (or semantics) of the program. We use formal semantics to describe what it means to give a program this meaning.
Expectations
PL has implications both in theory and in practice. It will be important for us not just to think about how to implement an interpreter, but also about the underlying logical frameworks we need to reason about PLs.
There will be a frequent give-and-take between theory and practice. Be prepared to do handwritten work (not quite proofs, but something akin to them) as well as programming (i.e., building the components interpreter).