Update: 2011/11/21: The code presented in this article was completely restructured (one instruction per line, classes implementation separated from the definition, more comments etc …) in the second article from this series.
Implementing a high order programming language in a low level language, like C++ (Assembly is too low level for my background and C … well there are already a few Scheme implementations in C), has always been a fascinating subject for me. Writing a program that interprets other programs is a great and fun experience for anyone, it is almost like a rite or passage for a programmer.
My purpose in starting this series of articles is to better understand some of the fundamentals of the Scheme programming language and how an interpreter works. A secondary purpose will be to test my Scheme implementation on some of the examples and exercises presented in SICP (I will probably skip the Picture language presented in Chapter 2). This will allow me to redo some of the exercises from the book on my own Scheme implementation and in the same time on a mature implementation like Gambit Scheme, for comparison purposes.
The Scheme subset I’ve choose to start with, is inspired by an article of Peter Norvig (How to Write a (Lisp) Interpreter (in Python)). This will be my Scheme - zero implementation, only six special forms (quote, if, set!, define, lambda and begin) and a generic Number type, that will allow simple operations with integers and floats.
The REPL will allow the programmer to test simple Scheme programs like these:
We will start by implementing the REPL function in C++:
The above code will read the user input (a Scheme expression) and will evaluate this expression. Internally, in C++, we will use vectors of strings from the STL to store a Scheme expression, we will create new C++ class PList for this. A PList can store one or more Scheme expressions, for example:
A simplified version of the PList class (for the complete code see “PList.h”):
A Scheme environment will be simulated with a map data structure that will use strings as keys and will store: strings, PLists and pointers to functions. This way I will be able to use a single data structure for managing variables and procedures in Scheme. This will allow us to redefine any variable as a procedure and vice-versa, like in the following example:
The environment implementation (see “Environment.h”) is as follows:
Adding a new Scheme procedure directly in the C++ code is as simple as:
where, for this particular case, the add function can be implemented as:
The above implementation will allow us to sum an arbitrary number of arguments. Please note that internally, each number is treated as a double. For now we will use only the native C++ numerical types, a future version of the interpreter will allow arbitrary integer size by use of the GMP library.
Using the above defined Environment we can implement a first version of the evaluator, this will act as a simple arithmetic calculator:
Let’s see the above code in action:
In the next article from these series, we will implement the main Scheme special forms: quote, if, set!, define, lambda and begin.
If you are interested in learning more about the new C++11 syntax I would recommend reading Professional C++ by M. Gregoire, N. A. Solter, S. J. Kleper 2nd edition:
or, if you are a C++ beginner you could read C++ Primer (5th Edition) by S. B. Lippman, J. Lajoie, B. E. Moo.