Implementing Scheme in C++ - Introduction
Posted on November 14, 2011 by Sol

The code for this article is on GitHub: https://github.com/sol-prog/schm.

Update (2011/11/21): The code presented in this article was completely restructured (one instruction per line, class implementations separated from their definitions, more comments, etc.) in the second article of this series.

Implementing a high-level programming language in a lower-level language like C++ (Assembly is too low level for my background and C … well, there are already a few Scheme implementations in C) has always been a fascinating subject for me. Writing a program that interprets other programs is a great and fun experience for anyone; it is almost a rite of passage for a programmer.

My purpose in starting this series of articles is to better understand some of the fundamentals of the Scheme programming language and how an interpreter works. A secondary purpose is to test my Scheme implementation on some of the examples and exercises presented in SICP (I will probably skip the Picture language presented in Chapter 2). This will allow me to redo some of the exercises from the book on my own Scheme implementation and, at the same time, on a mature implementation like Gambit Scheme, for comparison purposes.

The Scheme subset I’ve chosen to start with is inspired by an article by Peter Norvig (How to Write a (Lisp) Interpreter (in Python)). This will be my Scheme - zero implementation: only six special forms (quote, if, set!, define, lambda and begin) and a generic Number type that allows simple operations with integers and floats.

The REPL will allow the programmer to test simple Scheme programs like these:

schm >>>(+ 3 4 10)
17
schm >>>(define square (lambda (x) (* x x)))
schm >>>(square 10)
100
schm >>>(define fact (lambda (x) (if (<= x 1) 1 (* x (fact (- x 1)))))) 
schm >>>(fact 1)
1
schm >>>(fact 8 )
40320
schm >>>(define list (lambda x x))
schm >>>(list 1 2 3)
(1 2 3)
schm >>>(quit)

We will start by implementing the REPL function in C++:

/*Simple REPL (read - evaluate - print).*/
void REPL(Environment &env){
    prompt();
    for(;;){
        string inp=get_input();
        if(inp=="")continue; //if the input buffer is empty go to the start of the for loop
        vector<string>out=clean_input(inp);
        //Evaluate an expression and print the result
        PList pp = PList(out);
        cout<<eval(pp,env)<<endl;
        prompt();
    }
}

The above code reads the user input (a Scheme expression) and evaluates it. Internally, in C++, a Scheme expression is stored as a vector of strings from the STL, wrapped in a new C++ class named PList. A PList can store one or more Scheme expressions, for example:

(+ 1 2 3) ;4 PLists
(+ 1 2 (- 4 5)) ;4 PLists, the last PList is made of 3 PLists
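
To make the counting above concrete, here is a minimal sketch of how an input line could be split into tokens and how the top-level pieces could be counted. The names tokenize and count_pieces are hypothetical and are not taken from “PList.h”; the real clean_input and PList::size may well work differently, and this sketch does not handle string literals that contain spaces.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a line into tokens, treating every
// parenthesis as a token of its own and whitespace as a separator.
std::vector<std::string> tokenize(const std::string &input) {
    std::vector<std::string> tokens;
    std::string current;
    for (std::size_t i = 0; i < input.size(); i++) {
        const char c = input[i];
        const bool paren = (c == '(' || c == ')');
        const bool space = (c == ' ' || c == '\t' || c == '\n');
        if (paren || space) {
            // A separator ends the token collected so far.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
            if (paren) tokens.push_back(std::string(1, c));
        } else {
            current += c;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Hypothetical helper: count the pieces of an expression the way the
// examples above do. Atoms and nested lists directly inside the outer
// parentheses count as one piece each.
std::size_t count_pieces(const std::vector<std::string> &tokens) {
    if (tokens.empty()) return 0;
    // If the expression is wrapped in parentheses its pieces live at depth 1,
    // otherwise (a bare atom such as +) they live at depth 0.
    const int base = (tokens.front() == "(") ? 1 : 0;
    int depth = 0;
    std::size_t pieces = 0;
    for (std::size_t i = 0; i < tokens.size(); i++) {
        if (tokens[i] == "(") {
            if (depth == base) pieces++; // a nested list starts here
            depth++;
        } else if (tokens[i] == ")") {
            depth--;
        } else if (depth == base) {
            pieces++;                    // an atom at the top level
        }
    }
    return pieces;
}
```

With these helpers, count_pieces(tokenize("(+ 1 2 3)")) and count_pieces(tokenize("(+ 1 2 (- 4 5))")) both give 4, matching the comments in the examples.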

A simplified version of the PList class (for the complete code see “PList.h”):

class PList{
    vector<string> store;

public:
    PList(){}
    PList(vector<string> vv){ store=vv; }

    //Print all the stored tokens on one line
    void print(){
        for(size_t i=0;i<store.size();i++) cout<<store[i]<<" ";
        cout<<endl;
    }

    //Return the stored tokens as a single, space separated string
    string get_store(){
        string aux="";
        for(size_t i=0;i<store.size();i++) aux=aux+store[i]+" ";
        return aux;
    }

    void clear(){ store.clear(); }

    size_t size(){
    ...
    }

    PList get(size_t pos){
    ...
    }

    //Number of raw tokens stored, parentheses included
    size_t full_size(){
        return store.size();
    }

    string elem(size_t pos){
    ...
    }

    void puts(string ss){ store.push_back(ss); }

    ~PList(){ store.clear(); }
};

A Scheme environment will be simulated with a map data structure that will use strings as keys and will store: strings, PLists and pointers to functions. This way I will be able to use a single data structure for managing variables and procedures in Scheme. This will allow us to redefine any variable as a procedure and vice-versa, like in the following example:

schm >>>(define aa 10)
schm >>>aa
10
schm >>>(define aa (lambda (x) (* x x x)))
schm >>>aa
Procedure
schm >>>(aa 2)
8
schm >>>(define aa +)
schm >>>(aa 2 3)
5

The environment implementation (see “Environment.h”) is as follows:

class Object{
    string(*pp)(vector<string>&);
    string(*rr)();
    string value;
    string kind;

public:
    //Initialize the function pointers so that they never hold garbage
    Object(){pp=NULL;rr=NULL;}
    Object(string ss){value=ss;kind="variable";pp=NULL;rr=NULL;}
    Object(string(*p_)(vector<string>&)){pp=p_;rr=NULL;kind="procedure";value="";}
    string get_kind(){
        return kind;
    }
    string get_value(){
        return value;
    }
    //Call the stored procedure with the given arguments
    string apply(vector<string>&V){
        return pp(V);
    }
    //Call the stored procedure with no arguments
    string apply(){
        vector<string>V;
        return pp(V);
    }
};

typedef map<string,Object> Environment;
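
As a quick check of the idea that one map can hold both variables and procedures, the sketch below rebinds the same key from a variable to a procedure. It uses a trimmed-down stand-in for Object so that it compiles on its own; the names Binding, Env and cube are hypothetical, made up for this illustration, and the guard against a missing procedure is my own addition.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Trimmed-down stand-in for the Object class (hypothetical name).
class Binding {
    std::string (*proc)(std::vector<std::string> &);
    std::string value;
    std::string kind;

public:
    Binding() : proc(NULL), kind("undefined") {}
    Binding(std::string v) : proc(NULL), value(v), kind("variable") {}
    Binding(std::string (*p)(std::vector<std::string> &))
        : proc(p), kind("procedure") {}

    std::string get_kind() const { return kind; }
    std::string get_value() const { return value; }

    // Call the stored procedure, or report an error if there is none.
    std::string apply(std::vector<std::string> &args) const {
        if (proc == NULL) return "Error! Not a procedure";
        return proc(args);
    }
};

typedef std::map<std::string, Binding> Env;

// Hypothetical built-in used only for this demonstration: x * x * x.
std::string cube(std::vector<std::string> &args) {
    if (args.size() != 1) return "Wrong number of arguments for procedure cube";
    const double x = std::strtod(args[0].c_str(), NULL);
    std::ostringstream out;
    out << x * x * x;
    return out.str();
}
```

Assigning env["aa"] = Binding("10") makes get_kind() report "variable"; a later env["aa"] = cube silently replaces that entry, after which get_kind() reports "procedure" and applying it to the single argument "2" returns "8".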

Adding a new Scheme procedure directly in the C++ code is as simple as:

...
Environment env;
env["+"]=add;
...

where, for this particular case, the add function can be implemented as:

string add(vector<string>&vv){
    if(vv.size()==0)return "Wrong number of arguments for procedure +";
    stringstream ss;
    double sum=strtod(vv[0].c_str(),NULL);
    for(size_t i=1;i<vv.size();i++)sum+=strtod(vv[i].c_str(),NULL);
    ss<<sum;
    return ss.str();
}

The above implementation allows us to sum an arbitrary number of arguments. Please note that internally each number is treated as a double. For now we will use only the native C++ numerical types; a future version of the interpreter will allow integers of arbitrary size by using the GMP library.
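
Other arithmetic procedures can follow the same pattern. The sketch below is only an assumption about how a multiplication procedure might look; the name mul is hypothetical and is not taken from the article's repository.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical built-in, modelled on add: multiplies all its arguments.
std::string mul(std::vector<std::string> &vv) {
    if (vv.size() == 0) return "Wrong number of arguments for procedure *";
    double product = 1.0;
    for (std::size_t i = 0; i < vv.size(); i++) {
        // Every argument arrives as text and is converted to a double.
        product *= std::strtod(vv[i].c_str(), NULL);
    }
    std::ostringstream ss;
    ss << product; // default stream formatting, six significant digits
    return ss.str();
}
```

It could be registered in the same way as add, for example with env["*"]=mul;. One consequence of passing results around as text is worth keeping in mind: with the default stream precision a double is printed with six significant digits, so a product such as 1234567 comes back as "1.23457e+06".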

Using the Environment defined above, we can implement a first version of the evaluator. For now it will act as a simple arithmetic calculator:

string eval(PList &pp,Environment &env){
    int N=pp.size();
    if(N==1){ //Check for symbol, constant literal, procedure with no argument
        if(pp.elem(0)=="(" && pp.elem(pp.full_size()-1)==")"){
            PList aux=pp.get(0); string inp=aux.elem(0);
            //Check for procedure with no argument, e.g. (quit)
            if(env.find(inp)!=env.end()){
                return env[inp].apply();
            }
            else{
                return(("Error! Unbound variable: "+inp));
            }
        }
        else{
            string inp=pp.elem(0);
            //Check if character
            if(inp[0]=='#' && inp[1]=='\\')return "character type not yet implemented";
            //Check if string
            if(inp[0]=='\"' && inp[inp.size()-1]=='\"')return inp;
            //Check if number
            if(number(inp))return inp;
            //Check if variable or procedure
            if(env.find(inp)!=env.end()){
                if(env[inp].get_kind()=="variable")return env[inp].get_value();
                else{
                    if(show_err1_flag)cout<<env[inp].get_kind()<<" ";
                    show_err1_flag=true;
                    return inp;
                }
            }
            else{
                string res;
                if(show_err2_flag)res="Error! Unbound variable: "+inp;
                show_err2_flag=true;
                return res;
            }
        }
    }
    else{
        PList aux=pp.get(0); string proc=aux.elem(0);
        show_err1_flag=false;
        show_err2_flag=false;
        if     (proc=="quote"){return ((proc+" - not yet implemented!"));}
        else if(proc=="if"){return ((proc+" - not yet implemented!"));}
        else if(proc=="define"){return ((proc+" - not yet implemented!"));}
        else if(proc=="set!"){return ((proc+" - not yet implemented!"));}
        else if(proc=="lambda"){return ((proc+" - not yet implemented!"));}
        else if(proc=="begin"){return ((proc+" - not yet implemented!"));}
        else{
            PList exps; exps.puts("(");
            for(int i=0;i<N;i++){
                PList piece=pp.get(i);
                string aux=eval(piece,env);
                if(aux=="")aux=(piece.get(0)).elem(0);
                exps.puts(aux);
            }
            exps.puts(")");
            string pr=(exps.get(0)).elem(0);
            vector<string>args;
            for(int i=1;i<exps.size();i++)args.push_back((exps.get(i)).elem(0));
            if(env.find(pr)!=env.end())  return env[pr].apply(args);
            else{
                return(("Error! Unbound variable: "+pr));
            }
        }
    }
}

Let’s see the above code in action:

schm >>>+
procedure +
schm >>>(+ 1 2)
3
schm >>>(+ 1 3 (- 10 5))
9
schm >>>(+ 0.5 0.786)
1.286
schm >>>(< 2 3)
#t
schm >>>"This is a test."
"This is a test."
schm >>>#\c
character type not yet implemented
schm >>>
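
The evaluator shown earlier relies on a number helper that is not listed in this article. One possible way to write such a check, assuming it only has to accept tokens that strtod can parse in full, is sketched below; the name is_number is hypothetical and the real helper may differ.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the number() helper used by eval.
// A token counts as a number only if strtod consumes all of it.
bool is_number(const std::string &token) {
    if (token.empty()) return false;
    char *end = NULL;
    std::strtod(token.c_str(), &end);
    // strtod leaves 'end' at the first character it could not parse.
    // Caveat: strtod also accepts spellings such as "inf" and "nan",
    // so symbols with those names would be treated as numbers here.
    return end == token.c_str() + token.size();
}
```

With this check, tokens such as 10, 0.786 and -5 are classified as numbers, while + and square fall through to the environment lookup.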

In the next article in this series, we will implement the main Scheme special forms: quote, if, set!, define, lambda and begin.

If you want to learn more about Scheme and interpreters in general, I would recommend reading Structure and Interpretation of Computer Programs by H. Abelson, G. J. Sussman and J. Sussman.

If you are interested in learning more about the new C++11 syntax, I would recommend reading Professional C++ (2nd edition) by M. Gregoire, N. A. Solter and S. J. Kleper or, if you are a C++ beginner, C++ Primer (5th Edition) by S. B. Lippman, J. Lajoie and B. E. Moo.
