picol, a Tcl interpreter in 550 lines of C code

After I found this story on programming.reddit.com I was too tempted to write a 500 lines of C code Tcl interpreter for fun, as commented here. It took three hours of work, 556 lines of C code, and was pretty interesting to do. The following is a description of the experiment.

Rules

I had some rule in mind:
  • Unlike the lisp500 interpreter I wanted to use more or less my normal C style. Lisp500 isn't actually 500 lines of code, but 500 lines of human gzipped code ;). In Picol you'll find normal C spacing and even comments.
  • I wanted to write an interpreter with a design similar to a real one. One of the few useful things you can do with picol is to learn how to write a Tcl interpreter if you are a newbie programmer I guess, so the point was to write a simple to understand program.
  • The resulting interpreter should be able to run some kind of non trivial program, to set few vars and print hello world was not an option.

The resulting interpreter: Picol

This is the picol source code, that tells very little without to know the supported features. The parser is very similar to the Tcl one, Picol supports interpolation as well, for example you can write:
set a "pu"
set b {ts}
$a$b "Hello World!"
Note that Picol has an interactive shell! so just launch it without arguments to start to play (to compile just write gcc -O2 -Wall -o picol picol.c).

To run a program stored in a file instead use picol filename.tcl.

Probably the parser should be rewritten in order to take less space, currently it takes almost 250 lines of code: this is too much and leaves little room for all the rest. A Raw list of the supported features:
  • Interpolation, as seen above. You can also write "2+2 = [+ 2 2]" or "My name is: $foobar".
  • Procedures, with return. Like Tcl if return is missing the result of the last command executed is returned.
  • If, if .. else .., while with break and continue
  • Recursion
  • Variables inside procedures are limited in scope like Tcl, i.e. there are real call frames in Picol.
  • The following other commands: set + - * / == != > < >= <= puts
This is an example of programs Picol can run:
proc fib {x} {
    if {<= $x 1} {
        return 1
    } else {
        + [fib [- $x 1]] [fib [- $x 2]]
    }
}

puts [fib 20]
that of course will output fib(20). Another example:
proc square {x} {
    * $x $x
}

set a 1 while {<= $a 10} { if {== $a 5} { puts {Missing five!} set a [+ $a 1] continue } puts "I can compute that $a*$a = [square $a]" set a [+ $a 1] }

Design

It's pretty straightforward, the first important part you see in the source code is an hand written parser. The main function of the parser is picolGetToken that just calls functions able to parse the different parts of a Tcl program and return in the parsing structure the type of the token and start/end pointers in order to extract it.

This function is in turn used by picolEval in order to execute the program. Every token is used either to form a new argument if a separator token was found before, or concatenated to the last argument (this is how interpolation is performed in Picol). Once an EOL (end of line) token is returned picolEval will call the command looking it up in a linked list of commands stored inside the interpreter structure.

Variables and commands substitution is performed by picolEval itself. The parser is able to return variables and commands tokens already stripped by $ and [], so all it's required to do is to lookup the variable in the call frame and substitute the value with the token, or to recursively call picolEval if it's a command substitution, using the result instead of the original token.

Commands are described by a name and a pointer to a C function implementing the command. In the command structure there is also a private data void pointer used in order to store data private to the command. This makes you able to implement multiple Picol commands using a single C function. User defined procedures are just like commands, but they are implemented by passing as private data the argument list and the body of the procedure, so a single C function is able to implement all the existing user defined procedures.

Procedures call is trivial. The interpreter structure contains a call frame structure having more or less just a pointer to a liked list of variables (that are in turn structures with two fileds: name and value). When a procedure is called a new call frame is created and put at the top of the old one. When the procedure returns the top call frame is destroyed.

Next steps

I'll hardly find more time to hack on Picol but with other 100 lines of code it is possible to write [eval], [uplevel] and a minimal support for Tcl lists. This could make the interpreter much more useful, assuming this kind of minimal interpreters have some kind of usefulness at all ;)

Actually what I want to do one of this days is to redesign it to make the parser smaller and simpler to read, and at the same time to make space for uplevel, lindex, lappend, lset and a few of string commands.

Conclusions

Writing interpreters is fun, but trying to write a small one in C makes very clear how much of C code is memory management, pointers exercise in order to parse strings, and implementation of trivial data structures like linked lists for the lack of some kind of dictionaries-alike data structure in the C library. Probably Picol can be rewritten in a dynamic language like Ruby or Tcl itself in 200 lines of code.

Inside every large program there is a small program trying to get out. - Sir Tony Hoare -


Feel free to comment this stuff at reddit.
Pagina creata il Thursday, 15 March 07 | stampa