Redis and scripting
Wednesday, 27 April 11
Before doing Redis I was completely addicted with one thing: implementing scripting languages.
I implemented a number of languages, for instance I wrote three implementations of the Tcl language (one is currently actively used), Scheme interpreters, interpreters for stack based languages similar to FORTH, an interpreter for the Joy language written in Tcl, a macro system for Tcl, and read the source code of Ruby, Python, and many other dynamic language implementations. When I said addicted I meant addicted.
After all I consider Redis a DSL itself... so apart from persistence I'm not really out of my previous business :) So you may wonder how this passion for scripting did not showed into the internals of Redis, that instead is a project very near to the bare metal: C and only C with a focus into efficiency and memory footprint.
The reason is, adding scripting is a big step forward in some way. It means to make everything more dynamic, and I was very very concerned about adding scripting before having a good idea about Redis Cluster: would the scripting capability play well with the cluster? Other problems were related to the idea of defining commands. I don't like the idea of instances with commands defined inside a config file, every Redis instance should be capable of doing everything, without the problem of having different instances with different versions of user defined commands.
For all this reasons I thought at scripting again and again in the latest months... one step after the other I believe I fixed most of the problems I had with scripting, mainly:
- What scripting language to use?
- What is the semantic of scripting? Should users define commands? Should the command definition be a command itself? How to make sure different commands are in sync in different instances?
- What about software engineering? When you read a source code using Redis and you see something like "Redis.myStrangeCommand key1 value1" what do you do? Need to check the instance to see what the newly defined command does? That sucks.
- What about Redis Cluster? How scripting and cluster interacts?
Finally I think I've good solutions for all this problems. So I think it is time to start working at a branch implementing scripting. For now just a branch, the experience of our brave users will tell us if the experiment will turn into a real feature or not. But the real question is why scripting?.
There are a few fundamental problems that scripting can fix in a wonderful way:
- Scripting makes Redis much faster for some kind of task. Many complex operations that now require some kind of read-compute-write workflow in the client side will just be simple commands that will take a single exchange with the server. And bandwidth is very important... we discovered this talking with guys that are using Redis in big environments.
- Most Redis workflow tend to be I/O bound, and not CPU bound. And even when you see the CPU at 100% it is likely all about protocol handling. This is almost impossible to avoid as Redis commands are too much faster than dealing with I/O. Lookup of a key into an hash table, some trivial operation, and so forth. With scripting we can put at much better use our bandwidth and CPU power.
- But the fundamental problem is the following: we currently have to either deny features to avoid bloating and leave unsatisfied users, or bloat the server. The problem is, there are many things that you don't want as a command as they are very specific. But this guys actually need this commands, a lot, for their use case. With scripting this problem is completely solved: Redis exports only the general abstractions, what you need 99% of the times. For the 1% use case you write a simple script.
And now... I claimed that the above problems with scripting are solved. How?
Don't define commands
Instead of defining commands in some way we can simply send the script again and again. Redis scripts will be usually super short. People need to do things like: set this only of this key already contains that specific value. Or check all the elements of this sorted set in a given range and return the average value. And so forth. So we can do just:
EVAL "... some script ..." arg1 arg2 arg3 arg4 ...
Redis will try to be smart enough to reuse an interpreter with the command defined. But the point is, this solves a lot of problems in a single step! Now there is no longer the problem of defining commands, instances with different versions of the same command (especially in a cluster scenario), and it is everything evident from the source code of the application.
Specify what arguments are keys
Actually to deal well with cluster, with the experimental "disk backed Redis" things we are doing, and all the future stuff that we could do to make Redis a more interesting product, we only need to know one thing: what of the arguments are keys? To do so we can simply add a new argument to the EVAL command:
EVAL "... some script ..." num_keys arg1 arg2 arg3 ...
Now we know that only the first num_keys arguments are keys, and we can treat EVAL exactly like all the other commands, without to care at all about the semantics of the script executed.
Use a sane language
I think that for what we need Lua beats everybody else hands down. The language is not one that I particularly like, compared to Ruby for instance, but who cares? We are programmers and can code a short script in any language we want, but the point is, Lua is a wonderful implementation. Easy to embed, without even a configure script, like Redis! And FAST.
It's really time to try this into a Redis branch ;) So stay tuned as in the next days I'm sure I'll get up with the right swing to code a first implementation we can collectively play with, to refine our feelings.
You can comment this entry in the Hacker News post