Redis and scripting

Wednesday, 27 April 11

Before doing Redis I was completely addicted to one thing: implementing scripting languages. I implemented a number of languages: for instance I wrote three implementations of the Tcl language (one is currently actively used), Scheme interpreters, interpreters for stack-based languages similar to FORTH, an interpreter for the Joy language written in Tcl, and a macro system for Tcl, and I read the source code of Ruby, Python, and many other dynamic language implementations. When I said addicted I meant addicted.

After all I consider Redis a DSL itself... so apart from persistence I'm not really out of my previous business :) So you may wonder why this passion for scripting did not show up in the internals of Redis, which is instead a project very close to the bare metal: C and only C, with a focus on efficiency and memory footprint.

The reason is that adding scripting is a big step forward in some way. It means making everything more dynamic, and I was very, very concerned about adding scripting before having a good idea about Redis Cluster: would the scripting capability play well with the cluster? Other problems were related to the idea of defining commands. I don't like the idea of instances with commands defined inside a config file: every Redis instance should be capable of doing everything, without the problem of having different instances with different versions of user-defined commands.

For all these reasons I thought about scripting again and again in the last few months... one step after the other I believe I fixed most of the problems I had with scripting, mainly:

What scripting language to use?
What are the semantics of scripting? Should users define commands? Should the command definition be a command itself? How do we make sure commands are in sync across different instances?
What about software engineering? When you read source code that uses Redis and you see something like "Redis.myStrangeCommand key1 value1" what do you do? Do you need to check the instance to see what the newly defined command does? That sucks.
What about Redis Cluster? How do scripting and cluster interact?

Finally I think I have good solutions for all these problems. So I think it is time to start working on a branch implementing scripting. For now just a branch: the experience of our brave users will tell us if the experiment will turn into a real feature or not. But the real question is: why scripting?

There are a few fundamental problems that scripting can fix in a wonderful way:

Scripting makes Redis much faster for some kinds of tasks. Many complex operations that now require some kind of read-compute-write workflow on the client side will just be simple commands that take a single exchange with the server. And bandwidth is very important... we discovered this talking with guys that are using Redis in big environments.
Most Redis workflows tend to be I/O bound, and not CPU bound. And even when you see the CPU at 100% it is likely all about protocol handling. This is almost impossible to avoid, as Redis commands are much faster than dealing with I/O: lookup of a key in a hash table, some trivial operation, and so forth. With scripting we can put our bandwidth and CPU power to much better use.
But the fundamental problem is the following: we currently have to either deny features to avoid bloat and leave users unsatisfied, or bloat the server. The problem is, there are many things that you don't want as a command as they are very specific. But these guys actually need these commands, a lot, for their use case. With scripting this problem is completely solved: Redis exports only the general abstractions, what you need 99% of the time. For the 1% use case you write a simple script.
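To make the first point concrete, here is a minimal Python sketch that only counts exchanges with the server. It is not Redis code: `ToyServer`, `run_script`, and the "set only if the key holds an expected value" operation are hypothetical names chosen for illustration, and the "script" is just a Python function standing in for whatever language the server would embed.

```python
class ToyServer:
    """In-memory stand-in for a server; every call costs one exchange."""

    def __init__(self):
        self.data = {}
        self.exchanges = 0

    def get(self, key):
        self.exchanges += 1
        return self.data.get(key)

    def set(self, key, value):
        self.exchanges += 1
        self.data[key] = value

    def run_script(self, script, *args):
        # The whole script runs server side in a single exchange.
        self.exchanges += 1
        return script(self.data, *args)


def set_if_equals_client_side(server, key, expected, new_value):
    """Read-compute-write done by the client: two exchanges on success."""
    if server.get(key) == expected:
        server.set(key, new_value)
        return True
    return False


def set_if_equals_script(data, key, expected, new_value):
    """The same logic written as a script that runs next to the data."""
    if data.get(key) == expected:
        data[key] = new_value
        return True
    return False


server = ToyServer()
server.data["color"] = "red"

set_if_equals_client_side(server, "color", "red", "green")
client_side_exchanges = server.exchanges   # one GET plus one SET

server.exchanges = 0
server.run_script(set_if_equals_script, "color", "green", "blue")
script_exchanges = server.exchanges        # one script call
```

In this toy model the client-side version pays for two exchanges and the scripted version for one; the gap grows with the number of reads and writes the operation needs.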

And now... I claimed that the above problems with scripting are solved. How?

Don't define commands

Instead of defining commands in some way we can simply send the script again and again. Redis scripts will usually be super short. People need to do things like: set this only if this key already contains that specific value. Or check all the elements of this sorted set in a given range and return the average value. And so forth. So we can do just:

EVAL "... some script ..." arg1 arg2 arg3 arg4 ...

Redis will try to be smart enough to reuse an interpreter with the command defined. But the point is, this solves a lot of problems in a single step! Now there is no longer the problem of defining commands, or of instances with different versions of the same command (especially in a cluster scenario), and everything is evident from the source code of the application.
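One way to picture "reusing an interpreter with the command defined" is a cache keyed by the script body itself, so that a script sent again and again is compiled only once. The sketch below is an assumption about how this could work, not the actual design: it uses Python expressions as a stand-in for the real scripting language, and `ScriptCache` and `ARGS` are hypothetical names.

```python
class ScriptCache:
    """Toy server side: compiles each distinct script body only once."""

    def __init__(self):
        self._compiled = {}     # script body -> compiled code object
        self.compilations = 0   # how many times we actually compiled

    def eval(self, script, *args):
        code = self._compiled.get(script)
        if code is None:
            # First time we see this exact script text: compile and remember it.
            code = compile(script, "<script>", "eval")
            self._compiled[script] = code
            self.compilations += 1
        # The script sees its arguments as ARGS (a hypothetical name).
        return eval(code, {"__builtins__": {}}, {"ARGS": args})


cache = ScriptCache()
first = cache.eval("ARGS[0] + ARGS[1]", 2, 3)
second = cache.eval("ARGS[0] + ARGS[1]", 10, 20)   # same body, reused
```

Because the key is the script text, two instances can never disagree about what a given "command" means: a different body is simply a different cache entry.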

Specify what arguments are keys

Actually to deal well with cluster, with the experimental "disk backed Redis" things we are doing, and all the future stuff that we could do to make Redis a more interesting product, we only need to know one thing: which of the arguments are keys? To do so we can simply add a new argument to the EVAL command:

EVAL "... some script ..." num_keys arg1 arg2 arg3 ...

Now we know that only the first num_keys arguments are keys, and we can treat EVAL exactly like all the other commands, without caring at all about the semantics of the script executed.
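The following Python sketch shows what the server could do with num_keys without looking inside the script. `split_eval_args` is a hypothetical helper, and `node_for_key` is an invented routing rule (CRC32 modulo the node count) used only for illustration; how a cluster would really map keys to nodes is not specified here.

```python
import zlib


def split_eval_args(num_keys, args):
    """Split the arguments that follow the script into (keys, other_args)."""
    if not 0 <= num_keys <= len(args):
        raise ValueError("num_keys must be between 0 and the number of arguments")
    return list(args[:num_keys]), list(args[num_keys:])


def node_for_key(key, num_nodes):
    """Hypothetical routing rule: CRC32 of the key modulo the node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


# EVAL "... some script ..." 2 user:1 user:2 10 20
keys, other_args = split_eval_args(2, ["user:1", "user:2", "10", "20"])

# The keys alone are enough to decide which nodes the call touches.
nodes = {node_for_key(k, 4) for k in keys}
```

The script body stays opaque: routing, or any other per-key decision, depends only on the declared keys.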

Use a sane language

I think that for what we need Lua beats everybody else hands down. The language is not one that I particularly like, compared to Ruby for instance, but who cares? We are programmers and can code a short script in any language we want, but the point is, Lua is a wonderful implementation. Easy to embed, without even a configure script, like Redis! And FAST.

It's really time to try this in a Redis branch ;) So stay tuned, as in the next few days I'm sure I'll get into the right swing to code a first implementation we can collectively play with, to refine our feelings.

You can comment on this entry in the Hacker News post

Posted at 12:25:06 | 28 comments

Comments

Marc writes:

27 Apr 11, 12:42:24

The Lua integration reminds me of Tokyo Tyrant :)
Love the idea!

Tobias writes:

27 Apr 11, 13:03:09

How about background compilation of Lua scripts into plain C for maximum efficiency?
Or using Google's V8 with its internal compilation; it beats Lua performance by miles.

Jonatas Esteves writes:

27 Apr 11, 13:06:38

Just like what JakSprats did with Alchemy Database on top of Redis, but without all the Relational nonsense. I just LOVE this idea! Love the way you plan to implement it and love the choice of programming language too. What's not to love about it!? Kudos for building the best DB ever! I'll definitely try it.

P-A -> @pastjean writes:

27 Apr 11, 13:13:20

if you forget Ruby , js & python , personally Io is little more sane than Lua http://iolanguage.com/

Having a stack based interpreter on a pointer based language is kind of harsh, you always have to rebase the stack.
http://julien.danjou.info/blog/2011.html#Why_not_L...

Why not have pluggable scripting engines ? complexity in implementation?

Though I like the IDEA!

Jonatas Esteves writes:

27 Apr 11, 13:13:58

@Tobias Lua actually has a JIT compiler which is way faster than V8. It's called LuaJIT by Mike Pall [http://luajit.org/]. But even its pure interpreter implementation is more than fast enough for this use case.

antirez writes:

27 Apr 11, 14:51:19

The time spent inside Lua will be very small compared to command dispatch I think, but there is something interesting we can do to make scripting trivial to implement and at the same time have decent Lua performance... more soon :)

yesso writes:

27 Apr 11, 15:07:08

Perhaps you could have another data type, "script", in addition to the existing data types. Then you can set a script into a key in the same way that you currently set a string or a hash into a key. And then, instead of specifying the script again and again on every command, you can specify the key of the script that was set into Redis at an earlier time. This saves on bandwidth and allows you to precompile the script.

antirez writes:

27 Apr 11, 15:08:25

@yesso this idea was also explored, but you end up with the same problems as with registering scripts. Possibly different instances with different versions of the script, and so forth...

Marcus writes:

27 Apr 11, 15:48:38

On top of a command to send the script, you could also have a command where you send a hash (or some other kind of reference) of the script along with the script itself.

Something like

EVAL_WITH_KEY [hash] [script] [args]

The purpose of this would be to help reduce script parsing / compilation time. When Redis receives the request, it first checks to see if it has the script given by the hash in its compiled form. If that's the case, then it just runs the script. If it doesn't, then it compiles the script and adds it to its tree of compiled scripts.

The key would be defined by the client, so it would need to be consistent for all servers.

You could also have a different command, where instead of sending the script, you just send the hash/key of the script. If the Redis server doesn't have the script pre-compiled, then it sends a request to another server (Redis, HTTP or whatever) to fetch the script (obviously the config for this would need to be set up on each server, but it could include redundant servers).

Something like

EVAL_KEY [hash] [args]

This method has a couple of key advantages:

- you avoid needing to send the script over the network for each request
- you don't have to parse the script code on every request (which you'd need to do if you don't use the hash method above or if you put the script before the arguments)
- it would be consistent across a cluster (so long as you made sure that the hashes/keys of the scripts were different for each script, which would be easy), so you don't have the different copies issues you mention

antirez writes:

27 Apr 11, 15:51:31

@Marcus: I think we can directly hash the script, SHA1 is really fast, millions of times per second, so probably it is better to have a simpler API. Otherwise I can use the *whole* script as a key into the hash table if it is short enough, and use the SHA1 only for larger scripts.

Marcus writes:

27 Apr 11, 16:13:10

The part I'd want to reduce most is the sending of the body of the script over the network each time. I know the hashing will be quick, but if you have long scripts, personally I'd rather fetch them from another server only when needed.

Having two interfaces, one with hashes and one with just the scripts I think would be fairly simple.

Piotr Sikora writes:

27 Apr 11, 17:56:39

Why do people insist on using cryptographic hashes for non-cryptographic purposes? :P

@antirez: Please consider using Murmur, Jenkins, etc. Algorithm is a lot simpler, a lot faster and implementation is usually in the public domain.

antirez writes:

27 Apr 11, 18:01:42

@Piotr: if you use SHA1 or any other crypto-level hash function you can avoid checking at all for the script when you do the hashing. If it matches, it is the same script.

Piotr Sikora writes:

27 Apr 11, 18:12:15

@antirez: I can't say that I agree with you. Same-length hashes (good hashes) should have pretty much the same collision rate regardless of their crypto/non-crypto properties. And you should not assume that same hash == same source (although in this exact case you most likely can).

antirez writes:

27 Apr 11, 18:13:43

@piotr: with a good hash that is collision resistant, and where flipping a single bit in the input will on average give every output bit a 50% probability of flipping, you can consider the same hash to mean the same script, as it is much more probable that anything else goes wrong than that a random collision happens.

Piotr Sikora writes:

27 Apr 11, 18:17:47

@antirez: Hehe, that's true :)

Anyway, good luck with the scripting stuff! Picking hash is probably least of the problems, so I'll end it here.

weepy writes:

27 Apr 11, 19:00:39

out of interest. where does lua beat v8 ?

jaksprats writes:

27 Apr 11, 20:59:12

luajit is faster than V8.
additionally lua embeds itself so well into C, that lua can be called w/o invoking the lua interpreter.
also lua can call C, so lua scripts can call redis' commands at basically C speed.
lua can also load files, w/ function definitions in them and the functions are stored in lua's memory, no need for redis to store functions in a hash, no need to pass function definitions over the wire.

Lua embeds itself very nicely into redis, and luajit2 makes it near post JITed Java fast ....

Ive been playing around w/ lua in redis for 9 months, and it kicks ass, I expect people to love it, and I also expect them to misuse it initially, but thats life

geek42 writes:

28 Apr 11, 00:33:27

how about forth?

Michael writes:

28 Apr 11, 17:23:44

LuaJIT is way faster than V8 and has lower memory footprint. There was some information at "Language shootout" site but they removed V8 & LuaJIT benchmarks :(

Another very important thing is that Lua was created with embedability in mind, and it is really easy to use in cases like this one with Redis.

Pixy Misa writes:

29 Apr 11, 01:09:00

Lua is brilliant, LuaJIT is exceptionally fast, and Redis+Lua will be awesome. :)

I'd definitely be interested in being able to store the script on the server side rather than sending it every time. If we use hashes for this I'd suggest sending a script name/identifier as well as the hash so that users can easily see which script they're running.

Ethan writes:

29 Apr 11, 03:37:24

Definitely a good addition to Redis, a thing that has been missing.

How's about to begin with, adding very basic support like test-and-set -- read a hash key, check against a given value, if different set to a new value, else, say touch the TTL or delete. This will help in replacing several RMW operations on redis from a client with a single call. Would be nice if we can have similar crisp commands, which can be quickly released with redis as well.

Matthew Frazier writes:

29 Apr 11, 12:00:23

@Pixy Misa: This would be fairly trivial to do in Lua.

You could load the script with something like `db.set("SCRIPTNAME", string.dump(--[[ define a function in here ]]))` and run it with `assert(loadstring(db.get("SCRIPTNAME")), "script not present")(...)`.

dubek writes:

29 Apr 11, 15:52:12

The problem of storing functions/scripts in database keys is the problem of mixing data and code. Say you choose to revert your database to snapshot from a week ago; now not only your data is week old, but also some of the Lua functions you stored as keys in the database.

In this sense, antirez's idea of sending the entire script each time avoids this problem. But, as people mention, it would start to be cumbersome to use bigger scripts (with functions defined in them and so on).

I'm not sure how UDFs are stored in MySQL - maybe there's an idea there that you can use.

Felix Gallo writes:

29 Apr 11, 18:19:35

Salvatore, I humbly suggest that Lua is a premature optimization to a loosely defined problem domain.

If the problem is a question of expressivity/richness, then I'd suggest that implementing the common relational algebraic and grouping operators in plain Redis command language would go 99.95% of the way there, while maintaining the design paradigm. For the other .05%, let them fork redis and write their own C.

Carl Zulauf writes:

29 Apr 11, 18:28:14

Looking forward to taking advantage of scripting support in Redis. Sounds like you are taking a really well-thought-out approach and I'm just hoping to get my hands on it soon :)

Marco Rogers writes:

30 Apr 11, 17:14:38

@Felix Gallo, it seems to me that if the problem domain is loosely defined, then expanding the redis API with a bunch more commands is the exact wrong approach. If things don't work out, you have to decide what to do with people who are using this expanded API. Or you have to cut them loose. Neither of these is desirable. Adding one command that enables scripting is more future proof as the whole thing is "siloed" and can be easily abandoned or re-imagined if necessary. Embedding a scripting language and monitoring use cases is the best way to explore the problem IMO.

Kader writes:

30 Apr 11, 18:43:07

@antirez you've played with scheme/lisp so why not implement a redis lisp dialect?

comments closed

antirez weblog
