On Redis, Memcached, Speed, Benchmarks and The Toilet

Tuesday, 21 September 10
The internet is full of things: trolls, awesome programming threads, p0rn, proofs of how cool and creative the human beings can be, and of course crappy benchmarks. In this blog post I want to focus my attention to the latter, and at the same time I'll try to show how good methodology to perform a benchmark is supposed to be, at least from my point of view.

Why speed matters?

In the Web Scale era everybody is much more obsessed with scalability than speed. That is, if I've something that is able to do handle a (not better defined) load of L, I want it to handle a load of L*N if I use N instances of this thing. Automagically, Fault tolerantly, Bullet proofly.

Stressing scalability so much put somewhat in shadow what in the 80's was one of the most interesting meters of software coolness, that is: raw speed. I mean old-fashion speed, that is, how many operations is this single thing able to perform? How many triangles can you trace in a second? How many queries can you answer? How many square roots can you compute?

Back in the initial days of home computing this was so important since CPU power was a limited resource. Now computers are way, way faster, but is still speed important, or should we just care about scalability, given that we can handle any load after all just adding more nodes? My bet is that raw speed is very important, and this are a few arguments: This does not mean we don't need scalable systems. Scalability is completely central for the future of Redis. I'll start working at Redis cluster in the next months almost full time. Just the right order is, IMHO, to get a very good single node that is very CPU/memory efficient, and then build a scalable system upon this efficient node.

Systems for the toilet

Apparently I'm not alone in reconsidering the importance of raw speed. The sys/toilet blog published a benchmark comparing the raw speed of Redis and Memcached. The author of this blog firmly believes that many open source programs (and not only) should better go to the toilet instead of running in real systems, since they are not good. I can't agree more with this sentiment. Unfortunately I also think that our dude is part of the problem, since running crappy benchmarks is not going to help our pop culture.

You should check it yourself reading the original post, but this is why the sys/toilet benchmark is ill conceived. Still this benchmark captured my interest. Even with all this problems, why on the earth Redis was showing so low numbers in multi-get (MGET) operations?

I checked the implementation better and found a problem in the query parsing code that resulted in a quadratic algorithm due to a wrong positioned strchr() call. The problem is now fixed in this branch, that is what I'm using to run the benchmarks, and that will be merged in Redis master within days. After this fix large requests are now parsed five times faster than before.

But at this point I started to be curious...

Redis VS Memcached

Even if Redis provides much more features than memcached, including persistence, complex data types, replication, and so forth, it's easy to say that it is an almost strict superset of memcached. There are GET and SET operations, timeouts, a networking layer, check-and-set operations and so forth. This two systems are very, very similar in their essence, so comparing the two is definitely an apple to apple comparison.

Not only this, but memcached is a truly mature software, that undertook a lot of real world stress testing and changes to work well in the practice. It's definitely a good meter to have for Redis. So given the great similarity between the systems if there are huge discrepancies likely there is a bug somewhere, either in Redis or memcached. And this actually was the case with MGET! I fixed a bug thanks to the comparison of numbers with memcached.

Apple to Apple, with the same mouth

But even comparing Apples to Apples is not trivial. You need to really have the very same semantics client side, do sensible things about the testing environment, number of simulated clients and so forth.

In Redis land we already have redis-benchmark that assisted us in the course of the latest one and half years, and worked as a great advertising tool: we were able to show that our system was performing well.

Memcached and Redis protocols are not so different, both are simple ASCII protocols with similar formats. So I used a few hours of my time to port redis-benchmark to the memcached protocol. The result is mc-benchmark.

If you compare redis-benchmark and mc-benchmark with diff, you'll notice that apart from a lot of code removed since memcached does not support many of the operations that redis-benchmark was running against Redis, the only real difference in the source code is the following:
@@ -213,8 +206,15 @@
     char *tail = strstr(p,"\r\n");
     if (tail == NULL)
         return 0;
-    *tail = '\0';
-    *len = atoi(p+1);
+
+    if (p[0] == 'E') { /* END -- key not present */
+        *len = -1;
+    } else {
+        char *x = tail;
+        while (*x != ' ') x--;
+        *tail = '\0';
+        *len = atoi(x)+5; /* We add 5 bytes for the final "END\r\n" */
+    }
     return tail+2-p;
 }
Basically in the memcached protocol the length of the payload was one of the space separated arguments, while in the Redis protocol is at fixed offset (there is just a "$" character before). So there was to add a short while loop (that will scan just two bytes in our actual benchmark), and an explicit very fast test for the "END" condition of the memcached protocol.

Note that mc-benchmark and redis-benchmark are designed to skip the reply so the actual processing is just the absolute minimal required to read the reply. Also note that the few lines of code added to turn redis-benchmark into mc-benchmark are inherently needed for the memcached protocol design. This means that we have an apples to apples comparison, and with the same mouth!.

Methodology

The following is the methodology used for this test: In order to automatically run the tests collecting data I used the following bash script:
#!/bin/bash

# BIN=./redis-benchmark BIN=./mc-benchmark payload=32 iterations=100000 keyspace=100000

for clients in 1 5 10 20 30 40 50 60 70 80 90 100 200 300 do SPEED=0 for dummy in 0 1 2 do S=$($BIN -n $iterations -r $keyspace -d $payload -c $clients | grep 'per second' | tail -1 | cut -f 1 -d'.') if [ $(($S > $SPEED)) != "0" ] then SPEED=$S fi done echo "$clients $SPEED" done
Sorry, the comparison with != "0" is surely lame but I did not remembered the right way to do it with shell scripting :)

Results

Finally... here are the results, in form of a graph generated with gnuplot:

Redis VS memcached

As you can see there are no huge differences in the range of one order of magnitude, but surely Redis is considerably faster in this test. If you check the left side of the graph you'll notice that the speed is about the same with a single client. The more we add simultaneous clients the more Redis starts to go faster.

It's worth to note that I measured the CPU used by both the systems in a 1 million keys GET/SET test, obtaining the following results: As you can see both the systems used the same amount of CPU power. So actually the number of queries per watt are the same. For some reason memcached was not able to exploit all the CPU power available, as the same amount of CPU was used in the course of more real elapsed time to serve the same number of queries.

Atomic operations can be traded for speed

To really understand what is the potential of the two systems as caching systems, there is to take into account another aspect. Redis can run complex atomic operations against hashes, lists, sets and sorted sets at the same speed it can run GET and SET operations (you can check this fact using redis-benchmark). The atomic operations and more advanced data structures allow to do much more in a single operation, avoiding check-and-set operations most of the time. So I think that once we drop the limit of just using GET and SET and uncover the full power of the Redis semantic there is a lot you can do with Redis to run a lot of concurrent users with very little hardware.

Conclusions

This is a limited test that can't be interpreted as Redis is faster. It is surely faster under the tested conditions, but different real world work loads can lead to different results. For sure after this tests I've the impressions that the two systems are well designed, and that in Redis we are not overlooking any obvious optimization.

As a result of this new branch of investigation in the next days I'll create speed.redis.io that will be something like a continuous speed testing of the different branches of Redis, with history. This way we can be sure that our changes will not modify performances, or we can try to enhance performances while checking how the numbers will change after every commit.

Disclaimer

I used a limited amount of time to setup and run this tests. Please if you find any kind of flaw or error drop me a note or a comment in this blog post, so that I can fix it and publish updated results.
193274 views*
Posted at 10:41:03 | permalink | 17 comments | print