An update on the Memcached/Redis benchmark

Thursday, 23 September 10
A few days ago I published a blog post that included a benchmark between Redis and memcached.

My results showed Redis to be considerably faster than memcached running a single instance of redis-benchmark (or the equivalent mc-benchmark) against an instance of Redis using a single core, and an instance of Memcached using four cores.

I was missing something in the test, Dormando published a counter benchmark where more than a single mc-benchmark was used against memcached, showing that only under this conditions memcached is able to saturate all the CPU cores, leading to much higher numbers.

The test performed by @dormando was missing an interesting benchmark, that is, given that Redis is single threaded, what happens if I run an instance of Redis per core? Also now that we have this new results, what is the reason why memcached was slower against a single instance of mc-benchmark? We have the proof that the benchmark is able to perform more operations per second even with a single instance, since Redis was showing higher numbers, so given that memcached appears to do a great job with multiple instances of the benchmark, what was happening?

Redis and memcached trying to saturate two cores

My first attempt will be to use memcached started with "-t 2" to use just two cores, and two instances of Redis, against two instances of the benchmark. I limited both memcached and Redis to two cores because my box is a quad core so I can't run four instances of the server and four of the benchmark making sure that every thread will have a core.

What are the results? I'll not show any graphs this time, but just what happens with 100 clients as the results are more or less consistent with different number of clients: Redis appears to be able to scale horizontally per core without issues both in SET and GET operations, since every core is running an isolated process. Memcached obviously can't scale so well because it is a single instance using multiple cores, this a tradeoff that will be discussed later.

As I started to be curious I launched memcached with different number of threads, and discovered something very interesting. When running with a single thread, memcached is able to perform much better against a single instance benchmark... so what happens if we run two memcached instances instead of a threaded one? Exactly like Redis.

Threaded or not

Ok a few observations... the first is that with many kind of workloads you may better run memcached with "-t 1", even if you have more cores. Second, a process per thread is the most scalable solution apparently, and this should not be a big surprise at all.

So now the big question is: "is a multi threaded implementation worth it?".

This is a matter of design, tastes, and facts all mixed together. With a single instance using multiple threads you have a few advantages: There are also disadvantages:

We decided to go against a threaded approach for the following reasons:

I really have zero doubts about this: in a future of cloud computing I want to consider every single core as a computer itself. It's hard to allow for a lot more complexity for something you can obtain with a slightly better client library (your client can take connections against all the instances and provide multi get for you, via multiplexing). Add to this that the CPU/memory overhead for every added instance is near to zero, for both memcached and Redis.

The End

I hope the combined efforts of my benchmark and the dormando one had the effect of shedding some light on the matter. With current entry level hardware the ceil appears to be 100,000 operations per core, and there is a tradeoff between threaded and non threaded implementations.

I think we need more tests like this in general (and less busy-loops...) but not to tell "I've it longer" but to show what are the real world performances of different systems and why, as with well designed systems you can be sure it's almost always a tradeoff and not some lame programming error. In the case there are large discrepancies and there is indeed some programming problem, it's possible to investigate and fix such problems.
Posted at 05:24:41 | permalink | 7 comments | print