Testing the new Redis AOF rewrite

Tuesday, 13 December 11
Redis 2.4 introduced variadic versions of many Redis commands, including SADD, ZADD, LPUSH, RPUSH. HMSET was already available in Redis 2.2. So for every Redis data type, we know have a way to add multiple items in a single command, for example: However this feature was not used when rewriting the AOF log (operation now performed automatically since Redis 2.4, but that the user can still trigger using the BGREWRITEAOF, even if the server is not configured to use AOF).

The AOF was still generated using a single command for every element inside an aggregate data type. For instance a three elements list required three different calls to LPUSH in the rewritten AOF file.

Finally Redis 2.6 (that will be forked from the current unstable branch, just removing the cluster code) is introducing the use of variadic commands for AOF log rewriting. The result is that both rewriting and loading an AOF file containing aggregate types and not just plain key->string pairs will be much faster.

How much faster?

We'll start checking the speed gain that can be obtained in a real world dataset with very few keys containing aggregate data types, that is, the database of lloogg.com. Since lloogg was designed in the early stage of Redis development where Hashes where still not available, it stores a lot of user counters as separated keys, so there are a lot of keys just containing a string (huge waste of memory, but I've still to find the time to modify the code). However there is around a 5% of sorted sets. This is an excerpt from the full output of Redis Sampler against the lloogg live DB.
 string: 95480 (95.48%)   zset: 4469 (4.47%)       list: 48 (0.05%)        
 set: 3 (0.00%)
As you can see this is far from the ideal dataset to make the new AOF changes to look cool, still the result is significant: Now let's check the loading time of all the three options: I think this is very good news if you consider this database contained just a small number of keys. Now what happens for users that have a lot of lists, hashes, sets, sorted sets?

Bigger gains

To test the new code with a database that better represents an use case where most of the keys are aggregate values I created a dataset with 1 million of hashes containing 16 fiels each. Fields are reasonably sized, like Field1: Value1, Field2: Value2, and so forth.

I used this Lua script to create the dataset:
local i, j
for i=1,1000000 do
    for j=1,16 do
return {ok="DONE"}
(Note, if you use the latest unstable branch you can run it using: redis-cli --eval /tmp/script.lua)

Now the same metrics as above but against this new dataset: Now let's check the loading time of all the three options: As you can see now both the AOF rewriting and loading time is reduced to almost an half of the time required with Redis 2.4. However you can still see an amazing 1.888 seconds in the time needed to load the RDB. Why?

Because since Redis 2.4 BGSAVE directly outputs the encoded version of the value, if the value is encoded as a ziplist, an intset or a zipmap. This is a huge advantage, both while loading and saving the database, that could be easily implemented in the AOF rewrite. However I'm currently not doing it as probably in the next versions of Redis we'll have an option to rewrite the AOF log in RDB format itself... so with the unification of the two systems a lot of problems will be reduced.
Posted at 10:55:04 | permalink | discuss | print