Redis Sentinel beta released

Monday, 23 July 12
On June 5th I wrote the first line of code for Redis Sentinel, and after more or less six weeks I'm happy to release the first public beta. This has been a fun programming sprint where I did something a bit unusual in the history of Redis development: I focused on just one aspect of Redis for a number of weeks, almost ignoring everything else that was not an important bug report. The reason for this new methodology was very simple: Redis badly needed a failover solution, users were asking for it, and VMware encouraged me to go down this path (thanks). The only way to go from zero to something working was to focus on just that, for long enough to reach a beta that I'm sure we, as the Redis Community, will be able to push forward to production quality in a very short time.

Before continuing, it's better to say where the code is ;) It's merged into the unstable branch on GitHub. Redis Sentinel is in fact a special execution mode of Redis itself.

But you may wonder what Redis Sentinel is exactly. It is a distributed monitoring system for Redis. On top of the monitoring layer it also implements a notification system with a simple-to-use API, and an automatic failover solution.

Well, this is a pretty cold description of what Redis Sentinel is. Actually it is a system that also tries to make monitoring fun! In short you have this monitoring unit, the Sentinel. The idea is that this monitoring unit is extremely chatty: it speaks the Redis protocol, and you can ask it many things about how it sees the Redis instances it is monitoring, which slaves are attached, which other Sentinels are monitoring the same system, and so forth. Sentinel is designed to interact with other programs a lot.
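Since a Sentinel speaks the Redis protocol, a request to it is just an array of bulk strings sent over a socket. Here is a minimal sketch in Python of how such a request could be encoded. The helper name encode_command is made up for this example, and the command shown is only an illustration: check the Sentinel documentation for the exact commands available.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a Redis protocol array of bulk strings.

    encode_command is a hypothetical helper written for this example,
    it is not part of Redis or of any client library.
    """
    # Array header: '*' followed by the number of arguments.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is a bulk string: '$', its byte length, then the payload.
        parts.append(b"$%d\r\n" % len(data))
        parts.append(data + b"\r\n")
    return b"".join(parts)


# Example request asking a Sentinel about the masters it monitors.
request = encode_command("SENTINEL", "masters")
```

The resulting bytes can be written to a plain TCP connection opened against the Sentinel, and the reply comes back in the same protocol, so any existing Redis client library should be able to do this for you.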

The other idea is that a single Sentinel can be used alone, but it is often not enough to do things in a reliable way, so instead you run a number of Sentinels placed across your network infrastructure: one on this computer, one on another, and so forth.

Sentinels are trivial to configure; this is the Redis Way. Point a Sentinel at a master and it will auto-discover the other Sentinels and the attached slaves. Sentinels agree with each other on whether the master should be considered down, according to the quorum you required in the configuration. They also select which Sentinel should perform the failover, decide whether another Sentinel should restart a failover in progress because the previous one appears to be dead, and so forth.
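To give an idea of what the quorum means, here is a toy sketch in Python. It is not the actual Sentinel code (Sentinel is part of Redis, written in C), and the function name and the data layout are assumptions made up for this example: every Sentinel has an opinion about the master, and the master is considered down only when enough Sentinels agree.

```python
from typing import Dict


def master_is_down(opinions: Dict[str, bool], quorum: int) -> bool:
    """Return True when at least `quorum` Sentinels think the master is down.

    `opinions` maps a (hypothetical) Sentinel name to True if that Sentinel
    currently sees the master as down. This is an illustration of the idea,
    not the rule set implemented by Sentinel itself.
    """
    down_votes = sum(1 for is_down in opinions.values() if is_down)
    return down_votes >= quorum


# Three Sentinels on three different computers, quorum of 2.
opinions = {"sentinel-a": True, "sentinel-b": True, "sentinel-c": False}
agreed = master_is_down(opinions, quorum=2)
```

With a quorum of 2, a single Sentinel that lost its link to the master is not enough to trigger anything, which is the reason for running more than one.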

Sentinel is implemented as a state machine in a completely non-blocking environment where the monitoring is performed continuously in the background. Then, 10 times every second, every Sentinel evaluates what it sees in order to make decisions.
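The split between background monitoring and periodic evaluation can be sketched like this in Python. Again, this is only an illustration under my own assumptions: the state names, the down_after timeout and the MonitoredInstance type are invented for the example, only the 10 evaluations per second come from the description above.

```python
from dataclasses import dataclass

TICK_HZ = 10        # evaluations per second, as described above
DOWN_AFTER = 30.0   # hypothetical timeout in seconds, chosen for the example


@dataclass
class MonitoredInstance:
    name: str
    last_ok_reply: float   # updated by the background monitoring
    state: str = "up"      # hypothetical states: "up" or "suspected_down"


def tick(instance: MonitoredInstance, now: float,
         down_after: float = DOWN_AFTER) -> str:
    """One evaluation step: never blocks, only looks at what was already seen."""
    silent_for = now - instance.last_ok_reply
    if instance.state == "up" and silent_for > down_after:
        instance.state = "suspected_down"
    elif instance.state == "suspected_down" and silent_for <= down_after:
        instance.state = "up"
    return instance.state


# A timer would call tick() every 1/TICK_HZ seconds; here time is passed by hand.
master = MonitoredInstance("mymaster", last_ok_reply=0.0)
tick(master, now=10.0)   # still "up"
tick(master, now=31.0)   # more than 30 seconds of silence: "suspected_down"
```

The evaluation step only reads state that the background monitoring already collected, so it is cheap to run many times per second and easy to reason about.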

The design is conceived so that a Sentinel does its work following a small set of fixed rules. These rules should be enough to perform the work in a reasonable way, but at the same time the set of rules is simple enough that you can explain it to a newcomer in five minutes, and simple enough that a system administrator can understand what is going to happen during the failover and what exactly can trigger it: easy to understand also means easy to predict.

So that's what we have. Is it perfect? I guess it is not, but it is a very good start in my opinion. And I'm sure that with the help of the Redis Community we'll add what is missing and we'll fix what is not good enough. I already have a pretty long TODO list of things that I need to improve, but the current implementation is already something that you can try and even use (maybe not in production environments for the first weeks, just to be safe).

So please join the effort :)

Read the documentation, use the Redis Google Group to comment on the design and suggest your ideas, and report the bugs you find in our issues system at GitHub.

Note: you need to use Sentinel with Redis instances compiled from the latest commit of the 2.4, 2.6 or unstable branch. Redis 2.4.16 will be the first stable release with support for Sentinel.

Also note that there is a known bug in the hiredis library that can make Sentinel crash from time to time, but it's not a problem with Sentinel itself, and we are fixing the library in the next few days (the issue happens when SUBSCRIBE returns an error for some reason).

Thank you to everybody who helped in the design process, who showed enthusiasm for this work and encouraged me, and to the great guys at VMware who supported me during this time.

Special thanks to Dvir Volk, who tested a few previews of Sentinel and provided very useful feedback.

More updates in the coming weeks with new blog posts. Stay tuned!
Posted at 12:56:26