One year of Redis

Saturday, 26 December 09

The first version of Redis was released to the public almost one year ago (10 months ago actually). This seems like a good time to look back at the development process.

Redis can't be considered a successful project yet; it's just too early. Still, the following are a few things I learned in the last months that I'd like to share, in the hope that I'll be able to apply these ideas in the future, and that programmers interested in creating a new open source project will find something interesting here and possibly avoid some mistakes.

Develop something that you think you'll use for years

Redis is not my first open source project. My main past projects are hping and the Jim interpreter. Still, Redis is the first project I'm sure I'll develop for years, assuming it will be successful, because this time I was wise enough to select something that I want to use myself for years to come.

I stopped the development of hping when I quit security, and I stopped the development of Jim when I quit Tcl. I'm confident that I won't quit databases, as I've been a MySQL user for more or less ten years. When you know you'll need what you are building for years to come, there is a profound shift in vision. You know your efforts are not wasted even if no one wants to use your code, and your users are unlikely to be left alone in a few months.

So make a careful choice when starting a new open source project: don't pick something you are interested in today, but something you'll probably be interested in for the next decade.

Early adopters

Early adopters are vital for your project: the masses won't use what you build until there is a solid initial user base. This sounds like a chicken-and-egg problem, but it actually is not: there are smart guys who will be brave enough to use what you build if it's worthwhile.

Your early adopters are not brave because they are irresponsible; they simply can evaluate something without needing to follow the masses. So where should you search for your early adopters? Among the smartest guys around. Posting your code on Hacker News once you have something complete enough to give a first impression is a good idea.

A wonderful side effect of all this is that, at least in the initial stages, you'll have a terrific community. I love the people around Redis; they are on average incredibly smart and interesting. I enjoy providing help via the Google group and trying to fix bugs in very little time, because with such a community it's worth the effort.

Simplicity matters

Users don't like to read zillions of pages of documentation just to get started using your new open source project. They don't like compilation errors, nor complex ideas or protocols.

Your project should be trivial to run, and the first page of your documentation should include instructions for trying a Hello World example in a few trivial steps. Once users have a working Hello World they'll be willing to learn more and read documentation, but most of the time not before.

If the libs you use are not available via apt-get on Debian/Ubuntu and/or in the Mac OS X package systems, it's better to include them in your source tree. Your users should not need more than five minutes to go from the download to a working Hello World example.

I suspect that a few things are playing a very important role in the relatively good adoption Redis is experiencing, considering how young it is: Redis is one of the rare NoSQL databases that will compile almost everywhere with just make, that will run with default settings and no configuration with just ./redis-server, and that uses a protocol simple enough to understand and implement in minutes (so that I could claim many client libs since the first weeks).
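To give an idea of what "implement in minutes" can mean, here is a minimal sketch of the client side of such a protocol. It assumes the multi-bulk request format and the single-line replies described in the current Redis protocol documentation, which may differ in details from what Redis spoke when this post was written. The function names encode_command and parse_reply are hypothetical, picked only for this illustration, and bulk and multi-bulk replies are left out.

```python
def encode_command(*args):
    """Encode a command as a multi-bulk request (assumed format):
    *<number of args>, then $<byte length> and the payload for each arg."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_reply(line):
    """Decode a single-line reply: '+' is a status, ':' an integer, '-' an error."""
    kind, payload = line[:1], line[1:].rstrip(b"\r\n")
    if kind == b"+":
        return payload.decode("utf-8")
    if kind == b":":
        return int(payload)
    if kind == b"-":
        raise RuntimeError(payload.decode("utf-8"))
    raise ValueError("reply type not handled in this sketch")


if __name__ == "__main__":
    # Bytes a client would write to the socket for SET key value.
    print(encode_command("SET", "key", "value"))
    # A status reply as the server would send it back.
    print(parse_reply(b"+OK\r\n"))
```

Sending the encoded bytes over a plain TCP socket and reading the reply line is all that is left to do for simple commands, which is why a basic client fits in a page of code.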

Simplicity also matters in the concepts your users are required to understand to get started. Redis is trivial, but I'm always surprised by the number of people who don't get it. I don't even want to think about how hard it is for the average user to understand a more complex NoSQL database.

Of course there are also people who told me they don't like Redis because it's too simple to be powerful enough for their use cases. I don't trust this argument, and anyway I think it is a good tradeoff, but be prepared to hear this kind of argument if you take the simplicity path.

Be conservative about adding features

It's very hard to tell whether a user request should be implemented. It's not a matter of development time: even if the feature request comes with a patch, maybe the right thing to do is to not merge it.

Every user has specific needs. They are legitimate from the user's point of view, but possibly not from the project's: maybe there are other ways to solve the problem, or the problem is in the first place the result of a design error the user is making.

Many times the feature request is just too specific: it's legitimate, but only 1 in 1000 users will actually need it, and still the feature adds complexity and code to your project. Saying no to these feature requests is almost always the right thing to do.

Other times you instead feel that the feature request is ok: general enough, not too hard to implement, and with no good way to address the problem otherwise. This may be a good feature to implement, and yet it is a good idea to wait at least a few weeks, to see if the addition still looks good after some time. Basically every non-trivial feature should stay in the TODO list for some time before getting implemented.

Be pragmatic about your roadmap

Real programmers love to solve hard problems, so it's easy to fall into the trap of implementing what's the most fun to code instead of what's useful. Actually, if you love the problem domain, most things will be fun to code in the end, but getting the roadmap wrong is a huge mistake.

For instance, I was quite convinced I should implement redis-cluster (a layer that gives automatic sharding and fault tolerance among N nodes), as it is a very interesting problem to solve. There are new things to study, and possibly some new algorithm variant to design that will work well with the Redis semantics and data model. But in the short term most people will have a much greater need for the ability to use datasets bigger than RAM, that is, the Virtual Memory feature. I changed my plans and I'm going to work on VM for the whole first part of 2010. This means that most people will have a much simpler upgrade path once their datasets get bigger (assuming accesses are not evenly distributed). Even if redis-cluster is nice and will be the next big thing to get inside Redis after VM, it is not as important for most users. Which feature you implement first can change how users feel about your project, so the rule here is: solve problems in order of the number of people who will benefit from the new implementation.

Don't expect tons of code that you can actually merge

There is a myth in the open source world that you get a lot of code once you start to have a user base. This is not how it works: be prepared to write 95% of the code of your project yourself for the first years. Somebody will actually contribute code, but most of the time this code will be about features you don't want to implement, or will not look sane enough to be merged without a thorough review, or will solve a good problem in a way that is not general enough, or simply the coder does not understand enough about the internals (of Redis, in my case) or about your future plans to provide an acceptable implementation.

From time to time it's actually possible to merge a patch as is, but this is rare. The idea of "let's implement a solid base so that other programmers will build all the rest" will never work.

BSD can be a strength even on the business side

If you are going to develop something that targets not just end users but companies, BSD can be the best pick, as in many business environments it's much more comfortable to use code with a license that allows for internal development without having to deal with distributing the changes.

Most of the time it's not that these companies don't want to share their changes with the rest of the community, but that these changes are not ready for prime time or well documented, or may reveal too much about corporate secrets, and so forth.

The good thing is that you'll be free to provide a closed source version of your project, for instance, even if you accept external patches. This can be a viable business model in many ways: the commercial version can be limited to additions that are marginally useful for most users but important in corporate environments, or it may have special features that are too specific to get inside the "real" project but that are fine to support commercially, and so forth.

Basically, the BSD license does not make it impossible to do business with your project, and your users can be much more comfortable using something that can't run into problems like the ones MySQL experienced lately.

Posted at 09:43:01