With the recent presentation of the latest Google service, Google App Engine,
it is even clearer that one very important open source solution is missing
for writing scalable and reliable web applications: a scalable, redundant relational database
that is as simple to use as buying N cheap PCs, linking them together via LAN, and running a DB system on them that makes it easy to add more servers as needed.
Scaling the HTTP server side of a web application is trivial, especially
if you avoid keeping state in the server itself by avoiding sessions: every
web server is just a copy of all the others, with a balancer on top that
makes sure the load is dispatched among your HTTP servers.
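To make the point concrete, here is a minimal Python sketch of stateless web servers behind a balancer. Everything in it is made up for illustration: `make_server`, `RoundRobinBalancer` and the round-robin policy are my assumptions, not how any particular balancer works.

```python
import itertools


def make_server(name):
    """Return a stateless handler: the reply depends only on the request."""
    def handle(request):
        return f"{name} served {request}"
    return handle


class RoundRobinBalancer:
    """Hypothetical balancer that hands requests to the servers in turn."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def dispatch(self, request):
        # No server keeps session state, so any copy can answer any request.
        server = next(self._cycle)
        return server(request)


# Three identical copies; adding capacity just means adding one more copy.
servers = [make_server(f"web{i}") for i in range(3)]
balancer = RoundRobinBalancer(servers)
responses = [balancer.dispatch(f"GET /page/{n}") for n in range(6)]
```

Since no handler remembers anything between requests, it does not matter which copy gets which request, and that is exactly why this side of the problem is the easy one.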
For the DB back end, instead, there is no easy solution: MySQL sucks at this and PostgreSQL is no better AFAIK. They are simply hard to use as a cluster that automagically takes care of growing load and data redundancy problems.
We really need a simple-to-use solution that just lets us add and remove servers as needed and, of course, can handle the failure of some PC in the cluster. Something that lets you remove a server and add a new one, tell the cluster what is happening, and have it resync the new machine (or the fixed one) in the background without downtime.
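Here is a toy Python sketch of the behavior I have in mind, with the data held in memory. The names (`ToyCluster`, `add_node`, `remove_node`) are hypothetical, and the placement scheme (rendezvous hashing with two copies of every key) is just one technique I picked for the example, not something an existing database is known to do. A real system would also resync in the background rather than synchronously as here.

```python
import hashlib


class ToyCluster:
    """In-memory toy: every key is stored on `replicas` different nodes."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self.nodes = {name: {} for name in nodes}  # node name -> local store

    def _owners(self, key):
        # Rendezvous hashing: rank the nodes by hash(node, key), keep the top ones.
        def score(node):
            return hashlib.sha256(f"{node}:{key}".encode()).hexdigest()
        return sorted(self.nodes, key=score, reverse=True)[:self.replicas]

    def put(self, key, value):
        for node in self._owners(key):
            self.nodes[node][key] = value

    def get(self, key):
        for node in self._owners(key):
            if key in self.nodes[node]:
                return self.nodes[node][key]
        raise KeyError(key)

    def _resync(self):
        # Gather every surviving copy, then place each key on its current owners.
        data = {}
        for store in self.nodes.values():
            data.update(store)
        for key, value in data.items():
            owners = set(self._owners(key))
            for name, store in self.nodes.items():
                if name in owners:
                    store[key] = value
                else:
                    store.pop(key, None)

    def add_node(self, name):
        self.nodes[name] = {}
        self._resync()

    def remove_node(self, name):
        self.nodes.pop(name, None)  # simulates a PC that died
        self._resync()


cluster = ToyCluster(["pc1", "pc2", "pc3"], replicas=2)
for i in range(100):
    cluster.put(f"user:{i}", {"id": i})

cluster.remove_node("pc2")  # one machine fails...
cluster.add_node("pc4")     # ...and a fresh one is plugged in
```

After the failure and the replacement every key is still readable and is back to two copies, and the application code never had to know which PC holds what.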
It does not even matter if it is not fully featured ANSI SQL: something like GQL, or even simpler, can be enough for most developers. But we really need it in the future, because the current LAMP architecture simply does not scale well in a transparent way.
My friend David Welton pointed out that Mnesia looks interesting, but the fact that it's not remotely close to SQL and has some limitations on data size are big problems.
It's also worth remembering what the problems with MySQL Cluster are: the data set can't be larger than the RAM of the PC, not all the nodes are the same, there is a master that is a single point of failure, and it does not auto-sync in a transparent way when you add servers.