The missing opensource software for web development: a scalable database

Wednesday, 09 April 08
With the recent presentation of the latest google service called Google App Engine, it is even more clear that there is a single very important missing opensource solution to write scalable and reliable web applications: a scalable, redundant relational database that is as simple to use as to buy N cheap PCs linked together via LAN running this DB system that makes it easy to add more servers as needed.

To scale the http servers side of a web application is trivial, expecially if you avoid to take state in the server itself avoiding sessions: every web server is just a copy of all the other one with a balancer on top that makes sure the load is dispatched among your http servers.

Instead for the DB back end there is no easy solution: MySQL sucks at this and PostgreSQL is not better AFAIK. They are simply hard to use as a cluster that automagically take care of growing load and data redundancy problems.

We really need a simple to use solution that just let us to add/remove servers as needed, and of course able to handle the failure of some PC in the cluster. Something that allows to remove the server and add a new one, tell the cluster of what is happening and it will resync the new machine (or the fixed one) in background without downtimes.

It does not matter even if it is not a fully features ANSI SQL, something like GQL or even simpler can be enough for most developers, but we really need it in the future because the current LAMP architecture simply does not scale well in a transparent way.

Edit: my friend David Welton pointed out that Mnesia looks interesting but the fact that it's not remotely close to sql, and has some limitations on data size are big problems.

Also it's worth to remember what are the problems with MySQL cluster: the data set can't be larger than the RAM of the PC, not all the nodes are the same, there is a Master that is a single point of failure, and it does not auto-sync in a transparent way when you add servers.
If you like this article vote it on reddit.
Posted at 09:40:23 | permalink | 45 comments | print
Do you like this article?
Subscribe to the RSS feed of this blog or use the newsletter service in order to receive a notification every time there is something of new to read here.

Note: you'll not see this box again if you are a usual reader.


09 Apr 08, 10:55:25
I've just launched a website designed to bring people together to solve this exact kind of problem. Coincidentally, database scalability is a major concern of mine with the site itself. (It uses Postgres.)

I invite you to post your idea as a project on my site: I should note that there is a "cost" for posting projects on the site. But it's not my revenue. You'll see what I mean. jjgignac-att-fossfactory-dott-org

(If you don't post it, I will -- but I'm trying to drum up a userbase.)
Manuzhai writes:
09 Apr 08, 10:59:10
This is exactly why you should be watching the Apache CouchDB project.
Lucian writes:
09 Apr 08, 11:21:49
Mnesia has more in common with sql than say.. using object hierarchies.. You can actually write erlang code that looks almost like sql, and it's code, not string variables/constants used by the main development language, like you have in your average application..
It's just seem alien because you have to wrap your mind around the erlang paradigms.
09 Apr 08, 11:37:29
Also about mnesia my understanding is that it the data size issues have been fixed.
localhost writes:
09 Apr 08, 11:40:44
"a scalable, redundant relational database that is as simple to use as to buy N cheap PCs linked together via LAN running this DB system that makes it easy to add more servers as needed. "

i have a suspicion that if was as simple as you make it sound, then solutions like amazon's/google's and couchdb wouldn't be getting as much press as they do. Relational databases are hard to scale because of the *relations*
Erek Dyskant writes:
09 Apr 08, 11:52:57
You're right that it's definitely needed. Unfortunately transactional databases are inherently nonscalable, as each transaction must be committed to every node before it returns as successful. Notice that amazon's database is nontransactional, and in fact, database writes can take several seconds to propagate through the cluster.

MySQL cluster does support redundant masters, and a master failure doesn't actually bring down the database.

Anyway, there's still going to be a place for failover pairs of 32 core machines with dozens of drives for a while.
Matt S Trout writes:
09 Apr 08, 11:57:01
The trick I've seen for scaling postgres in that architecture is to run it on a multi-machine single system image cluster, which apparently performs pretty reasonably, all things considered.

Not sure that counts as a full solution but interesting nonetheless
some guy writes:
09 Apr 08, 12:03:43
your insistence that this mythical database be relational and support SQL-like querying guarantees that you won't get what you want unless you do it yourself. and we'd all love to watch you try and fail miserably.

databases take *years* to develop. get real.
sapphirecat writes:
09 Apr 08, 12:10:46
Another project to keep an eye on would be Apache Hadoop. It might not turn out to be as good in the long run because it's trying to do two things at once, but it is working in the same problem space.
09 Apr 08, 12:11:21
this is aboslutely silly and a wast of time and I am going to report this to and get these 5 minutes of my life back
NL writes:
09 Apr 08, 12:13:34
The problem is not solely the lack of scalability in the database. The problem is that web development is under-engineered. You have to put more thought into the implementation of those wiz-bang web 2.0 websites before any sort of scalability on the DB level will work. Postgres does fairly well on the scalability part, but it won't matter if your web app and DB design requires you to throw a ton of hardware at it before it will perform reasonably well. Implementing a system that constantly does full table scans will work fine as long as the data set is small. When the data set is large, scaling the system is only going to go so far before you'll have to go back and fix the root cause of the problem (which should have been done in the first place).
Volo Mike writes:
09 Apr 08, 12:20:38
Yeah, I really wish that the SQL creators would not make clustering an afterthought. However, as it is too hard to build up a fresh base into a support movement behind something new, I guess we'll have to do with a system where they bolt on clustering on something old. I've seen the stats of MySQL versus PostgreSQL, and PG wins against MySQL after 100 concurrent users up to the point you need to build a cluster, but MySQL wins the marketing game. At the point you need a cluster, I hear that MySQL makes it easier to do than PostgreSQL. I think since MySQL has Sun money behind it, however, that things will change a lot for it and it might actually start to beat PostgreSQL in all areas.

There's been a lot of website successes lately and exciting entrepreneurship that made a comeback in 2007, and now is doing well in 2008 in the tech market. With so many sites needing to move into high-end database scaling in the FOSS movement, database scalability is indeed a major concern.

There are other ways to scale before needing to cluster a database, however. There are many things you can do, and then there are many things you can do in how you're doing the database statements or have set up the database that can make a tremendous difference. Check out my article:
KirkH writes:
09 Apr 08, 12:21:20
The high performance MySQL book says that IOPS are the main database bottleneck. Solid State Drives 10-100x faster in this regard and Intel is working on fixing the random write problems. If your run of the mill HD will soon do 100k IOPS then a quad core with a bunch of RAM should be able to tackle all but the hugest of loads.

I was on the phone with Yahoo news and they're still using some old P3s, probably because their RAID arrays are the bottleneck.
zack writes:
09 Apr 08, 12:28:50
seconding looking at CouchDB. Get over the relational thing.
steve writes:
09 Apr 08, 12:28:56
"mysql sucks"

I do not agree. MySQL is pretty solid and scalable enterprise level database. Saying that MySQL sucks and not explaining why is just plain dumb.
nicola writes:
09 Apr 08, 12:40:43
postgres is what you need
kraln writes:
09 Apr 08, 12:49:26
So, you want memcached?
antirez writes:
09 Apr 08, 12:52:34
@steve: sorry I was to raw but the idea was "MySQL sucks in this context of being scalable in a transparent way".
antirez writes:
09 Apr 08, 12:53:25
@kraln: memcached is more the drug for this problem... not the real solution.
antirez writes:
09 Apr 08, 12:54:33
@KirkH: yeah but more PCs = more data transfer in parallel, is like RAID0 where two disks will double the preformance. So at least this part should be scalable.
antirez writes:
09 Apr 08, 12:55:21
@Volo Mike: I mostly angree and thanks for your link.
antirez writes:
09 Apr 08, 12:56:57
@someguy: I specified better this part in a reddit comment that I paste here:

I read some comment on my blog and here stating that it is actually a very hard problem, but probably the post was not clear about the feature set that I think reaches the good enough threshold.

I'm not talking about ACID or similar properties, nor JOINs!, the only thing that it is really vital to win the market is IMHO a SQL-alike syntax that can do simple INSERT, UPDATE, DELETE and SELECT with ORDER BY and LIMIT operations.

The only real required feature is that after I alter the database at time T1 I must be sure that a query at time T2 is guaranteed to return the updated value.

Maybe I'm just too optimistic but this does not sound impossible to do even with a fully distributed architecture.
JaredRad writes:
09 Apr 08, 12:57:25
antirez writes:
09 Apr 08, 12:57:57
@Erek Dyskant: I think that most web applications don't need full transactions. Thanks for the information about mysql cluster multiple masters config.
antirez writes:
09 Apr 08, 12:59:03
@Anders Conbere: interesting, so the main problem is fixed. I should look closer to this project as usually the things coming from Erlang are solid and high quality.
antirez writes:
09 Apr 08, 13:00:01
@John-Paul Gignac: interesting indeed, I'm trying to build something this days but my goal is much simpler for now, just a more featured memcached with persistent storage and a type system.
Stan writes:
09 Apr 08, 13:03:11
Tom writes:
09 Apr 08, 13:08:18
@John-Paul Gignac: Reminds me of BountySource:
Mitch writes:
09 Apr 08, 13:16:00
I agree with Kraln. A scalable architecture should include a distributed cache. Scaling out is then a matter of adding more cheap diskless nodes as opposed to adding database nodes which either have costly localized disk subsystems or rely on a SAN which is probably most of your bottleneck in the first place (even more costly).

If your db is slowing down because you're analyzing aggregate data, you should look into an OLAP architecture like Mondrian.
mitch writes:
09 Apr 08, 13:33:39
When people stop thinking about a database as just some dumb storage mechanism, then they will understand how difficult a problem database scalability actually is. Stop looking down your nose at the database world and try to understand something for once.
antirez writes:
09 Apr 08, 13:40:45
@mitch: I agree that's hard but I think that such a database does not exist because there is nothing that is a good compromise between features and scalability. It is possible to write a simple SQL-alike database transactionless but with simple insert/delete/update/select/orderby/limit operations, and the developers can deal with such a DB writing their applications.

So it is probably more a cultural problem than a technological one.
jsnx writes:
09 Apr 08, 14:06:36
If you're serving highly structured, read-most data, then use LDAP. It's not like SQL, but it's definitely cross-language and very flexible. There are a lot of different ways to configure LDAP for scalability and redundancy -- partial slaves, N-way multimaster, referrals -- things that don't exist in SQL.
Noah Slater writes:
09 Apr 08, 14:43:22
You should take a closer look at Apache CouchDB.
Navigatore Anonimo writes:
09 Apr 08, 16:53:24
vahan writes:
09 Apr 08, 20:50:18
check out vertica. column oriented db.
maximuszen writes:
09 Apr 08, 22:58:17
10 Apr 08, 02:00:04
How about sapdb which is now maxdb from mysql
panta writes:
10 Apr 08, 04:21:18
"I'm not talking about ACID or similar properties, nor JOINs!, the only thing that it is really vital to win the market is IMHO a SQL-alike syntax that can do simple INSERT, UPDATE, DELETE and SELECT with ORDER BY and LIMIT operations."

This is a non-relational database. So I wonder why do you need SQL-like syntax (which is only overly-restrictive and uncomfortable wrt a simple API).
But I suspect that very few applications are easily adapted to a non-relational DB, and they would lead to a heavily non-normalized design, or move the "relations" to the application level, making it much more complex. And you would be pushing the scalability and synchronization problems to the application level too.
I've the suspect instead that these days an in-memory database could handle most situations (you could have 64Gb of storage on an intel multi-processor, or scale up to 1Tb RAM on larger setups).
14 Apr 08, 14:52:50
As promised, I've added the project to my site, with a bit of bounty. You can see it at
Elina Yi writes:
24 Sep 08, 02:44:51
Dear Decision Maker,

Memoright, is one of the major high end SSD manufacturer in the world.
Our performance:
Sustain Read >120MB/s
Sustain Write >120MB/s
Random Read @512k >10000
Random Write @512k >800 (outperform rests in the market!)

The SLC-SSD cost is still high for regular individual users.
One of the most easy to be justified application for SLC-SSD is server.
Most server applications need read/write balanced performance, especially in random.
Please find out by yourself the random write IOPS of others SSD, since they never disclose those numbers from right beginning.
Or you just test the IOPS with the setting of 50% read and 50% write. You will ask yourself why pay so much for a one wheel bicycle.
Some other cases, read oriented server, like video server or web server. You should use MLC-SSD instead of one-leg SLC.
Using MemoRihgt ,the most balanced performance SSD in the world, in the server application is your best chose.

Now we are looking for distributor for our leading edge SSD products in your country. Please contact us for business details.

For more information, please visit our website at, and for 3rd party report please visit

Best Regards!
Elina Yi
Skype: elina668
toplist writes:
05 Nov 08, 08:14:28
Alex, fett covered the Circle/Eclipse issue pretty thoroughly, so I'll just add that your extension should be more of a Circle extends Shape, Eclipse extends Shape, if in fact there is a need for sub-classing at all.

The Observer I'll touch on though as it's an interesting pattern with a pretty tough call on when to use it. Observers and Visitors are actually very, very close in implementation, with one exception. The Observer implies that it needs updates on every change of the state of an object, while the Visitor only wants to be invoked on occasion. In the example above, publishing data, that qualifies as a do once event. To be true to the Observer pattern, every time the Event object's location changed, the registered Observer should be notified so it can determine whether it wants to do anything with the current state.

Assuming the ability to work with an Event at each step is desired, the Observer might be the better choice, but in the example above I would stick with the Visitor pattern as it answers a much more specific problem. I'll also venture a guess that a lot of the implementations of the Observer pattern would probably be better stated had they been expressed via a Visitor.

sapphirecat: yup... :-) What makes it even more fun is when you override the method to change functionality so now publishTo() doesn't even call the original publishTo() at all.
Max writes:
08 May 09, 05:36:42
I know this is an old topic, but there are some wrong points about MySQL Cluster:
- Is not limited to RAM (you can choose to have memory or disk tables)
- there is not a Master (all the nodes are active with automatic and transparent fail-over and recovery)
- the "autosync" (means the autopartioning when adding new nodes) is now present in the version 7

So MySQL Cluster fits perfectly your needs, this is why is increasing its footprint in the market so fast.
22 Dec 09, 05:41:15
15 Jan 10, 16:21:46
this information most important for me.
Robert Sullivan writes:
22 Sep 10, 13:41:28
Andres Kutt gave a talk at QCon on how they solved some scalability issues at Skype using PostgreSQL. Seems to help if a few of your developers are also PostrgeSQL contributors ;)
comments closed