Comments for post The missing opensource software for web development: a scalable database

Robert Sullivan writes: Andres Kutt gave a talk at QCon on how they solved some scalability issues at Skype using PostgreSQL. Seems to help if a few of your developers are also PostrgeSQL contributors ;) http://www.infoq.com/articles/learnings-five-years-skype-architect
sepetli vinç writes: this information most important for me. thanks...
karadeniz turlar writes: ridos
Max writes: I know this is an old topic, but there are some wrong points about MySQL Cluster: - Is not limited to RAM (you can choose to have memory or disk tables) - there is not a Master (all the nodes are active with automatic and transparent fail-over and recovery) - the "autosync" (means the autopartioning when adding new nodes) is now present in the version 7 So MySQL Cluster fits perfectly your needs, this is why is increasing its footprint in the market so fast.
toplist writes: Alex, fett covered the Circle/Eclipse issue pretty thoroughly, so I'll just add that your extension should be more of a Circle extends Shape, Eclipse extends Shape, if in fact there is a need for sub-classing at all. The Observer I'll touch on though as it's an interesting pattern with a pretty tough call on when to use it. Observers and Visitors are actually very, very close in implementation, with one exception. The Observer implies that it needs updates on every change of the state of an object, while the Visitor only wants to be invoked on occasion. In the example above, publishing data, that qualifies as a do once event. To be true to the Observer pattern, every time the Event object's location changed, the registered Observer should be notified so it can determine whether it wants to do anything with the current state. Assuming the ability to work with an Event at each step is desired, the Observer might be the better choice, but in the example above I would stick with the Visitor pattern as it answers a much more specific problem. I'll also venture a guess that a lot of the implementations of the Observer pattern would probably be better stated had they been expressed via a Visitor. sapphirecat: yup... :-) What makes it even more fun is when you override the method to change functionality so now publishTo() doesn't even call the original publishTo() at all.
Elina Yi writes: Dear Decision Maker, Memoright, is one of the major high end SSD manufacturer in the world. Our performance: Sustain Read >120MB/s Sustain Write >120MB/s Random Read @512k >10000 Random Write @512k >800 (outperform rests in the market!) The SLC-SSD cost is still high for regular individual users. One of the most easy to be justified application for SLC-SSD is server. Most server applications need read/write balanced performance, especially in random. Please find out by yourself the random write IOPS of others SSD, since they never disclose those numbers from right beginning. Or you just test the IOPS with the setting of 50% read and 50% write. You will ask yourself why pay so much for a one wheel bicycle. Some other cases, read oriented server, like video server or web server. You should use MLC-SSD instead of one-leg SLC. Using MemoRihgt ,the most balanced performance SSD in the world, in the server application is your best chose. Now we are looking for distributor for our leading edge SSD products in your country. Please contact us for business details. For more information, please visit our website at http://www.memoright.com/en/index.asp, and for 3rd party report please visit www.storagesearch.com. Best Regards! Elina Yi MemoRight Email: elina@memoright.com MSN: yatou1818@hotmail.com Skype: elina668
John-Paul Gignac writes: As promised, I've added the project to my site, with a bit of bounty. You can see it at http://www.fossfactory.org/project.php?p=p28
panta writes: "I'm not talking about ACID or similar properties, nor JOINs!, the only thing that it is really vital to win the market is IMHO a SQL-alike syntax that can do simple INSERT, UPDATE, DELETE and SELECT with ORDER BY and LIMIT operations." This is a non-relational database. So I wonder why do you need SQL-like syntax (which is only overly-restrictive and uncomfortable wrt a simple API). But I suspect that very few applications are easily adapted to a non-relational DB, and they would lead to a heavily non-normalized design, or move the "relations" to the application level, making it much more complex. And you would be pushing the scalability and synchronization problems to the application level too. I've the suspect instead that these days an in-memory database could handle most situations (you could have 64Gb of storage on an intel multi-processor, or scale up to 1Tb RAM on larger setups).
ANONYMOUS writes: How about sapdb which is now maxdb from mysql http://en.wikipedia.org/wiki/SAPDB
maximuszen writes: sequoia
vahan writes: check out vertica. column oriented db.
Navigatore Anonimo writes:
Noah Slater writes: You should take a closer look at Apache CouchDB.
jsnx writes: If you're serving highly structured, read-most data, then use LDAP. It's not like SQL, but it's definitely cross-language and very flexible. There are a lot of different ways to configure LDAP for scalability and redundancy -- partial slaves, N-way multimaster, referrals -- things that don't exist in SQL.
antirez writes: @mitch: I agree that's hard but I think that such a database does not exist because there is nothing that is a good compromise between features and scalability. It is possible to write a simple SQL-alike database transactionless but with simple insert/delete/update/select/orderby/limit operations, and the developers can deal with such a DB writing their applications. So it is probably more a cultural problem than a technological one.
mitch writes: When people stop thinking about a database as just some dumb storage mechanism, then they will understand how difficult a problem database scalability actually is. Stop looking down your nose at the database world and try to understand something for once.
Mitch writes: I agree with Kraln. A scalable architecture should include a distributed cache. Scaling out is then a matter of adding more cheap diskless nodes as opposed to adding database nodes which either have costly localized disk subsystems or rely on a SAN which is probably most of your bottleneck in the first place (even more costly). If your db is slowing down because you're analyzing aggregate data, you should look into an OLAP architecture like Mondrian.
Tom writes: @John-Paul Gignac: Reminds me of BountySource: https://www.bountysource.com/
Stan writes: check out hypertable. http://www.hypertable.org/about.html
antirez writes: @John-Paul Gignac: interesting indeed, I'm trying to build something this days but my goal is much simpler for now, just a more featured memcached with persistent storage and a type system.
antirez writes: @Anders Conbere: interesting, so the main problem is fixed. I should look closer to this project as usually the things coming from Erlang are solid and high quality.
antirez writes: @Erek Dyskant: I think that most web applications don't need full transactions. Thanks for the information about mysql cluster multiple masters config.
JaredRad writes: cleversafe.org hypertable.org
antirez writes: @someguy: I specified better this part in a reddit comment that I paste here: I read some comment on my blog and here stating that it is actually a very hard problem, but probably the post was not clear about the feature set that I think reaches the good enough threshold. I'm not talking about ACID or similar properties, nor JOINs!, the only thing that it is really vital to win the market is IMHO a SQL-alike syntax that can do simple INSERT, UPDATE, DELETE and SELECT with ORDER BY and LIMIT operations. The only real required feature is that after I alter the database at time T1 I must be sure that a query at time T2 is guaranteed to return the updated value. Maybe I'm just too optimistic but this does not sound impossible to do even with a fully distributed architecture.
antirez writes: @Volo Mike: I mostly angree and thanks for your link.
antirez writes: @KirkH: yeah but more PCs = more data transfer in parallel, is like RAID0 where two disks will double the preformance. So at least this part should be scalable.
antirez writes: @kraln: memcached is more the drug for this problem... not the real solution.
antirez writes: @steve: sorry I was to raw but the idea was "MySQL sucks in this context of being scalable in a transparent way".
kraln writes: So, you want memcached?
nicola writes: postgres is what you need
steve writes: "mysql sucks" I do not agree. MySQL is pretty solid and scalable enterprise level database. Saying that MySQL sucks and not explaining why is just plain dumb.
zack writes: seconding looking at CouchDB. Get over the relational thing.
KirkH writes: The high performance MySQL book says that IOPS are the main database bottleneck. Solid State Drives 10-100x faster in this regard and Intel is working on fixing the random write problems. If your run of the mill HD will soon do 100k IOPS then a quad core with a bunch of RAM should be able to tackle all but the hugest of loads. I was on the phone with Yahoo news and they're still using some old P3s, probably because their RAID arrays are the bottleneck.
Volo Mike writes: Yeah, I really wish that the SQL creators would not make clustering an afterthought. However, as it is too hard to build up a fresh base into a support movement behind something new, I guess we'll have to do with a system where they bolt on clustering on something old. I've seen the stats of MySQL versus PostgreSQL, and PG wins against MySQL after 100 concurrent users up to the point you need to build a cluster, but MySQL wins the marketing game. At the point you need a cluster, I hear that MySQL makes it easier to do than PostgreSQL. I think since MySQL has Sun money behind it, however, that things will change a lot for it and it might actually start to beat PostgreSQL in all areas. There's been a lot of website successes lately and exciting entrepreneurship that made a comeback in 2007, and now is doing well in 2008 in the tech market. With so many sites needing to move into high-end database scaling in the FOSS movement, database scalability is indeed a major concern. There are other ways to scale before needing to cluster a database, however. There are many things you can do, and then there are many things you can do in how you're doing the database statements or have set up the database that can make a tremendous difference. Check out my article: http://volosystems.com/article.php?a=How%20To%20Scale%20Your%20Web%20App
NL writes: The problem is not solely the lack of scalability in the database. The problem is that web development is under-engineered. You have to put more thought into the implementation of those wiz-bang web 2.0 websites before any sort of scalability on the DB level will work. Postgres does fairly well on the scalability part, but it won't matter if your web app and DB design requires you to throw a ton of hardware at it before it will perform reasonably well. Implementing a system that constantly does full table scans will work fine as long as the data set is small. When the data set is large, scaling the system is only going to go so far before you'll have to go back and fix the root cause of the problem (which should have been done in the first place).
ReparateMe.com writes: this is aboslutely silly and a wast of time and I am going to report this to ReparateMe.com and get these 5 minutes of my life back
sapphirecat writes: Another project to keep an eye on would be Apache Hadoop. It might not turn out to be as good in the long run because it's trying to do two things at once, but it is working in the same problem space.
some guy writes: your insistence that this mythical database be relational and support SQL-like querying guarantees that you won't get what you want unless you do it yourself. and we'd all love to watch you try and fail miserably. databases take *years* to develop. get real.
Matt S Trout writes: The trick I've seen for scaling postgres in that architecture is to run it on a multi-machine single system image cluster, which apparently performs pretty reasonably, all things considered. Not sure that counts as a full solution but interesting nonetheless
Erek Dyskant writes: You're right that it's definitely needed. Unfortunately transactional databases are inherently nonscalable, as each transaction must be committed to every node before it returns as successful. Notice that amazon's database is nontransactional, and in fact, database writes can take several seconds to propagate through the cluster. MySQL cluster does support redundant masters, and a master failure doesn't actually bring down the database. Anyway, there's still going to be a place for failover pairs of 32 core machines with dozens of drives for a while.
localhost writes: "a scalable, redundant relational database that is as simple to use as to buy N cheap PCs linked together via LAN running this DB system that makes it easy to add more servers as needed. " i have a suspicion that if was as simple as you make it sound, then solutions like amazon's/google's and couchdb wouldn't be getting as much press as they do. Relational databases are hard to scale because of the *relations*
Anders Conbere writes: Also about mnesia my understanding is that it the data size issues have been fixed.
Lucian writes: Mnesia has more in common with sql than say.. using object hierarchies.. You can actually write erlang code that looks almost like sql, and it's code, not string variables/constants used by the main development language, like you have in your average application.. It's just seem alien because you have to wrap your mind around the erlang paradigms.
Manuzhai writes: This is exactly why you should be watching the Apache CouchDB project.
John-Paul Gignac writes: I've just launched a website designed to bring people together to solve this exact kind of problem. Coincidentally, database scalability is a major concern of mine with the site itself. (It uses Postgres.) I invite you to post your idea as a project on my site: http://www.fossfactory.org/. I should note that there is a "cost" for posting projects on the site. But it's not my revenue. You'll see what I mean. jjgignac-att-fossfactory-dott-org (If you don't post it, I will -- but I'm trying to drum up a userbase.)
home