Comments for post Diskstore and a new b-tree implementation
http://www.test.ro writes: Another test
[url=http://www.test.ro]test[/url] writes: This is just a test !
rch writes: disco:ddfs?
Matt writes: Any code, yet?
Tim writes: both diskstore-fs and diskstore-btree are certainly useful, but i think the most important feature would be a generic diskstore interface (pipes? sockets? http? zeromq?), so that any backend store can be used.
there are a lot of exciting options with different tradeoffs: berkeleydb, riak, cassandra, tokyo cabinet, membase, mogilefs ....
eric writes: please hurry..... my rewrite log is like 40 gigs and I can't get it to rewrite in a normal amount of time.
Benjamin Sergeant writes: For the on diskstore-fs, I think there's a log(N) access cost for accessing a file in a folder with N entries, so it's a good idea to nest the files (dont know what's the good value to choose for max_n though).
Anonymous writes: You mentioned that your fs-backed implementation allows rsyncing the files to another server. If I use the B-tree backend instead, how should I keep the data backed up in a consistent state?
tobi writes: why not use berkleydb?
antirez writes: @saurik: good question. Basically during this implementation I'm also trying to learn on disk data structures. This is what I'm doing so far, in a sort of incremental implement/understand way:
I'm starting with a pretty general on disk allocator that is able to allocate and reclaim, without any compaction stage (using on disk free lists) space in power of two sizes.
On top of this as a first attempt I'm creating a tree with a number of nodes between 4 and 255, with the same basic properties of btrees but implemented in a different way as values are always stored outside the node, keys are fixed length (first 16 byte of SHA1) as I need to support keys of any length even if I can't access them ordered.
At this stage this is just experimenting... then I'll try to apply modifications to enhance performances and corruption resistance.
I'll check the data structures you mention to see what advantages they have, and I hope I'll find something that can better solve my problems, that are: not the fastest but decently fast disk KV store, good resistance to corruption, ability to update N keys transactionally so that either all the new values or neither will get in, no need for compaction, no need for parallel accesses.
startupgrrl writes: You might as well implement 2-COLA (a cache oblivious streaming B-Tree) if you're going to go that route. B-Trees alone are archaic.
Jay Freeman (saurik) writes: Why a B-tree? To be clear what I mean: as opposed to LSM-trees (like BigTable and Cassandra), B-tries (note the "i" in there), whatever else.
home