Comments for post Redis weekly update #4 - Fixes and Optimizations

http://www.optimizaresiteseo.net writes: The virtual machine is difficult to hack if the code is well optimized. In fact, the security must be optimized in the correct way for maximum effect against hackers.
Mauricio Fernandez writes: I'd forgotten about RANDOMKEY; if you say O(log N) is not good enough, that clearly precludes balanced trees. The method you describe is indeed similar to linear hashing and sounds quite easy to implement. It seems to me, though, that a linear hashing scheme with a dynamic array copied incrementally would need less memory and would exhibit a better access pattern while the underlying array is being expanded.

When you use two tables and start to migrate keys (because the load factor is nearly 1), you'll have 3N buckets (N for the original + 2N for the new table) for the time it takes to complete the migration, whose lower bound is given by the cache misses. That'd be some 3 per element, assuming you're scanning the buckets and "forwarding" as you move towards the end of the bucket array (getting the bucket address is free since the array is examined sequentially, then 1 miss to get the key pointer, one to address the key, and finally another cache miss to write into the destination table's bucket array; I'm ignoring collisions in this first-order analysis). That seems consistent with the figures you gave: 200 M elements rehashed in 47s amounts to 235 ns per element (70-80 ns memory latency sounds reasonable on the kind of hardware your user must have been using to hold 160 GB sets in memory; the actual computations take no time compared to the cache misses :). In other words, you need memory to hold 3N buckets for a period of time no shorter than N * 235 ns (47s in the example you gave).

Now, as for linear hashing: once the initial table with N buckets is full, we have to expand the underlying dynamic array. Normally, we'd just allocate the new one (so we have N + 2N buckets now) and copy the old buckets over to the new table. However, in this case we aren't accessing the destination table randomly, so we're limited by bandwidth, not latency. Migrating a bucket takes the time needed to copy 1 pointer, instead of several cache misses. For instance, if the memory bandwidth is 6 GB/s (pretty conservative), that could be about 0.7 ns, vs. over 200 ns in the previous case, or in other words around 0.13s to copy a table with 200 million buckets, compared to 47s. Moreover, we do not have to do it all at once, and need not block for 0.13s: we can copy incrementally (with some care to prevent collisions, for instance by adding elements to the old table unless the corresponding buckets in the new table have already been populated; we can also trigger resizing a bit earlier to avoid high collision rates while we're expanding the hash table).

I believe this all amounts to the following: since the average memory usage is lower in the latter case, you can use a larger constant to resize the array, thus lowering the load factor and decreasing the number of collisions. If we move the average load from 0.75 to 0.665 (i.e. array size increased by a factor of 3 on resize, for a similar average memory usage of 3N buckets), the expected number of probes per successful lookup goes from 1.52 to 1.43 (1 + (exp(2*x) - 1 - 2*x)/(8*x) + x/4, where x is the load factor) and GET is nearly 6% faster :)

Also, in the two-table scheme you have to look up in both tables when you get a miss in the first one (for newly added elements), which wouldn't happen with linear hashing, causing an additional (and potentially larger) difference in speed. It's going to be very dependent on the usage pattern: if you add lots of new elements and then access them while you're copying to the 2nd table, suddenly GET becomes twice as slow...
Then there's also the issue with deletes (have to operate on both tables) and... oh this is getting tricky :-) This is becoming quite a Gedanken experiment :)
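A quick back-of-the-envelope check of the numbers in Mauricio's comment, written as a small C sketch. The constants (200 M elements, 47 s rehash, 6 GB/s bandwidth, 4-byte bucket pointers) are the ones assumed in the comment; this is just arithmetic, not a benchmark or anything from Redis itself:

```c
/* Sanity check of the figures discussed above. Compile with -lm. */
#include <math.h>
#include <stdio.h>

/* Expected probes per successful lookup for load factor x
 * (the formula quoted in the comment). */
static double probes(double x) {
    return 1.0 + (exp(2.0 * x) - 1.0 - 2.0 * x) / (8.0 * x) + x / 4.0;
}

int main(void) {
    double elements  = 200e6;   /* 200 M elements                   */
    double rehash_s  = 47.0;    /* observed rehash time             */
    double bandwidth = 6e9;     /* 6 GB/s, conservative             */
    double ptr_bytes = 4.0;     /* 32-bit bucket pointer            */

    printf("latency-bound:   %.0f ns/element\n",
           rehash_s / elements * 1e9);              /* ~235 ns      */
    printf("bandwidth-bound: %.2f ns/bucket, %.2f s total\n",
           ptr_bytes / bandwidth * 1e9,             /* ~0.67 ns     */
           elements * ptr_bytes / bandwidth);       /* ~0.13 s      */
    printf("probes at 0.75:  %.2f\n", probes(0.75));  /* ~1.52      */
    printf("probes at 0.665: %.2f\n", probes(0.665)); /* ~1.43      */
    return 0;
}
```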
antirez writes: @Mauricio: thanks for the insight! Here is what I think about the two proposed solutions:
- Linear hashing: it's *very* similar to what I'm doing; I would actually consider my scheme a variant of it.
- Trees: we need a get-random-element operation, so every kind of tree-like data structure needs to be augmented with additional per-node info in order to provide it. Also, since from time to time we call this operation many times in a row, the O(log(N)) time complexity isn't enough.
I'm implementing my simple scheme to check how it works, but I'm confident ;)
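As an aside, here is a minimal sketch of the two-table incremental rehashing scheme discussed in this thread, assuming chained buckets and power-of-two table sizes. All names and details are invented for illustration; this is not the actual Redis code:

```c
/* Sketch of two-table incremental rehashing: a few buckets are migrated
 * on every operation, and lookups check both tables while migrating. */
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    char *key;
    void *val;
    struct entry *next;          /* chaining */
} entry;

typedef struct dict {
    entry **table[2];            /* table[1] is non-NULL only while rehashing */
    unsigned long size[2];       /* power-of-two sizes, so we can mask        */
    long rehash_idx;             /* -1 when not rehashing, else next bucket of
                                    table[0] still to be migrated             */
} dict;

static unsigned long hash(const char *key) {  /* toy hash, for the sketch */
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Migrate a handful of buckets from table[0] to table[1]. Called from
 * every lookup/insert so the cost is spread across operations. */
static void rehash_step(dict *d, int nbuckets) {
    if (d->rehash_idx == -1) return;
    while (nbuckets-- && d->rehash_idx < (long)d->size[0]) {
        entry *e = d->table[0][d->rehash_idx];
        while (e) {
            entry *next = e->next;
            unsigned long idx = hash(e->key) & (d->size[1] - 1);
            e->next = d->table[1][idx];
            d->table[1][idx] = e;
            e = next;
        }
        d->table[0][d->rehash_idx++] = NULL;
    }
    if (d->rehash_idx >= (long)d->size[0]) {
        free(d->table[0]);               /* old table fully drained */
        d->table[0] = d->table[1];
        d->size[0]  = d->size[1];
        d->table[1] = NULL;
        d->rehash_idx = -1;
    }
}

/* Lookups must check both tables while a rehash is in progress:
 * this is the extra miss cost Mauricio points out above. */
entry *dict_find(dict *d, const char *key) {
    rehash_step(d, 1);
    for (int t = 0; t <= (d->rehash_idx != -1); t++) {
        unsigned long idx = hash(key) & (d->size[t] - 1);
        for (entry *e = d->table[t][idx]; e; e = e->next)
            if (strcmp(e->key, key) == 0) return e;
    }
    return NULL;
}
```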
Mauricio Fernandez writes: Have you considered using linear hashing? It's nearly 4 am here, so please excuse my quoting from Wikipedia: "Linear hashing allows for the expansion of the hash table one slot at a time. The frequent single slot expansion can very effectively control the length of the collision chain. The cost of hash table expansion is spread out across each hash table insertion operation, as opposed to being incurred all at once. Therefore linear hashing is well suited for interactive applications." In a sense, it pushes the blocking problem to the underlying dynamic array, but a hybrid between linear hashing and the approach you describe seems possible (I'm exceedingly tired and probably not thinking clearly, though). Also, on platforms with memory overcommit, you could just allocate a huge array and let the OS assign physical memory to your process as the hash table grows.

As for hash tables vs. e.g. BSTs: I can see your hash table implementation uses around 4.33 words per entry (for an average load factor of 0.75) _plus_ the malloc overhead (often 8 bytes, IIRC from Doug Lea's description of his malloc), that is 6.33 words on average on 32-bit platforms. If you use an RB tree, for instance, you can get it down to 5 words plus malloc overhead, or even 4 if you encode the color creatively (pointer tagging). With an AVL tree it'd be 5 words. BSTs are a bit of a PITA if you don't have a GC at hand, but there must be zillions of readily usable, robust C implementations around anyway. Which reminds me that Judy arrays could give you excellent memory efficiency, performance comparable to hash tables, and, since they are similar to 256-ary tries, they might have better worst-case bounds on insert (they might be O(log N); some quick/lazy googling didn't return any complexity bounds). ENEEDSLEEP Cheers
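For reference, here is a minimal sketch of the linear hashing idea itself (a split pointer, one bucket split per insert). The names are invented, and the underlying array is grown with a naive realloc per split, which is exactly the part Mauricio suggests handling incrementally or via overcommit; it is a sketch under those assumptions, not production code:

```c
/* Sketch of linear hashing: the table grows one bucket at a time,
 * so the expansion cost is spread across insertions. */
#include <stdlib.h>
#include <string.h>

#define INITIAL_BUCKETS 4

typedef struct node {
    char *key;
    struct node *next;
} node;

typedef struct lhash {
    node **buckets;        /* dynamic array of chains                 */
    unsigned long n;       /* buckets at the start of current round   */
    unsigned long split;   /* next bucket to split (0 <= split < n)   */
    unsigned long used;    /* buckets currently allocated (n + split) */
    unsigned long items;
} lhash;

static unsigned long hashfn(const char *key) {   /* toy hash */
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

lhash *lhash_new(void) {
    lhash *t = calloc(1, sizeof(*t));
    t->n = t->used = INITIAL_BUCKETS;
    t->buckets = calloc(INITIAL_BUCKETS, sizeof(node *));
    return t;
}

/* Address rule: use h mod n, but if that bucket has already been split
 * in this round, use h mod 2n instead. */
static unsigned long addr(lhash *t, const char *key) {
    unsigned long h = hashfn(key);
    unsigned long b = h % t->n;
    if (b < t->split) b = h % (2 * t->n);
    return b;
}

/* Split exactly one bucket: rehash chain `split` between itself and
 * bucket `split + n`. This is the "one slot at a time" expansion; a
 * real implementation would grow the array geometrically or rely on
 * overcommit instead of realloc'ing one slot per split. */
static void split_one(lhash *t) {
    unsigned long from = t->split, to = t->split + t->n;
    t->buckets = realloc(t->buckets, (t->used + 1) * sizeof(node *));
    t->buckets[to] = NULL;
    t->used++;

    node *e = t->buckets[from], *keep = NULL;
    while (e) {
        node *next = e->next;
        unsigned long b = hashfn(e->key) % (2 * t->n);
        if (b == from) { e->next = keep;           keep = e; }
        else           { e->next = t->buckets[to]; t->buckets[to] = e; }
        e = next;
    }
    t->buckets[from] = keep;

    if (++t->split == t->n) {   /* round finished: the table has doubled */
        t->n *= 2;
        t->split = 0;
    }
}

void lhash_insert(lhash *t, char *key) {
    node *e = malloc(sizeof(*e));
    e->key = key;
    unsigned long b = addr(t, key);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->items++;
    /* keep the load factor <= 1 by splitting one bucket when needed */
    if (t->items > t->used) split_one(t);
}

node *lhash_find(lhash *t, const char *key) {
    for (node *e = t->buckets[addr(t, key)]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e;
    return NULL;
}
```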