Comments for post Redis weekly update #4 - Fixes and Optimizations
Mauricio Fernandez writes: I'd forgotten about RANDOMKEY; if you say O(log N) is not good enough, that
clearly precludes balanced trees.
The method you describe is indeed similar to linear hashing and sounds quite
easy to implement. It seems to me that a linear hashing scheme with
a dynamic array copied incrementally would need less memory and would exhibit
a better access pattern when the underlying array is being expanded, though.
When you use two tables and start to migrate keys (because the load factor is
nearly 1), you'll have 3N buckets (N for the orig + 2N for the new table) for
the time it takes for the migration, whose lower bound is given by the cache
misses. That'd be some 3 cache misses per element, assuming you're scanning the buckets and
"forwarding" as you move towards the end of the bucket array (getting the
bucket address is free since the array is examined sequentially, then 1 miss
to get the key pointer, one to address the key, finally another cache miss to
write in the destination table's bucket array, and I'm ignoring collisions in
the 1st order analysis).
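For concreteness, here's a minimal sketch of the two-table incremental migration being discussed (names and structure are mine, purely illustrative, not the actual Redis code): a rehash step moves one bucket of the old table per call, and lookups consult both tables while the migration is in flight.

```c
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    char *key;
    void *val;
    struct entry *next;          /* separate chaining */
} entry;

typedef struct dict {
    entry **tab[2];              /* tab[0] = old table, tab[1] = new 2x table */
    size_t size[2];              /* bucket counts, powers of two */
    size_t rehash_idx;           /* next tab[0] bucket to migrate */
    int rehashing;
} dict;

/* Toy hash, just for the sketch. */
static size_t hash(const char *k) {
    size_t h = 5381;
    while (*k) h = h * 33 + (unsigned char)*k++;
    return h;
}

/* Migrate one bucket from old to new; calling this from GET/SET/DEL
 * spreads the cost of the whole rehash over normal operations. */
static void rehash_step(dict *d) {
    if (!d->rehashing) return;
    entry *e = d->tab[0][d->rehash_idx];
    while (e) {
        entry *next = e->next;
        size_t h = hash(e->key) & (d->size[1] - 1);
        e->next = d->tab[1][h];  /* push onto the new table's chain */
        d->tab[1][h] = e;
        e = next;
    }
    d->tab[0][d->rehash_idx++] = NULL;
    if (d->rehash_idx == d->size[0]) {       /* done: drop the old table */
        free(d->tab[0]);
        d->tab[0] = d->tab[1];
        d->size[0] = d->size[1];
        d->tab[1] = NULL;
        d->rehashing = 0;
    }
}

/* Lookups must check both tables while a rehash is in progress. */
static entry *dict_find(dict *d, const char *key) {
    for (int t = 0; t <= (d->rehashing ? 1 : 0); t++) {
        entry *e = d->tab[t][hash(key) & (d->size[t] - 1)];
        for (; e; e = e->next)
            if (strcmp(e->key, key) == 0) return e;
    }
    return NULL;
}
```

Note how the double lookup in dict_find is exactly the extra cost mentioned below for misses during migration.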
That seems consistent with the figures you gave: 200 M elements rehashed in
47s amounts to 235 ns per element (70-80 ns memory latency sounds reasonable on
the kind of hardware your user must have been using to hold 160 GB sets in
memory, actual computations take no time compared to the cache misses :). In
other words, you need memory to hold 3N buckets for a a period of time of no
less than N * 235ns (47s in the example you gave).
Now, as for linear hashing: once the initial table with N buckets is full, we
have to expand the underlying dynamic array. Normally, we'd just allocate the
new one (so we have N + 2N buckets now) and copy the old buckets over to the new
table. However, in this case we aren't accessing the destination table
randomly, so we're limited by bandwidth, not latency. Migrating a bucket takes
the time needed to copy 1 pointer, instead of several cache misses. For
instance, if the memory bandwidth is 6 GB/s (pretty conservative), that could be
about 0.7 ns, vs. over 200 ns in the previous case, or in other words around
0.13s to copy a table with 200 million buckets, compared to 47s.
Now, we do not have to do it all at once, and need not block for 0.13s: we can
copy incrementally (with some care to prevent collisions, for instance by
adding elements to the old table unless the corresponding buckets in the new
table have already been populated; we can trigger resizing a bit earlier to
avoid high collision rates while we're expanding the hash table).
I believe this all amounts to the following: since the average memory usage is
lower in the latter case, you can use a larger constant to resize the array,
thus lowering the load factor and decreasing the amount of collisions. If we
move the average load from 0.75 to 0.665 (i.e. array size increased by a
factor of 3 on resize, for similar average mem usage of 3N buckets), the expected
number of probes per successful lookup goes from 1.52 to 1.43 (1 + (exp(2*x) -
1 - 2 * x)/(8*x) + (x / 4) where x is the load factor) and GET is nearly 6%
faster :) Also, in the two-table scheme you have to lookup in both when you
get a miss in the first one (for newly added elements), which wouldn't happen
with linear hashing, causing an additional (and potentially larger) difference
in speed. It's going to be very dependent on the usage pattern: if you add
lots of new elements and then access them while you're copying to the 2nd
table, suddenly GET becomes twice as slow... Then there's also the issue of
deletes (they have to operate on both tables) and... oh, this is getting tricky :-)
This is becoming quite a Gedanken experiment :)
antirez writes: @Mauricio: thanks for the insight! That's what I think about the two proposed solutions:
- Linear hashing: it's *very* similar to what I'm doing; I would actually consider my scheme a variant of it.
- Trees: we need a get-random-element operation, so every tree-like data structure would have to be augmented with additional per-node info in order to provide it. Also, since we sometimes call this operation many times in a row, the O(log(N)) time complexity isn't good enough.
I'm implementing my simple scheme to check how it works, but I'm confident ;)
Mauricio Fernandez writes: Have you considered using linear hashing?
It's nearly 4am here so please excuse my quoting from wikipedia:
Linear hashing allows for the expansion of the hash table one slot at a time.
The frequent single slot expansion can very effectively control the length of
the collision chain. The cost of hash table expansion is spread out across
each hash table insertion operation, as opposed to being incurred all at once.
Therefore linear hashing is well suited for interactive applications.
In a sense, it pushes the blocking problem to the underlying dynamic array,
but a hybrid between linear hashing and the approach you describe seems
possible (I'm exceedingly tired and probably not thinking clearly, though).
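To make the quoted idea concrete, a hypothetical sketch of the one-bucket-at-a-time split (names and structure are mine, purely illustrative, not from any real implementation): buckets below the split pointer have already been rehashed with the next-level hash function.

```c
#include <stdlib.h>

#define INITIAL_BUCKETS 4

typedef struct node { unsigned key; struct node *next; } node;

typedef struct {
    node   **bucket;   /* dynamic array of chains */
    size_t   n;        /* buckets currently in use */
    size_t   level;    /* current round: h_level(k) = k % (N * 2^level) */
    size_t   split;    /* next bucket to split this round */
} lht;

/* Address a key: buckets below `split` have already been split,
 * so they use the next-level hash. */
static size_t addr(const lht *t, unsigned key) {
    size_t h = key % (INITIAL_BUCKETS << t->level);
    if (h < t->split)
        h = key % (INITIAL_BUCKETS << (t->level + 1));
    return h;
}

/* Expand by exactly one slot: split bucket `split`, whose keys either
 * stay put or move to the brand-new bucket at index n. */
static void split_one(lht *t) {
    t->bucket = realloc(t->bucket, (t->n + 1) * sizeof(node *));
    t->bucket[t->n] = NULL;
    node **src = &t->bucket[t->split];
    size_t next_mod = INITIAL_BUCKETS << (t->level + 1);
    node *e = *src;
    *src = NULL;
    while (e) {
        node *next = e->next;
        node **dst = (e->key % next_mod == t->split) ? src : &t->bucket[t->n];
        e->next = *dst;
        *dst = e;
        e = next;
    }
    t->n++;
    t->split++;
    if (t->split == (size_t)(INITIAL_BUCKETS << t->level)) {
        t->split = 0;            /* round complete, double the modulus */
        t->level++;
    }
}
```

Each insert can trigger at most one split_one, which is how the expansion cost gets amortized over insertions.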
Also, in platforms with memory overcommit, you could just allocate a huge
array and let the OS assign physical memory to your process as the hash table
grows.
As for hash tables vs. e.g. BSTs: I can see your hash table implementation
uses around 4.33 words (for average load factor 0.75) per entry _plus_ the
malloc overhead (often 8 bytes IIRC from Doug Lea's description of his
malloc), that is 6.33 words on average on 32-bit platforms. If you use for
instance a RB tree, you can get it down to 5 words plus malloc overhead,
or even 4 if you encode the color creatively (pointer tagging). With an AVL
it'd be 5 words. BSTs are a bit of a PITA if you don't have a GC at hand, but
there must be zillions of readily usable, robust C implementations around
anyway. Which reminds me that Judy arrays could give you excellent memory
efficiency, performance comparable to hash tables, and, since they are similar
to 256-ary tries, they might have better worst-case bounds on insert (they
might be O(log N), some quick/lazy googling didn't return any complexity bounds). ENEEDSLEEP
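For what it's worth, the 4.33 figure is consistent with three pointers per chained entry (key, value, next) plus the entry's share of the bucket array; that layout is an assumption on my part:

```c
/* Assumed layout: a chained hash entry is three pointers (key, value,
 * next), plus the entry's share of the bucket array at the given
 * load factor. One word = one pointer. */
static double words_per_entry(double load_factor) {
    return 3.0 + 1.0 / load_factor;
}
```

At load factor 0.75 that's 3 + 1.33 = 4.33 words, and 6.33 once you add the ~2 words (8 bytes) of malloc overhead per entry on a 32-bit platform.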
Cheers