Comments for post Backporting into Redis 2.4 and other news
Willp writes: One small improvement in assigning hash buckets would be to round-robin the hash buckets across the nodes in your cluster. Using the sample hash bucket division of hash slots will end up with a pretty uneven distribution of keys.
The distribution is not very uniform for values of crc16() % 4096 on strings that only differ by one or two ascii values. Strings that are generated sequentially will tend to have similar values. If instead the hash slots are initially assigned in round-robin style (give hashslot mod TotalNodes == NodeNumber to node numbered as NodeNumber), then there will be better distribution of keys across nodes. If your keys end in the same string and differ more on the leftmost characters, then the distribution is a lot better. Even better distribution would of course be a random shuffle, though harder to manage/recover. I know, it's still early days for Redis Cluster, and I am hoping for the best!
Python code to demonstrate:
>>> from crc16pure import * # from https://github.com/gennady/pycrc16/blob/master/python2x/crc16/crc16pure.py
>>> for x in range(10): print 'crc16(%s): %d' % ('abc:%d' % x, (crc16xmodem('abc:' + str(x)) % 4096))
...
crc16(abc:0): 2305
crc16(abc:1): 2336
crc16(abc:2): 2371
crc16(abc:3): 2402
crc16(abc:4): 2437
crc16(abc:5): 2468
crc16(abc:6): 2503
crc16(abc:7): 2534
crc16(abc:8): 2057
crc16(abc:9): 2088
>>>
(All but two of these consecutive strings would hash to slots assigned to the middle node on port 6380. CRC16 is cheap but doesn't perturb the bits much on sequential input.)
Round robining the slots (mod 3) in this case would give better (but not great) distribution:
>>> for x in range(10): print 'crc16("%s"): %d and mod 3 is: %d' % ( ('abc:%d' % x), (crc16xmodem('abc:' + str(x)) % 4096), (crc16xmodem('abc:' + str(x)) % 4096) % 3 )
...
crc16("abc:0"): 2305 and mod 3 is: 1
crc16("abc:1"): 2336 and mod 3 is: 2
crc16("abc:2"): 2371 and mod 3 is: 1
crc16("abc:3"): 2402 and mod 3 is: 2
crc16("abc:4"): 2437 and mod 3 is: 1
crc16("abc:5"): 2468 and mod 3 is: 2
crc16("abc:6"): 2503 and mod 3 is: 1
crc16("abc:7"): 2534 and mod 3 is: 2
crc16("abc:8"): 2057 and mod 3 is: 2
crc16("abc:9"): 2088 and mod 3 is: 0
So, 1 slot went to node 0, 4 slots to node 1, and 5 slots to node 2. Better to have (1,4,5) than (0, 8, 2).
Willp writes: Salvatore, thank you! You are doing terrific work! Redis should have been born decades ago.
Dean writes: I am very excited to start testing with Cluster. Thank you for your time working on it, and for posting an update on your progress! Since Hiredis is the "official" client library for Redis, are there plans to evolve it from being a 'naïve' client to a 'full-featured' client as far as Cluster support is concerned? (Referencing previous Cluster terminology where a naïve client requires two round trips for a lookup (using the MOVED response to find the key) and a full-featured client will maintain (and update) a map of keys to hash slots.)
ariso writes: diskstore is really useful for embeding/desktop application. Could you please consider it more high priority?
antirez writes: @Matthew: not in 2.4, probably not even in 3.0 (redis cluster stable release number) as we consider cluster more high priority. Basically diskstore is just an experimental project, it will hit a stable release only if/when we think it rocks. I'm a bit skeptical about mixing Redis and disk as primary storage (not just for persistence) but we'll keep trying new solutions.
Matthew Frazier writes: How is progress on Diskstore coming? Will that be in 2.4? I know there are a lot of people (myself included) who are interested in it.
home