Comnts for post A missing feature in most dynamic languages: take a random key from an hash table.

Comments for post A missing feature in most dynamic languages: take a random key from an hash table.

Willp writes: @Ricardo @Lorenzo The python keys() method is O(N) not O(1), unfortunately. @Paddy3118 python hash order is extremely non-random for certain keys, i.e. small integer values from 0 to 6 come out of a dict completely sorted, so it's not reliable. Try: d = dict() for x in range(6): d[x]=1 print d.keys() # same result here: print list(d.iterkeys()) The output is: [0, 1, 2, 3, 4, 5] This was an intentional design decision for making super-efficient python dictionaries when they are small (up to 8 elements I believe), becuase dicts promise O(1) access, but make no promises about key ordering. To get an O(1) random key out of a dict would require the dict to provide access to elements by their hash bucket position. You could do this by subclassing and maintaining a list[] of keys (updated for all mutating operations) and use that extra list to randomly choose from. Having this as a native method in Redis is pretty cool.

Paddy3118 writes: In Python you can iterate over the keys of a dict in hash order using .iterkeys() and .iteritems . Why not pop the last three items off the hash into a list, examine then discard one item, then add the other two back into the dict. That way you don't have to collect all the keys to the dict, but you do rely on the hash order of the dict keys rather than a random number generator. - Paddy

Ricardo Bánffy writes: Lorenzo, There is an easier way: import random d = {'a':1, 'b':2, 'c':3} k = random.choice(d.keys()) print d.pop(k)

wildwood writes: Is your code assuming at most one key per bucket? This doesn't seem to work if you have more keys than buckets in your hash.

jimrandomh writes: If you need a data structure with unusual extra operations added, you probably need to implement that data structure yourself. This should not be surprising. I don't think I've ever seen a hash table implementation which included get-random-element, nor have I ever had occasion to want one, so this qualifies as an unusual operation. So why are dynamic languages singled out? C++'s std::hash_map doesn't have this operation either.

noah healy writes: What about perl's each function?

antirez writes: @Dave, @Amit: I changed the algorithm in the pseudo code, now every key will be returned with the same probability.

Amit Patel writes: I agree with Dave Hinton that I'd expect random key to be fair, and the O(1) solution presented isn't. Perhaps it can be given a different name. Lorenzo Bolognini and David Teller give O(N) solutions but both are fair. Although I agree this operation may be useful in some situations, I didn't quite follow the example in the blog post. If you want to get the 10 most common sequences, and you pick a random item to drop, you might end up dropping one of the common ones. Instead of dropping them one at a time, drop them all at once. When the hash table size reaches M, copy the K top elements to a new hash table (this takes M*O(K) time). I think the general point is not about *dynamic* languages but about implementations being exposed. Non-dynamic high level languages like OCaml, Haskell, Java, etc., also hide the implementations and make it hard to do what you want. Inheritance is often the way to add operations but you might also look at the way C++'s STL is structured. For example with the priority queue, all the data is exposed, so you can perform additional operations on it. I've used this successfully in an application where I needed a priority queue but also needed a search/reprioritize operation. Had I been using a standard closed implementation that hides all the underlying data, I wouldn't be able to add the operation I needed, and thus I wouldn't be able to reuse the existing priority queue code.

Lorenzo Bolognini writes: Missing but easy enough in Python import random adict = dict(a=1,b=2,c=3) keys = adict.keys() item = keys.pop(random.randrange(0, len(keys))) print adict[item]

David Teller writes: Mmmmhhh.... with OCaml Batteries Included, if you have a hashtable called t, getting a random key would be Random.choice (keys t) or, if you want to get a random ordering of keys, Random.shuffle (keys t) Of course, this is O(n) rather than O(1). But the random choice is fair among keys (assuming that OCaml's PRNG is fair, of course).

Dave Hinton writes: If I was using this feature, I would expect each key to be equally likely to be returned. But this is not the case with your code. e.g. If the hash has 3 buckets and 2 keys, "a" and "b", with "a" stored in bucket 0 and "b" stored in bucket 1, then "a" will be twice as likely to be returned as "b".

web development writes: @web development: I think you missed the point, but from the URL you put in the comment I think it's just spam ;) Sorry if I removed the URL.

web development writes: I would use the rand() function in mysql to get a random row from the table.

home