August 18th, 2009
The Achilles’ heel of hashing is collision: When we want to insert a new value into the hash table and the slot is already filled, we use a fallback strategy to find another slot, for example linear probing.
The fallback strategy can affect lookup time since we need to do the same probing when a lookup results in an entry with wrong key, turning the nice O(1) time complexity into (worst case) O(n).
Of course the O(n) time is pessimistic as we will rehash to a larger table size when we reach a certain threshold, though from a theoretical point of view an intriguing approach to handling collisions is cuckoo hashing which guarantees O(1) lookup time (insertion can still be worse).
Quoting the Wikipedia page:
The basic idea is to use two hash functions instead of only one. This provides two possible locations in the hash table for each key.
When a new key is inserted, a greedy algorithm is used: The new key is inserted in one of its two possible locations, “kicking out”, that is, displacing, any key that might already reside in this location. This displaced key is then inserted in its alternative location, again kicking out any key that might reside there, until a vacant position is found, or the procedure enters an infinite loop. In the latter case, the hash table is rebuilt in-place using new hash functions.
This means that no additional probing is required during lookup as an element will always be in one of the two slots given by the hash functions (if it is in the table).
In practice linear probing with proper thresholds and a good hash function may perform better (due to locality of reference) plus insertion using cuckoo hashing can be more expensive (as we do more memory writes on collisions than linear probing), still, I love the theoretical property of this collision strategy :)