Comments on: Cuckoo Hashing
Feed: https://sigpipe.macromates.com/2009/cuckoo-hashing/feed/ (last updated 2014-08-14T16:00:13+00:00)

Comment by Joseph Lenox, 2014-08-14T16:00:13+00:00 (https://sigpipe.macromates.com/2009/cuckoo-hashing/#comment-25727, author: https://github.com/lordofhyphens/cuckoo-sig)
<p>Sorry for the extra post -- forgot to add a CUDA implementation.</p>
Comment by Joseph Lenox, 2014-08-14T15:58:43+00:00 (https://sigpipe.macromates.com/2009/cuckoo-hashing/#comment-25726)
<p>One advantage of a cuckoo hash table over linear probing is that (especially with a stash) you can get good performance in a heavily multi-threaded environment that supports atomic exchange.</p>
<p>References:
http://dl.acm.org/citation.cfm?id=1618500
http://gradworks.umi.com/34/82/3482095.html (dissertation)
And Chapter 4 of "GPU Computing Gems - Jade Edition"</p>
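A cuckoo table with a stash, as mentioned in the comment above, can be sketched in single-threaded Python; class name and parameters are illustrative, and this is not the CUDA implementation the references describe. The stash is what keeps lookups bounded: two table probes plus a constant-size scan.

```python
class CuckooWithStash:
    """Minimal cuckoo hash table with two tables and a small stash.

    A sketch only: duplicate keys are not handled, and a real
    implementation would rehash/grow instead of reporting failure.
    """

    def __init__(self, capacity=16, stash_size=4, max_kicks=32):
        self.capacity = capacity
        self.tables = [[None] * capacity, [None] * capacity]
        self.stash = []             # small constant-size overflow area
        self.stash_size = stash_size
        self.max_kicks = max_kicks
        self.seeds = (0x9E3779B1, 0x85EBCA77)   # two hash functions

    def _hash(self, i, key):
        return (hash(key) ^ self.seeds[i]) % self.capacity

    def lookup(self, key):
        # At most two table probes plus a bounded stash scan: O(1).
        for i in (0, 1):
            slot = self.tables[i][self._hash(i, key)]
            if slot is not None and slot[0] == key:
                return slot[1]
        for k, v in self.stash:
            if k == key:
                return v
        return None

    def insert(self, key, value):
        item, i = (key, value), 0
        for _ in range(self.max_kicks):
            h = self._hash(i, item[0])
            if self.tables[i][h] is None:
                self.tables[i][h] = item
                return True
            # Kick out the occupant and retry it in the other table.
            self.tables[i][h], item = item, self.tables[i][h]
            i ^= 1
        if len(self.stash) < self.stash_size:
            self.stash.append(item)
            return True
        return False    # a full implementation would rehash/grow here
```

In the concurrent variants the references describe, the eviction swap is done with an atomic exchange so that displaced entries are never lost between threads; the sequential swap above is the single-threaded analogue of that step.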
Comment by Allan Odgaard, 2009-08-22T07:07:13+00:00 (https://sigpipe.macromates.com/2009/cuckoo-hashing/#comment-25644, author: http://macromates.com)
<p>Martin: I think I get your point, but also, I think you may have read my post rather lightly ;)</p>
<p>Hash algorithms have <em>expected</em> <em>O(1)</em> time for insert and lookup and <em>worst-case</em> <em>O(n)</em>. You can make a hybrid (like you propose) and get that <em>worst-case</em> down to <em>O(lg n)</em> — though this a) is then no longer a pure hash algorithm, and b) still does not <em>guarantee</em> <em>O(1)</em> insert and lookup.</p>
<p>Cuckoo hashing <em>does</em> guarantee <em>O(1)</em> lookup time and still has expected <em>O(1)</em> insert time (with <em>O(n)</em> as worst-case).</p>
<p>This does <em>not</em> mean that cuckoo hashing is a better choice (in practice); it is a <em>theoretical</em> improvement which I found interesting and which motivated me to write this post! It was not a plug to sell cuckoo hashing; as I already say in the post, linear probing (with a proper threshold) is likely a better choice for both insert and lookup.</p>
<p>Hope that makes my article’s point more clear.</p>
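The linear-probing alternative recommended above can be sketched as an open-addressing table that doubles its capacity at a load threshold; the class name and the 0.5 threshold are illustrative, not from the post.

```python
class LinearProbing:
    """Open addressing with linear probing and a resize threshold.

    Keeping the load factor at or below ``max_load`` guarantees empty
    slots exist, so every probe sequence terminates.
    """

    def __init__(self, capacity=8, max_load=0.5):
        self.capacity = capacity
        self.size = 0
        self.max_load = max_load
        self.slots = [None] * capacity

    def _probe(self, key):
        # Walk from the home slot until we find the key or a hole.
        i = hash(key) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.capacity
        return i

    def get(self, key):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot is not None else None

    def put(self, key, value):
        if (self.size + 1) / self.capacity > self.max_load:
            self._grow()
        i = self._probe(key)
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (key, value)

    def _grow(self):
        # Rehash everything into a table twice the size.
        old = [s for s in self.slots if s is not None]
        self.capacity *= 2
        self.slots = [None] * self.capacity
        self.size = 0
        for k, v in old:
            self.put(k, v)
```

Note that this scheme also pays occasional O(n) rebuilds when it grows; the point of the threshold is to keep probe chains short between rebuilds.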
Comment by Martin Pfeiffer, 2009-08-21T20:27:50+00:00 (https://sigpipe.macromates.com/2009/cuckoo-hashing/#comment-25643)
<p>Sure you get collisions; that is the problem of every hashing strategy. The question is how long the chain will be when you do get a collision. To get the best worst-case runtime for hashing, a good approach is to manage the buckets with red-black trees, because they do insert and delete in O(log n) worst case. So even in the worst-case scenario for hashing (e.g. every item you insert hashes to 1), you can still search the chain in O(log n), and in a realistic scenario with a good hash function you get O(1) for every operation. The realistic scenario is much more likely, because you do not have only a million buckets; an n-bit hash function gives you 2^n possible values. So with a good hash function you should get excellent results. Granted, this method is not easy to implement, since insert and delete for red-black trees are non-trivial and arguably overkill for plain hashing, but from the theoretical point of view it should be the optimum in running time.</p>
<p>The problem with cuckoo hashing is that it depends on the load factor of the table. If the load factor goes over 50%, insertion is no longer expected O(1); displacement chains become likely and the worst case, a full rebuild, is O(n). And if you have a very large table (e.g. 10 million keys) and must rebuild it, that takes time... Say you use the hash table in a web application to store some data: maybe a million users get their results very fast, but the next one triggers a rebuild, and the application freezes for seconds or minutes.</p>
<p>So to come to a point:
With cuckoo hashing you are very fast while everything is fine, but from time to time there is a very large rebuild of the table.
Bucket hashing is maybe slightly slower (I do not know exactly), but its time consumption is very consistent over time.</p>
<p>So it is as with every algorithm or data structure: it depends on the situation in which you want to use it, and the theoretical point of view always looks toward infinity; I do not think you want to build a hash table of that size :></p>
<p>I hope you get my point of view despite my horrible English, and maybe found it interesting or useful. Have a nice weekend, and keep up the good work.</p>
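The tree-bucket idea above can be sketched in Python. The standard library has no red-black tree, so sorted buckets searched with <code>bisect</code> stand in here: search within a bucket is O(log n) either way, though inserting into a Python list is O(bucket size) where a real red-black tree would be O(log n). All names are illustrative.

```python
import bisect

class TreeBucketHash:
    """Chaining where each bucket keeps its entries sorted by key,
    so even a degenerate bucket (every key colliding) is searched
    in O(log n) rather than O(n)."""

    def __init__(self, buckets=64):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        b = self._bucket(key)
        i = bisect.bisect_left(b, (key,))   # binary search by key
        if i < len(b) and b[i][0] == key:
            b[i] = (key, value)             # update existing entry
        else:
            b.insert(i, (key, value))       # keep the bucket sorted

    def get(self, key):
        b = self._bucket(key)
        i = bisect.bisect_left(b, (key,))
        if i < len(b) and b[i][0] == key:
            return b[i][1]
        return None
```

Constructing it with a single bucket simulates the worst case Martin describes (every item hashing to the same place) while keeping lookups logarithmic in the number of entries.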
Comment by Allan Odgaard, 2009-08-19T16:39:48+00:00 (https://sigpipe.macromates.com/2009/cuckoo-hashing/#comment-25642, author: http://macromates.com)
<p>Martin: I am not sure what <em>“bucket hashing”</em> refers to, but a good implementation of pretty much all the schemes will dynamically resize the table. Even so (<a href="http://en.wikipedia.org/wiki/Hash_table#Collision_resolution" rel="nofollow">quoting Wikipedia</a>):</p>
<blockquote>
<p>[…] if 2500 keys are hashed into a million buckets, even with a perfectly uniform random distribution, according to the <a href="http://en.wikipedia.org/wiki/Birthday_paradox" rel="nofollow">birthday paradox</a> there is a 95% chance of at least two of the keys being hashed to the same slot</p>
</blockquote>
<p>So you will still get collisions.</p>
<p>As for what is the best strategy, the <a href="http://rubini.us/" rel="nofollow">Rubinius</a> contributors did experiments and <a href="http://github.com/evanphx/rubinius/blob/a44e405af26dd7fcc011fb64e2c8581e33681ae3/doc/experiments/hash/results.txt" rel="nofollow">documented their results</a>.</p>
Comment by Martin Pfeiffer, 2009-08-19T16:12:14+00:00 (https://sigpipe.macromates.com/2009/cuckoo-hashing/#comment-25641)
<p>Use bucket hashing instead; it is one of the best hashing strategies because it resizes the table dynamically (extending the table on collision in O(1)).</p>