note
tobyink
<blockquote><p><i>Unfortunately, a good deal of what you read on wikipedia is less than reliable.</i></p></blockquote>
<p>Indeed, but the errors on Wikipedia are not evenly distributed. On a subject such as the birthday attack I'd expect Wikipedia's article to be on par with any other authority (short-lived events of vandalism notwithstanding).</p>
<blockquote><p><i>But the question remains, how to calculate the probabilities of the mechanism.</i></p></blockquote>
<p>The exact calculation involves some big numbers. But assuming that <i>c($x, $y)</i> is the "pick $y from $x" function used in combinatorics, then the probability of a collision for <i>$n</i> strings and an evenly distributed 32-bit hash function should be:</p>
<code>
$p = 1 - ( factorial($n) * c(2**32, $n) / (2**32)**$n )
</code>
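<p>For what it's worth, the exact value can be computed without the huge intermediate numbers, because factorial($n) * c($N, $n) / $N**$n is the same as the product of (1 - $k/$N) for $k from 0 to $n-1. Here is a minimal sketch that sums that product in log space. The name <i>collision_probability</i> is made up for illustration, and plain log() loses a little precision when $k/$N is tiny, so treat the trailing digits with suspicion.</p>

```perl
use strict;
use warnings;

# Exact collision probability for $n items hashed uniformly into $buckets slots.
# Uses p = 1 - prod_{k=0}^{n-1} (1 - k/buckets), accumulated as a sum of logs
# so that no factorial or binomial coefficient is ever formed.
# (collision_probability is a hypothetical helper name, not from the thread.)
sub collision_probability {
    my ($n, $buckets) = @_;
    return 1 if $n > $buckets;    # pigeonhole: a collision is unavoidable
    my $log_no_collision = 0;
    for my $k (1 .. $n - 1) {
        $log_no_collision += log(1 - $k / $buckets);
    }
    return 1 - exp($log_no_collision);
}

# Sanity check against the classic birthday problem (23 people, 365 days).
printf "%.6f\n", collision_probability(23, 365);

# The 32-bit case; this loops $n times, so it is slow but feasible.
printf "%.6f\n", collision_probability(77_163, 2**32);
```

<p>The first line is the familiar birthday figure of roughly 0.507, which is a handy check that the formula is wired up correctly before trusting it with 2**32 buckets.</p>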
<p>Big numbers. Horrible to calculate. Can be approximated though...</p>
<code>
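# $t is roughly the expected number of colliding pairs: $n**2 / (2 * 2**32)
my $t = ($n**2) / (2**33);
# The built-in exp() avoids hand-typing a truncated value of e
$p = 1 - exp(-$t);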
</code>
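<p>To get a feel for how quickly the approximation climbs, here is a small sketch that tabulates it for a few values of $n. The helper name <i>approx_collision_probability</i> and the sample values of $n are assumptions picked for illustration. The hash width is a parameter, with 32 bits as the default.</p>

```perl
use strict;
use warnings;

# Birthday-bound approximation: p ~ 1 - exp(-n**2 / 2**(bits+1)).
# (approx_collision_probability is a hypothetical helper name.)
sub approx_collision_probability {
    my ($n, $bits) = @_;
    $bits = 32 unless defined $bits;
    my $t = ($n**2) / (2**($bits + 1));
    return 1 - exp(-$t);
}

# Print n, t and the approximate probability for a 32-bit hash.
for my $n (10_000, 77_163, 200_000, 414_000) {
    my $t = ($n**2) / (2**33);
    printf "n=%7d  t=%8.4f  p=%.6f\n", $n, $t, approx_collision_probability($n);
}
```

<p>With 77,163 strings $t is very close to log(2), so the approximate probability of at least one collision is already about one half.</p>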
<p>Calculating $p is still horrible, but calculating $t is easier. If $t is above 20 then $p is 1.00000 when rounded to 6 significant figures.</p>
<p>Thus you can effectively be sure to have a collision with a 32-bit hash function once $t is above 20. You can figure out an $n which triggers $t to be 20 using:</p>
<code>
$n = sqrt(20 * (2 ** 33));
</code>
<p>It's about 414,000. So with 414,000 strings, you are effectively certain to get a collision on a 32-bit hash function.</p>
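<p>The same inversion works for any threshold, not just $t = 20. Here is a sketch, assuming the approximation p = 1 - exp(-$t) given earlier. The helper names <i>strings_for_t</i> and <i>strings_for_probability</i> are invented for this example.</p>

```perl
use strict;
use warnings;

# Invert t = n**2 / 2**(bits+1) to get the n that yields a given t.
# (Both helper names here are hypothetical, for illustration only.)
sub strings_for_t {
    my ($t, $bits) = @_;
    return sqrt($t * 2**($bits + 1));
}

# Invert p = 1 - exp(-t) first, then reuse strings_for_t.
# Only valid for 0 <= p < 1, since log(0) is undefined.
sub strings_for_probability {
    my ($p, $bits) = @_;
    die "probability must be in [0, 1)" if $p < 0 || $p >= 1;
    return strings_for_t(-log(1 - $p), $bits);
}

printf "t = 20 : about %d strings\n", strings_for_t(20, 32);
printf "p = 0.5: about %d strings\n", strings_for_probability(0.5, 32);
printf "p = 0.99: about %d strings\n", strings_for_probability(0.99, 32);
```

<p>The first line reproduces the figure of roughly 414,000 for a 32-bit hash. Passing a larger $bits shows how much headroom a wider hash buys.</p>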
<p>Where I think my reasoning and tye's differ (and tye is almost certainly correct here - blame it on me answering late at night) is that I was then looking at the probabilities that you will have had collisions in all four (or ten) hash functions at the end of the entire run. With even half a million strings, that is a given.</p>
<p>What you're actually doing is looking at events where a single string triggers a simultaneous collision in all the hash functions. I defer to tye's calculations for that.</p>
<div class="pmsig"><div class="pmsig-757127">
<small><small>
<tt>perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
</tt></small></small>
</div></div>
962802
962808