Syntactic Confectionery Delight  
PerlMonks 
Re^3: The statistics of hashing.by tobyink (Canon) 
on Apr 01, 2012 at 09:13 UTC ( #962866=note: print w/replies, xml )  Need Help?? 
Indeed, but the errors on Wikipedia are not evenly distributed. On a subject such as the birthday attack I'd expect Wikipedia's article to be on par with any other authority (shortlived events of vandalism notwithstanding).
The exact calculation involves some big numbers. But assuming that c($x, $y) is the "pick $y from $x" function used in combinatorics, then the probability of a collision for $n strings and an evenly distributed 32bit hash function should be:
Big numbers. Horrible to calculate. Can be approximated though...
Calculating $p is still horrible, but calculating $t is easier. If $t is above 20 then $p is 1.00000 when rounded to 6 significant figures. Thus you can effectively be sure to have a collision with a 32bit hash function once $t is above 20. You can figure out an $n which triggers $t to be 20 using:
It's about 414,000. So with 414,000 strings, you are effectively certain to get collision on a 32bit hash function. Where I think my reasoning and tye's differ (and tye is almost certainly correct here  blame it on me answering late at night) is that I was then looking at the probabilities that you will have had collisions in all four (or ten) hash functions at the end of the entire run. With even half a million strings, that is a given. What you're actually doing is looking at events where a single string triggers a simultaneous collision in all the hash functions. I defer to tye's calculations for that.
perl E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]>[3])=~s{::}{ }and$monkey}}"Monkey say">Monkey::do'
In Section
Seekers of Perl Wisdom

