PerlMonks 
Re^4: [OT] The statistics of hashing.
by syphilis (Bishop) on Apr 01, 2012 at 19:33 UTC (#962918)
> I get the odds of having seen a dup after 1e9 inserts as (1 - ((4294967295/4294967296)**1e9))**10 := 0.00000014949378123

That's not the probability of "having seen a dup", but the probability that the 1000000001st random selection of 10 numbers would be reported as a dup (i.e. the probability that each of the relevant bits in all 10 bit vectors was already set for that 1000000001st random selection of the 10 numbers).

If I get a chance I'll try to work out the probability of "having seen a dup" in the first 1e9 iterations. (But, judging by some of the figures being bandied about, it probably has little bearing on this actual case, where we're looking at MD5 hashes instead of random selections.)

Cheers,
Rob
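For anyone who wants to check the quoted figure, here is a minimal sketch of that calculation. It assumes 10 independent bit vectors of 2**32 bits each, with every insert setting one uniformly random bit per vector. The function and constant names are made up for illustration and do not come from the thread.

```python
import math

BITS_PER_VECTOR = 2**32   # 4294967296, as in the quoted expression
NUM_VECTORS = 10          # one bit is set in each of 10 vectors per insert


def false_dup_probability(inserts, bits=BITS_PER_VECTOR, vectors=NUM_VECTORS):
    """Chance that the NEXT random selection is reported as a dup.

    Assumes (hypothetically) that every insert picks one uniformly random
    bit in each vector, independently of all other picks.
    """
    # Probability that one particular bit is still clear after `inserts`
    # picks: ((bits - 1) / bits) ** inserts, computed via log1p for accuracy.
    p_clear = math.exp(inserts * math.log1p(-1.0 / bits))
    # The selection is flagged only if its bit is already set in all vectors.
    return (1.0 - p_clear) ** vectors


p = false_dup_probability(1e9)
print(f"{p:.6e}")
```

This should land close to the 0.00000014949378123 quoted above. Note that it is the per-selection figure only; it says nothing about the chance of at least one false dup somewhere in the first 1e9 inserts.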

