Just another Perl shrine | |
PerlMonks |
Re^3: [OT] The statistics of hashing.by BrowserUk (Patriarch) |
on Apr 01, 2012 at 09:04 UTC ( [id://962865]=note: print w/replies, xml ) | Need Help?? |
Thanks syphilis. Your calculations make sense to me. But I'm not sure that it gels with the actual data? Assuming I've coded your formula correctly (maybe not!), then using 10 hashes & vectors, I get the odds of having seen a dup after 1e9 inserts as (1 - ((4294967295/4294967296)**1e9) ) **10 := 0.00000014949378123. By that point I had actually seen 13 collisions: <Reveal this spoiler or all in this thread>
And looking at the figure for 4e9 := 0.00667569553892502, by which time the 10 vectors will be almost fully populated, it looks way too low to me? I would have expected that calculation (for N=4e9) to have yielded odds of almost 1? With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
In Section
Seekers of Perl Wisdom
|
|