http://qs321.pair.com?node_id=962865


in reply to Re^2: [OT] The statistics of hashing.
in thread [OT] The statistics of hashing.

Thanks syphilis. Your calculations make sense to me. But I'm not sure that it gels with the actual data?

Assuming I've coded your formula correctly (maybe not!), then using 10 hashes & vectors, I get the odds of having seen a dup after 1e9 inserts as (1 - ((4294967295/4294967296)**1e9) ) **10 := 0.00000014949378123.

By that point I had actually seen 13 collisions:

And looking at the figure for 4e9 := 0.00667569553892502, by which time the 10 vectors will be almost fully populated, it looks way too low to me?

I would have expected that calculation (for N=4e9) to have yielded odds of almost 1?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?