Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Re^4: The statistics of hashing.

by BrowserUk (Pope)
on Apr 01, 2012 at 09:34 UTC ( #962869=note: print w/replies, xml ) Need Help??

in reply to Re^3: The statistics of hashing.
in thread [OT] The statistics of hashing.

Indeed, but the errors on Wikipedia are not evenly distributed. On a subject such as the birthday attack I'd expect Wikipedia's article to be on par with any other authority

I think the first mistake that you (and Tye) are making here, is assuming that "the birthday paradox" (as applied to the each of the 32-bit hashes individually) is the controlling factor. It isn't.

The controlling factor is the combinatorial affect of the 4 (or, in my case 10) separate hashes & vectors.

I think that applying the birthday paradox to the 128-bit hash may prove to be an accurate upper bound. But it is so large relative to the limits of the 32-bit vectors as to never actually come into play.

I had tried applying various formulae I found on wikipedia and elsewhere, but found they do not match with reality. Hence, this post.

You might also look at the math described for Bloom filters. It also contradicts these naive applications of the birthday paradox, but is sufficiently well researched and documented (sources outside of wikipedia) to show that naive application of the birthday paradox is not applicable when combining multiple hashes in this way.

The problem with the Bloom filter math is that it is based upon setting the bits derived from all the different hashes into the same vector. Ie. Each Bloom filter insert sets N bits in a single vector, where N is the number of different hashes being used. This obviously causes the vector to fill up much more quickly than my mechanism which uses a separate vector for each hash.

My gut feel, is that as my vectors fill up N times more slowly, the probabilities of collisions for my mechanism are 2**N times lower than with a Bloom filter (at the expense of N*the memory requirement). This is the notion I am trying to verify here.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://962869]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (1)
As of 2021-09-26 21:56 GMT
Find Nodes?
    Voting Booth?

    No recent polls found