|Pathologically Eclectic Rubbish Lister|
Re^2: The statistics of hashing.by BrowserUk (Pope)
|on Mar 31, 2012 at 23:26 UTC||Need Help??|
Seems almost certain to me,
That's a given. As with bloom filters, false positives are a fact of life. The question is how likely, or more to the point, how frequently, they will occur.
According to Wikipedia for a 32-bit hash you only need about 110,000 hashes to get a 75% chance of a collision.
Unfortunately, a good deal of what you read on wikipedia is less than reliable.
The following is empirical evidence from an on-going experiment.
After 3/4 of a billion trials, there were zero (possible) dups. After 1.2 billion, there are only 61.
These are the positions at which those possible dups were detected:
As you can see, having filled 1/4 quarter of the total space available, false positives are just beginning to occur. And as the vector space fills, they are (as expected) becoming more frequent. (Of course, some of those may be actual duplicates. It will take a second run to determine that.)
But the question remains, how to calculate the probabilities of the mechanism.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.