 Do you know where your variables are? PerlMonks

### comment on

 Need Help??

Wikipedia is correct (well, agrees quite closely with some of my calculations). But you have extrapolated from that quite incorrectly.

0.75**4 is the odds of, after inserting 1.1e5 strings, that there has been at least one collision in each of the four hashes. It is not even close to the odds of there being a single string that produced a simultaneous collision in all 4 hashes.

Those odds would be more like 0.75*(0.75/1.1e5)**3 or about 2e-16. My calculations came up with an upper bound of 9.5e-15.

I believe the 9.5e-15 number is much, much more accurate. It only discounts the possibility of there being fewer than 1.1e5 bits set. I suspect the ratio of "expected bits set"/1.1e5 to be very close to 1.

While the 2e-16 number is based on there being a 75% chance of exactly one collision. But there is a 25% chance of 0 collisions and a 75% chance of 1 or more collisions. And the possibility of 2 collisions approximately doubles the odds for some subset of the possible outcomes and so has a much bigger impact on the final total than does the possibility of there being 1.1e5-1 bits set instead of 1.1e5.

For example, consider tossing a coin twice. There is a 75% chance of getting at least one "head". Do 4 such trials. For each head, assign it a random position in 1.1e5 slots (but only one head per slot per trial). The odds of the 4 trials producing 4 heads in one of those slots (one from each trial) one might estimate as:

```   .75* (.75/x)   * (.75/x)   * (.75/x)

Where x==1.1e5. Or, 2.3e-16. But to find the accurate odds you have to realize that there is a 50% chance of one "head" (per trial) and a 25% change two "heads" (per trial). So the actual odds are found via:

```   .5 * (.5/x)    * (.5/x)    * (.5/x)
+ .5 * (.25*2/x) * (.5/x)    * (.5/x)
+ .5 * (.5/x)    * (.25*2/x) * (.5/x)
+ .5 * (.25*2/x) * (.25*2/x) * (.5/x)
+ .5 * (.5/x)    * (.5/x)    * (.25*2/x)
+ .5 * (.25*2/x) * (.5/x)    * (.25*2/x)
+ .5 * (.5/x)    * (.25*2/x) * (.25*2/x)
+ .5 * (.25*2/x) * (.25*2/x) * (.25*2/x)
+ .25* (.5/x1)   * (.5/x1)   * (.5/x1)
+ .25* (.25*x12) * (.5/x1)   * (.5/x1)
+ .25* (.5/x1)   * (.25*x12) * (.5/x1)
+ .25* (.25*x12) * (.25*x12) * (.5/x1)
+ .25* (.5/x1)   * (.5/x1)   * (.25*x12)
+ .25* (.25*x12) * (.5/x1)   * (.25*x12)
+ .25* (.5/x1)   * (.25*x12) * (.25*x12)
+ .25* (.25*x12) * (.25*x12) * (.25*x12)

Where x1==1.1e5-1 and x12==1/(1.1e5-1)+1/(1.1e5-2). That raises the odds by a factor of 2.37 to 5.6e-16.

That type of magnification of the odds will be much more pronounced in the case originally discussed and I bet would result in nearly the 48-fold increase required to get to the odds of my upper bound calculation.

- tye

In reply to Re^2: The statistics of hashing. (**4) by tye
in thread [OT] The statistics of hashing. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

• Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
• Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
• Read Where should I post X? if you're not absolutely sure you're posting in the right place.
• Posts may use any of the Perl Monks Approved HTML tags:
a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
• You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
 For: Use: & & < < > > [ [ ] ]
• Link using PerlMonks shortcuts! What shortcuts can I use for linking?

Create A New User
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (6)
As of 2021-01-17 22:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
The STEM quote I most wish I'd made is:

Results (176 votes). Check out past polls.

Notices?