Unfortunately, a good deal of what you read on wikipedia is less than reliable.

Indeed, but the errors on Wikipedia are not evenly distributed. On a subject such as the birthday attack I'd expect Wikipedia's article to be on par with any other authority (short-lived events of vandalism notwithstanding).

But the question remains, how to calculate the probabilities of the mechanism.

The exact calculation involves some big numbers. But assuming that c($x, $y) is the "choose $y from $x" function used in combinatorics (the binomial coefficient), then the probability of a collision for $n strings and an evenly distributed 32-bit hash function should be:

```
$p = 1 - ( factorial($n) * c(2**32, $n) / (2**32)**$n );
```

Big numbers. Horrible to calculate. Can be approximated though...

```
my $t = ($n**2) / (2**33);   # n**2 / (2 * 2**32)
my $p = 1 - exp(-$t);        # exp() instead of a hand-typed constant for e
```

Calculating $p is still horrible, but calculating $t is easier. If $t is above 20 then $p is 1.00000 when rounded to 6 significant figures.
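As a cross-check on the approximation, the exact value can also be computed without big numbers: factorial(n) * c(N, n) / N**n is the same as the product of (1 - i/N) for i from 0 to n-1, and that product can be accumulated as a sum of logs. A minimal sketch, assuming a perfectly uniform hash; the names `exact_collision_p` and `approx_collision_p` are mine, not from any module:

```perl
use strict;
use warnings;

# Exact probability of at least one collision among $n items hashed
# uniformly into $buckets slots: 1 - prod_{i=0}^{n-1} (1 - i/buckets).
# The product is accumulated as a sum of logs to avoid big numbers.
sub exact_collision_p {
    my ($n, $buckets) = @_;
    return 1 if $n > $buckets;          # pigeonhole: collision is certain
    my $log_no_collision = 0;
    $log_no_collision += log(1 - $_ / $buckets) for 1 .. $n - 1;
    return 1 - exp($log_no_collision);
}

# The approximation used above: 1 - e**(-n**2 / (2 * buckets)).
sub approx_collision_p {
    my ($n, $buckets) = @_;
    return 1 - exp(-($n**2) / (2 * $buckets));
}

# Print both values side by side for a few sample sizes (32-bit hash).
for my $n (1_000, 10_000, 77_163, 200_000) {
    printf "%7d  exact=%.6f  approx=%.6f\n",
        $n, exact_collision_p($n, 2**32), approx_collision_p($n, 2**32);
}
```

The loop is linear in the number of strings, so it is cheap at the sizes discussed here. Passing 365 as the bucket count gives the classic birthday problem, which is a handy sanity check.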

Thus you can effectively be sure to have a collision with a 32-bit hash function once $t is above 20. You can find the $n at which $t reaches 20 using:

```
$n = sqrt(20 * (2 ** 33));
```

It's about 414,000. So with 414,000 strings, you are effectively certain to get a collision on a 32-bit hash function.
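The same inversion works for any target probability, not just the "effectively certain" case: solving p = 1 - e**(-n**2 / (2 * N)) for n gives n = sqrt(-2 * N * ln(1 - p)). A rough sketch under the same uniform-hash assumption; `strings_for_probability` is a made-up helper name:

```perl
use strict;
use warnings;

# Invert p = 1 - exp(-n**2 / (2 * buckets)) for n, where buckets = 2**bits.
# Returns the approximate number of strings at which the chance of at
# least one collision reaches $p.
sub strings_for_probability {
    my ($p, $bits) = @_;
    die "p must be in [0, 1)\n" if $p < 0 || $p >= 1;
    return sqrt(-2 * (2**$bits) * log(1 - $p));
}

# Report the string counts for a coin-flip chance and for 99%.
printf "%d-bit hash, p=%.2f: about %.0f strings\n",
    32, $_, strings_for_probability($_, 32) for 0.5, 0.99;
```

For a 32-bit hash the 50% point comes at roughly 77,000 strings, well before the 414,000 figure for near-certainty.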

Where I think my reasoning and tye's differ (and tye is almost certainly correct here - blame it on me answering late at night) is that I was then looking at the probabilities that you will have had collisions in all four (or ten) hash functions at the end of the entire run. With even half a million strings, that is a given.

What you're actually doing is looking at events where a single string triggers a simultaneous collision in all the hash functions. I defer to tye's calculations for that.

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

In reply to Re^3: The statistics of hashing. by tobyink
in thread [OT] The statistics of hashing. by BrowserUk
