Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
If I get a chance I'll try to work out the probability of "having seen a dup" in the first 1e9 iterations.

Thank you if you do find that time at some point. If you could also show your workings, I might be able to wrap my muddled brain around it and stand a chance of re-applying your derivation.

it probably has little bearing on this actual case where we're looking at MD5 hashes instead of random selections.

Whilst the MD5 hash is known to be imperfect, it has been well analysed and has been demonstrated to produce a close to perfect random distribution of bits from the input data by several practical measures.

Eg. If you take any single input text, and produce its MD5; and then vary a single bit in the input and produce a new MD5, then -- on average -- half of the bits in the new MD5 will have changed relative to the original.

And if you repeat that process -- varying a single bit in the input and then compare the original and new MD5s -- the average number of bits changed in the outputs will tend towards 1/2. That is about as good a measure of randomness as you can hope for from a deterministic process.

I am aware of the limitations on the distribution of the hashes when derived from a non-full spectrum of inputs; but given that 2**32 (the maximum capacity of the vectors), represent such a minuscule proportion of the 1e44 possible inputs, I'd have to be extremely unlucky in my random selection from the total inputs for the hashing bias to actually have a measurable affect upon the probabilities of false positives.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

In reply to Re^5: [OT] The statistics of hashing. by BrowserUk
in thread [OT] The statistics of hashing. by BrowserUk

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others scrutinizing the Monastery: (4)
    As of 2020-09-23 01:35 GMT
    Find Nodes?
      Voting Booth?
      If at first I donít succeed, I Ö

      Results (130 votes). Check out past polls.