Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^6: [OT] The statistics of hashing.

by syphilis (Archbishop)
on Apr 02, 2012 at 04:02 UTC ( [id://962962]=note: print w/replies, xml ) Need Help??


in reply to Re^5: [OT] The statistics of hashing.
in thread [OT] The statistics of hashing.

Let's look at the probability of getting "at least one dup" (instead of "exactly one dup").
Let's also initially deal with the case where we're selecting (at random) only one number (instead of 4 or 10) each time.

Let P(0) be the probability that the very first selection did not produce a duplicate:
P(0) = (4294967295/4294967296)**0 # == 1, obviously

Let P(1) be the probability that the second selection did not produce a duplicate:
P(1) = (4294967295/4294967296)**1

Let P(2) be the probability that the third selection did not produce a duplicate:
P(2) = (4294967295/4294967296)**2

and so on:
Let P(1e9 + 1) be the probability that the 1000000001st selection did not produce a duplicate:
P(1e9) = (4294967295/4294967296)**1e9

(In general terms, P(x-1) is simply the probability that none of the x-1 selections already made match the xth selection.)

Then the probability that we can make 1000000001 random selections in the range (1 .. 4294967296) and get zero duplicates is
P(0)*P(1)*P(2)*P(3)*...*P(1e9).
That equates to (4294967295/4294967296)**Z, where
Z = 0+1+2+3+...+1e9.

So, the probablility D that we can make 1000000001 selections and have at least 1 duplicate is
D = 1 - ((4294967295/4294967296)**Z)

If we're doing that 4-at-a-time, then we need to calculate D**4; doing it 10-at-a-time we calculate D**10.

Is that sane ? Does it produce sane results ? (I think it should, but I don't have time to check.)

10-MINUTES LATER AFTERTHOUGHT: I don't think the "D**4" and "D**10" calculations actually tell us what we want ... gotta think about it a bit more ...

Cheers,
Rob

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://962962]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (4)
As of 2024-04-25 22:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found