PerlMonks 
Let's look at the probability of getting "at least one dup" (instead of "exactly one dup").
Let's also initially deal with the case where we're selecting (at random) only one number (instead of 4 or 10) each time.

Let P(0) be the probability that the very first selection did not produce a duplicate:
P(0) = (4294967295/4294967296)**0 # == 1, obviously

Let P(1) be the probability that the second selection did not produce a duplicate:
P(1) = (4294967295/4294967296)**1

Let P(2) be the probability that the third selection did not produce a duplicate:
P(2) = (4294967295/4294967296)**2

and so on. Let P(1e9) be the probability that the 1000000001st selection did not produce a duplicate:
P(1e9) = (4294967295/4294967296)**1e9

(In general terms, P(x-1) is simply the probability that none of the x-1 selections already made match the xth selection.)

Then the probability that we can make 1000000001 random selections in the range (1 .. 4294967296) and get zero duplicates is P(0)*P(1)*P(2)*P(3)*...*P(1e9). That equates to (4294967295/4294967296)**Z, where Z = 0+1+2+3+...+1e9 (i.e. Z = 1e9*(1e9+1)/2 = 500000000500000000).

So, the probability D that we can make 1000000001 selections and have at least 1 duplicate is:
D = 1 - ((4294967295/4294967296)**Z)

If we're doing that 4-at-a-time, then we need to calculate D**4; doing it 10-at-a-time, we calculate D**10.

Is that sane? Does it produce sane results? (I think it should, but I don't have time to check.)

10-MINUTES-LATER AFTERTHOUGHT: I don't think the "D**4" and "D**10" calculations actually tell us what we want ... gotta think about it a bit more ...

Cheers,
Rob

In reply to Re^6: [OT] The statistics of hashing. by syphilis
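To check whether the one-number-at-a-time formula produces sane results, here is a minimal sketch. It is written in Python purely as a calculator, and the function names p_no_dup, p_no_dup_product and prob_at_least_one_dup are hypothetical. It assumes the post's model exactly: N = 4294967296, P(x) = ((N-1)/N)**x, and D = 1 - ((N-1)/N)**Z with Z = 0+1+...+(n-1) for n selections. Because Z is about 5e17 when n = 1000000001, it works in log space with math.log1p and math.expm1 rather than raising the ratio to Z directly. Since ((N-1)/N)**Z is roughly exp(-Z/N), this is the usual birthday-problem approximation. The D**4 / D**10 question is not attempted here.

```python
import math

# Size of the range (1 .. 4294967296), i.e. 2**32.
N = 4294967296
RATIO = (N - 1) / N  # 4294967295/4294967296 (exactly representable as a double)


def p_no_dup(x: int) -> float:
    """P(x) from the post: probability that selection number x+1 does not
    duplicate any of the x earlier selections, modelled as ((N-1)/N)**x."""
    return RATIO ** x


def p_no_dup_product(n: int) -> float:
    """Direct product P(0)*P(1)*...*P(n-1); only practical for small n."""
    return math.prod(p_no_dup(x) for x in range(n))


def prob_at_least_one_dup(n: int) -> float:
    """D = 1 - ((N-1)/N)**Z for n selections, with Z = 0+1+...+(n-1).

    Computed in log space so the result stays accurate both when D is tiny
    (small n) and when Z is enormous (n around 1e9).
    """
    z = n * (n - 1) // 2                 # Z = 0 + 1 + ... + (n-1), exact integer
    log_no_dup = z * math.log1p(-1 / N)  # log of ((N-1)/N)**Z
    return -math.expm1(log_no_dup)       # 1 - exp(log_no_dup)


if __name__ == "__main__":
    for n in (2, 1000, 77163, 1_000_000_001):
        print(f"{n:>13} selections: D = {prob_at_least_one_dup(n):.6g}")
```

Run as a script, it prints D for a few values of n. For n = 1000000001, D comes out as 1.0 to double precision, so at least one duplicate is all but certain. D is already about 0.5 at n = 77163, which matches the familiar birthday bound for a 2**32 range. For small n, the product of the individual P(x) terms agrees with the closed form ((N-1)/N)**Z, which is the "that equates to" step in the post.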

