in reply to [OT] The statistics of hashing.
How to calculate those odds for each new hash, as the number of hashes already set into the vectors increases?
Update: The following figure is not right. (A corrected figure is provided in a following post.)
IIUC it's just (N/4294967296)**4, where N is the number of MD5 hashes that have already been entered into the bit vector.
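To put numbers on that, here's a quick sketch that simply evaluates the figure as written, next to the standard Bloom-filter approximation that allows for several hashes landing on the same bit. Both assume (my inference from the 4294967296 and the **4, not something stated in the thread) four bit vectors of 2**32 bits, each indexed by one 32-bit slice of the 128-bit MD5 digest, with the slices uniform and independent. This is not the corrected figure from the follow-up post.

```python
import math

M = 2**32  # assumed bits per vector: each vector is indexed by one 32-bit slice of the digest
K = 4      # assumed number of vectors: 128-bit MD5 digest / 32 bits per slice

def naive_odds(n: int) -> float:
    """The figure as written: treats each vector as having exactly n bits set."""
    return (n / M) ** K

def collision_adjusted_odds(n: int) -> float:
    """Standard Bloom-filter approximation: after n uniform insertions the
    expected fraction of bits set in each vector is 1 - exp(-n/M).
    -expm1(-x) computes 1 - exp(-x) accurately when x is tiny."""
    return (-math.expm1(-n / M)) ** K

for n in (10**6, 10**8, 10**9, 2**32):
    print(f"N={n:>13,}  naive={naive_odds(n):.3e}  adjusted={collision_adjusted_odds(n):.3e}")
```

Since 1 - exp(-x) <= x, the collision-adjusted number never exceeds the naive one; the gap only becomes noticeable once N is a sizeable fraction of 2**32.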
But that's assuming that MD5 hashes distribute evenly, and I don't know if that has been established (or disproved). If they don't distribute evenly, then the odds of hitting a false positive will increase.
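One crude way to poke at the "distribute evenly" assumption is to hash a run of inputs, drop each 32-bit slice of the digest into 16 buckets by its top 4 bits, and compute a chi-square statistic. The inputs (the strings "0" to "99999") and the bucket count are arbitrary choices of mine. With 15 degrees of freedom an even spread typically gives a value around 15, and anything well above about 30 (roughly the 1% tail) would be suspicious. Passing a test like this doesn't establish uniformity; it just fails to find an obvious skew.

```python
import hashlib

BUCKETS = 16  # arbitrary: bucket each 32-bit slice by its top 4 bits

def slice_bucket_counts(count: int, slice_index: int = 0) -> list[int]:
    """MD5-hash the strings "0".."count-1" and tally which bucket the
    chosen 32-bit slice (0..3) of each digest falls into."""
    counts = [0] * BUCKETS
    for i in range(count):
        digest = hashlib.md5(str(i).encode()).digest()
        chunk = int.from_bytes(digest[4 * slice_index : 4 * slice_index + 4], "big")
        counts[chunk >> 28] += 1  # top 4 bits of the 32-bit slice pick the bucket
    return counts

def chi_square(counts: list[int]) -> float:
    """Pearson chi-square statistic against an even spread over the buckets."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

for s in range(4):
    counts = slice_bucket_counts(100_000, s)
    print(f"slice {s}: chi-square = {chi_square(counts):.1f}")
```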
Not sure what effect your "Bloom Filters" variation would have.
Cheers,
Rob
In Section
Seekers of Perl Wisdom