in reply to [OT] The statistics of hashing.
How to calculate those odds for each new hash, as the number of hashes already set into the vectors increases?
Update: Following figure is not right. (Corrected figure provided in following post.)
IIUC it's just (N/4294967296)**4, where N is the number of MD5 hashes that have already been entered into the bit vector.
But that's assuming that MD5 hashes distribute evenly, and I don't know if that has been established (or disproved). If they don't distribute evenly, then the odds of hitting a false positive will increase.
Not sure what affect your "Bloom Filters" variation would have.
Cheers,
Rob
Update: Following figure is not right. (Corrected figure provided in following post.)
IIUC it's just (N/4294967296)**4, where N is the number of MD5 hashes that have already been entered into the bit vector.
But that's assuming that MD5 hashes distribute evenly, and I don't know if that has been established (or disproved). If they don't distribute evenly, then the odds of hitting a false positive will increase.
Not sure what affect your "Bloom Filters" variation would have.
Cheers,
Rob


Replies are listed 'Best First'.  

Re^2: [OT] The statistics of hashing.
by syphilis (Archbishop) on Mar 31, 2012 at 23:55 UTC  
by BrowserUk (Patriarch) on Apr 01, 2012 at 09:04 UTC  
by syphilis (Archbishop) on Apr 01, 2012 at 19:33 UTC  
by BrowserUk (Patriarch) on Apr 01, 2012 at 20:06 UTC  
by syphilis (Archbishop) on Apr 02, 2012 at 04:02 UTC 
In Section
Seekers of Perl Wisdom