Thank you roboticus.
I ran your algorithm against the results of my 60-hour run, and the correlation is simply stunning!
#! perl -slw
use strict;
sub ex_10 {
my ($N, $X) = @_;
return -$N/10*exp(-10*$X/$N)
+ 10*$N/9 *exp( -9*$X/$N)
- 45*$N/8 *exp( -8*$X/$N)
+ 120*$N/7 *exp( -7*$X/$N)
- 35*$N *exp( -6*$X/$N)
+ 252*$N/5 *exp( -5*$X/$N)
- 105*$N/2 *exp( -4*$X/$N)
+ 40*$N *exp( -3*$X/$N)
- 45*$N/2 *exp( -2*$X/$N)
+ 10*$N *exp( -1*$X/$N)
+ $X;
}
my $exp10_0 = ex_10( 2**32, 0 );
open I, '<', 'L25B32V10I32S1.posns' or die $!;
while( <I> ) {
my( $inserts, $collisions ) = split;
my $exp3 = ex_10( 2**32, $inserts ) - $exp10_0;
printf "Insert %10d: Predicted %6d Actual: %6d Delta: %+.3f%%\n"
+,
$inserts, $exp3, $collisions, ( $collisions - $exp3 ) / $exp3
+* 100;
}
__END__
C:\test>rbtcs-form-verify
Insert 779967210: Predicted 1 Actual: 1 Delta: -18.051%
Insert 782382025: Predicted 1 Actual: 2 Delta: +58.813%
Insert 830840395: Predicted 2 Actual: 3 Delta: +29.292%
Insert 882115420: Predicted 4 Actual: 4 Delta: -5.961%
Insert 883031614: Predicted 4 Actual: 5 Delta: +16.325%
Insert 897571477: Predicted 5 Actual: 6 Delta: +18.390%
Insert 923155269: Predicted 6 Actual: 7 Delta: +4.086%
Insert 948108745: Predicted 8 Actual: 8 Delta: -8.996%
Insert 954455113: Predicted 9 Actual: 9 Delta: -4.244%
Insert 967783959: Predicted 10 Actual: 10 Delta: -7.404%
Insert 988381482: Predicted 13 Actual: 11 Delta: -17.487%
Insert 992691311: Predicted 13 Actual: 12 Delta: -13.814%
Insert 995935158: Predicted 14 Actual: 13 Delta: -9.624%
Insert 1011301141: Predicted 16 Actual: 14 Delta: -16.457%
Insert 1013742872: Predicted 17 Actual: 15 Delta: -12.616%
Insert 1022258193: Predicted 18 Actual: 16 Delta: -14.242%
Insert 1031874023: Predicted 20 Actual: 17 Delta: -16.989%
Insert 1034026887: Predicted 20 Actual: 18 Delta: -13.909%
Insert 1036254774: Predicted 21 Actual: 19 Delta: -11.051%
Insert 1037064360: Predicted 21 Actual: 20 Delta: -7.093%
Insert 1037193945: Predicted 21 Actual: 21 Delta: -2.569%
Insert 1037309710: Predicted 21 Actual: 22 Delta: +1.957%
...( ~ 52000 ellided }
Insert 2375752842: Predicted 52209 Actual: 52277 Delta: +0.130%
Insert 2375755671: Predicted 52209 Actual: 52278 Delta: +0.131%
Insert 2375756509: Predicted 52209 Actual: 52279 Delta: +0.133%
Insert 2375760656: Predicted 52210 Actual: 52280 Delta: +0.133%
Insert 2375763928: Predicted 52211 Actual: 52281 Delta: +0.134%
Insert 2375785238: Predicted 52215 Actual: 52282 Delta: +0.128%
Insert 2375788721: Predicted 52215 Actual: 52283 Delta: +0.128%
Insert 2375789878: Predicted 52216 Actual: 52284 Delta: +0.130%
Insert 2375790896: Predicted 52216 Actual: 52285 Delta: +0.131%
Insert 2375798283: Predicted 52217 Actual: 52286 Delta: +0.131%
And that still includes the possibility -- though I believe it to be remote -- that there are 1 or more actual dups in there.
My only regret now is that I wish I'd allowed the run to go to the 3/4 point instead of stopping it half way. The number of false positive would still have been easily manageable for the second pass:
C:\test>rbtcs-bloom-probb 32 10 30e8
N=4294967296, V=10, X=30e8 integral(30e8)=12580199467.653, integral(0)
+=12579822861.8159
Expected collisions: 376605.837186813
That is an order of magnitude less that my own crude attempt was predicting, hence why I stopped the run.
The only way I can repay you is to assure you that I will do my very best to try and understand the formula -- which I do not currently.
And of course, give you credit, when they world comes knocking at my door for my eminently patentable -- at least if you take Apple as your guide -- algorithm :)
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.