PerlMonks  

Re: Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?

by tall_man (Parson)
on Jun 14, 2005 at 15:19 UTC ( [id://466599]=note )


in reply to Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?

You should compute log gamma rather than the factorials themselves. There are formulas that compute it very quickly and accurately; for an example, see the node Malloc with Inline::C. By the way, gamma(n+1) = n! for positive integer n. (Some responders said gamma(n) = n!, which is wrong.)
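A minimal sketch of why working with logarithms matters here, assuming Perl numbers are IEEE double-precision floats. The helper names fact_naive and logfact_sum are made up for this illustration and are not from the thread; summing logs is only a slow reference, not the fast log gamma formula recommended above.

```perl
use strict;
use warnings;

# Naive product: fine for small n, but a double overflows
# once n! exceeds roughly 1.8e308 (that is, from n = 171 on).
sub fact_naive {
    my $n = shift;
    my $f = 1;
    $f *= $_ for 2 .. $n;
    return $f;
}

# log(n!) as a plain sum of logs: O(n) per call, but it stays
# finite for large n, which is the point of working in log space.
sub logfact_sum {
    my $n = shift;
    my $s = 0;
    $s += log($_) for 2 .. $n;
    return $s;
}

printf "170!       = %g\n",   fact_naive(170);
printf "171!       = %s\n",   fact_naive(171);    # overflows to infinity
printf "log(1000!) = %.6f\n", logfact_sum(1000);  # still an ordinary number
```

Ratios of huge factorials, as in the hypergeometric formula, then become differences of moderate-sized logs, with a single exp() at the end.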

Update: Here is some code (perl version of gammln is from Re^4: Challenge: Chasing Knuth's Conjecture):

#!/usr/bin/perl -w
use strict;

sub logfact {
    return gammln(shift(@_) + 1.0);
}

sub hypergeom {
    # There are m "bad" and n "good" balls in an urn.
    # Pick N of them. The probability of i or more successful selections:
    # (m!n!N!(m+n-N)!)/(i!(n-i)!(m+i-N)!(N-i)!(m+n)!)
    my ($n, $m, $N, $i) = @_;

    my $loghyp1 = logfact($m)+logfact($n)+logfact($N)+logfact($m+$n-$N);
    my $loghyp2 = logfact($i)+logfact($n-$i)+logfact($m+$i-$N)+logfact($N-$i)+logfact($m+$n);
    return exp($loghyp1 - $loghyp2);
}

sub gammln {
    my $xx = shift;
    my @cof = (76.18009172947146, -86.50532032941677,
               24.01409824083091, -1.231739572450155,
               0.12086509738661e-2, -0.5395239384953e-5);
    my $y = my $x = $xx;
    my $tmp = $x + 5.5;
    $tmp -= ($x + .5) * log($tmp);
    my $ser = 1.000000000190015;
    for my $j (0..5) {
        $ser += $cof[$j]/++$y;
    }
    -$tmp + log(2.5066282746310005*$ser/$x);
}

print hypergeom(300,700,100,40),"\n";

Replies are listed 'Best First'.
Re^2: Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?
by jmuhlich (Acolyte) on Jan 26, 2006 at 01:04 UTC
    I only just ran across this node, thanks everyone!

    For pure speed, you can inline the gammln code, unroll the loop into one big expression, and write the constants directly into that expression instead of looking them up in the array. I also got rid of $y for a very small speedup. This code runs about 135% faster than logfact above (i.e. over twice as fast). I renamed the function factln to be consistent with gammaln.

    BTW this code appears to originate in Numerical Recipes but no credit was given in Re^4: Challenge: Chasing Knuth's Conjecture, referenced in the parent. All of the function names in that book are 6 characters long, because there's a Fortran (F77) version of the book too. Thus "gammln" instead of "gammaln".

    sub factln {
        my $x = (shift) + 1;
        my $tmp = $x + 5.5;
        $tmp -= ($x + .5) * log($tmp);
        my $ser = 1.000000000190015
                + 76.18009172947146   / ++$x
                - 86.50532032941677   / ++$x
                + 24.01409824083091   / ++$x
                - 1.231739572450155   / ++$x
                + 0.12086509738661e-2 / ++$x
                - 0.5395239384953e-5  / ++$x;
        return log(2.5066282746310005*$ser/($x-6)) - $tmp;
    }
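One way to gain some confidence in an unrolled version like this is to compare it against a slow but straightforward reference. The sketch below copies factln from the post above and checks it against a running sum of logs. The name max_factln_error and the range 0..1000 are arbitrary choices made for this illustration, not part of the thread.

```perl
use strict;
use warnings;

# factln as posted above: an approximation of log(n!) = log(gamma(n + 1)).
sub factln {
    my $x = (shift) + 1;
    my $tmp = $x + 5.5;
    $tmp -= ($x + .5) * log($tmp);
    my $ser = 1.000000000190015
            + 76.18009172947146   / ++$x
            - 86.50532032941677   / ++$x
            + 24.01409824083091   / ++$x
            - 1.231739572450155   / ++$x
            + 0.12086509738661e-2 / ++$x
            - 0.5395239384953e-5  / ++$x;
    return log(2.5066282746310005*$ser/($x-6)) - $tmp;
}

# Largest absolute difference between factln(n) and the running
# sum log(2) + ... + log(n), taken over n = 0 .. $max_n.
sub max_factln_error {
    my $max_n = shift;
    my ($exact, $worst) = (0, 0);
    for my $n (0 .. $max_n) {
        $exact += log($n) if $n > 1;
        my $err = abs(factln($n) - $exact);
        $worst = $err if $err > $worst;
    }
    return $worst;
}

printf "max |factln(n) - log(n!)| for n = 0..1000: %.3g\n",
    max_factln_error(1000);
```

The same harness could be pointed at logfact from the parent node to confirm that the two versions agree before comparing their speed.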
Re: Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?
by Commander Salamander (Acolyte) on Jun 14, 2005 at 15:59 UTC
    Wow, thanks to all of you for the incredible amount of advice. I'll work my way through your suggestions today.

    Thanks again!
Re^2: Fastest way to calculate hypergeometric distribution probabilities (i.e. BIG factorials)?
by Anonymous Monk on Apr 05, 2010 at 21:15 UTC
    I could be mistaken but I think this calculates the probability of i successful selections, not i or more successful selections as claimed above. For the cdf, prob of i or more successes, you need to do the following:
    use List::Util qw(min);   # min() is not a Perl builtin

    my $hypercdf = 0;
    for (my $iref=$i; $iref < min($N,$n); $iref++) {
        $hypercdf += hypergeom($n,$m,$N,$iref);
    }
    print $hypercdf;
      You may need a less than or equal to in the condition of the for loop. This probably won't make a difference in most cases, as the final probability is usually very small.
      use List::Util qw(min);   # min() is not a Perl builtin

      my $hypercdf = 0;
      for (my $iref=$i; $iref <= min($N,$n); $iref++) {
          $hypercdf += hypergeom($n,$m,$N,$iref);
      }
      print $hypercdf;
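A self-contained sketch of the distinction discussed above, between the probability of exactly i successes and of i or more. It keeps the argument order of hypergeom (n good, m bad, N drawn, i successes). The names hypergeom_pmf and hypergeom_upper_tail are made up for this illustration, and logfact here is a simple cached sum of logs standing in for gammln, which is an assumption of the sketch rather than the method used in the thread.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Cached log-factorials built by summing logs (stand-in for gammln).
my @LOGFACT = (0);    # $LOGFACT[k] holds log(k!)
sub logfact {
    my $n = shift;
    push @LOGFACT, $LOGFACT[-1] + log(scalar @LOGFACT) while @LOGFACT <= $n;
    return $LOGFACT[$n];
}

# P(X = i): exactly i good balls among N drawn from n good and m bad.
sub hypergeom_pmf {
    my ($n, $m, $N, $i) = @_;
    # Outside the support the probability is zero.
    return 0 if $i < max(0, $N - $m) || $i > min($N, $n);
    return exp( logfact($m) + logfact($n) + logfact($N) + logfact($m + $n - $N)
              - logfact($i) - logfact($n - $i) - logfact($m + $i - $N)
              - logfact($N - $i) - logfact($m + $n) );
}

# P(X >= i): sum the pmf from i up to min(N, n), inclusive.
sub hypergeom_upper_tail {
    my ($n, $m, $N, $i) = @_;
    my $lo = max($i, 0, $N - $m);
    my $hi = min($N, $n);
    my $total = 0;
    $total += hypergeom_pmf($n, $m, $N, $_) for $lo .. $hi;
    return $total;
}

printf "P(X = 40)  = %.6g\n",   hypergeom_pmf(300, 700, 100, 40);
printf "P(X >= 40) = %.6g\n",   hypergeom_upper_tail(300, 700, 100, 40);
printf "P(X >= 0)  = %.12f\n",  hypergeom_upper_tail(300, 700, 100, 0);
```

Summing from 0 should give a value very close to 1, which is a quick sanity check that the single-term function is a point probability and that the loop bound is inclusive.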
