Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re: Sample Probabilities

by blokhead (Monsignor)
on Dec 07, 2006 at 18:15 UTC ( [id://588420]=note: print w/replies, xml ) Need Help??


in reply to Sample Probabilities

Just to be sure I understand you want ... You have 35 red widgets, 41 green widgets, 16 blue widgets, and you want to know the probability of getting 2 red, 1 green, 0 blue, if you were chosing 3 items randomly from all widgets? Also, the order in which they were drawn is irrelevant (drawing green,red,red is the same as red,green,red), and you are drawing without replacement.

Continuing with that example, the probability of getting green,red,red, in that order, is:

(41/92)*(35/91)*(34/90)
All we have to do now is multiply this quantity by the number of ways to rearrange (green,red,red). In this case, there are 3 ways to rearrange them.

In general, the probability of getting x1 red, x2 green, x3 blue, etc, in some particular order, is the following: (I separated out the numerator and denominator to make it easier)

(#red * (#red-1) * ... (#red-x1+1)) * (#green * (#green-1) * ... (#green-x2+1) * ... / ( total * (total-1) * (total-2) * ... )
And the number of ways of reordering a list of x1 reds, x2 greens, x3 blues, is
(x1+x2+x3+ ...)! / (x1! * x2! * x3! * ... )
Those exclamation points are for factorials.

As perl code, and using the example in your original post (I changed the indexing to be 0-based):

use List::Util 'sum'; sub fact { my $n = shift; my $f = 1; $f *= $_ for 1 .. $n; $f; } ## $n * ($n-1) * ... * ($n-$m+1) sub lower { my ($n, $m) = @_; my $result = 1; $result *= $n-$_ for 0 .. $m-1; $result; } sub prob { my ($sizes, $sample) = @_; my $n = @$sizes; ## how many types of items my $S = sum @$sizes; ## total number of items in the universe my @sampled = (0) x $n; my $samplesize = @$sample; $sampled[$_]++ for @$sample; ## $sampled[x] = how many of type x ## are in this sample? my $total = fact($samplesize) / lower($S, $samplesize); $total *= lower($sizes->[$_], $sampled[$_]) / fact($sampled[$_]) for 0 .. $n-1; $total; } ## how many ways to get 2 of type 8, 1 of type 3, etc.. ? print prob( [35000,41000,16000,18000,21000,45000,27000,10000,16000], [8,8,3,1,5,0,0] ); ## output: 0.000397344651946416
Disclaimer: This code not tested in any great detail. Also keep in mind that all the probabilities will look small, because there are so many different events in this distribution.

Update: updated computation to account for drawing without replacement. Old code is still in comment tags. The probabilities hardly change at all since the domain is so large.

Update: fixed copy & paste error, thanks to BrowserUk++.

blokhead

Replies are listed 'Best First'.
Re^2: Sample Probabilities
by BrowserUk (Patriarch) on Dec 08, 2006 at 07:20 UTC

    There appears to have been a C&P error in your code. This line is missing a ;

    $total *= lower($sizes->[$_], $sampled[$_]) / fact($sampled[$_])

    and by implication of your use of $_, also a for modifier?

    Should that be?

    $total *= lower($sizes->[$_], $sampled[$_]) / fact($sampled[$_]) for + 0..$n-1;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Sample Probabilities
by willyyam (Priest) on Dec 07, 2006 at 19:11 UTC

    Outstanding! Thank you. It's been a long time since Finite Math in High School, and I just couldn't remember how to approach this. The problem with Google and most probability tutorials is that you need to know what things are called. Thanks again.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://588420]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (9)
As of 2024-04-23 08:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found