Just to be sure I understand you want ... You have 35 red widgets, 41 green widgets, 16 blue widgets, and you want to know the probability of getting 2 red, 1 green, 0 blue, if you were chosing 3 items randomly from all widgets? Also, the *order in which they were drawn is irrelevant* (drawing green,red,red is the same as red,green,red), and you are drawing *without replacement*.
Continuing with that example, the probability of getting green,red,red, *in that order*, is:
`(41/92)*(35/91)*(34/90)
`
All we have to do now is multiply this quantity by the number of ways to rearrange (green,red,red). In this case, there are 3 ways to rearrange them.
In general, the probability of getting x1 red, x2 green, x3 blue, etc, *in some particular order*, is the following: (I separated out the numerator and denominator to make it easier)
`(#red * (#red-1) * ... (#red-x1+1))
* (#green * (#green-1) * ... (#green-x2+1) * ...
/ ( total * (total-1) * (total-2) * ... )
`
And the number of ways of reordering a list of x1 reds, x2 greens, x3 blues, is
`(x1+x2+x3+ ...)! / (x1! * x2! * x3! * ... )
`
Those exclamation points are for factorials.
As perl code, and using the example in your original post (I changed the indexing to be 0-based):
`use List::Util 'sum';
sub fact {
my $n = shift;
my $f = 1;
$f *= $_ for 1 .. $n;
$f;
}
## $n * ($n-1) * ... * ($n-$m+1)
sub lower {
my ($n, $m) = @_;
my $result = 1;
$result *= $n-$_ for 0 .. $m-1;
$result;
}
sub prob {
my ($sizes, $sample) = @_;
my $n = @$sizes; ## how many types of items
my $S = sum @$sizes; ## total number of items in the universe
my @sampled = (0) x $n;
my $samplesize = @$sample;
$sampled[$_]++ for @$sample;
## $sampled[x] = how many of type x
## are in this sample?
my $total = fact($samplesize) / lower($S, $samplesize);
$total *= lower($sizes->[$_], $sampled[$_]) / fact($sampled[$_])
for 0 .. $n-1;
$total;
}
## how many ways to get 2 of type 8, 1 of type 3, etc.. ?
print prob( [35000,41000,16000,18000,21000,45000,27000,10000,16000],
[8,8,3,1,5,0,0] );
## output: 0.000397344651946416
`
Disclaimer: This code not tested in any great detail. Also keep in mind that all the probabilities will look small, because there are so many different events in this distribution.
**Update:** updated computation to account for drawing without replacement. Old code is still in comment tags. The probabilities hardly change at all since the domain is so large.
**Update:** fixed copy & paste error, thanks to BrowserUk++.
| [reply] [d/l] [select] |

` $total *= lower($sizes->[$_], $sampled[$_]) / fact($sampled[$_])
`
and by implication of your use of `$_`, also a for modifier?
Should that be?
` $total *= lower($sizes->[$_], $sampled[$_]) / fact($sampled[$_]) for
+ 0..$n-1;
`
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |

| [reply] |

First, your code would probably be cleaner like this (avoiding the use of variables like $entity1 .. $entity9; that's what arrays are for):
`use strict;
use warnings;
my @entity = (
35000,
41000,
16000,
18000,
21000,
45000,
27000,
10000,
16000,
);
my( @sample ) = @entity{ 8, 8, 3, 1, 5, 0, 0 };
my $prob = probability( \@entity, \@sample );
`
For your probability data, consider each "draw" from @entity to be similar to a roll of the dice. Each roll has equal odds of hitting a given number, and a number can be hit more than once. That means the probability of getting a particular sequence should be equally improbable no matter what the sequence happens to be. It is just as improbable of rolling from slot 1, 2, 3, 4, 5 as it is to roll from slot 1, 1, 1, 1, 1. The probability, therefore, is based on the number of rolls and the number of "sides" (number of entities), not the sequence itself.
On the other hand, if you're gathering a total based on the draws, the probability of totals will fall along a bell curve. This is something that D&D players know well: If you roll three six-sided dice, your odds of getting a 3 or an 18 are less than your odds of getting an 11. That's because there are more combinations that could net you the sum of 11 , but only one combination (all sixes) will get you a sum of 18. There's an article about this at wikipedia here: Dice.
One thing to remember is that your containers are "ever-full". When you draw (or roll), you're not removing, you're just looking and putting it back. At least that's the way your sample code seems to be handling things.
Your question I believe is about the probability of hitting a particular sequence. If you have nine entities, each time you draw you have a 1/9th chance of getting a particular one. Your chances of getting a particular sequence of seven draws from a nine-slot bucket is (1/9)^^7.
That means that sub probability() can be as simple as this:
`sub probability {
my $entity_count = @{$_[0]};
my $sample_count = @{$_[1]};
return( ( 1 / $entity_count ) ^^ $sample_count );
}
`
| [reply] [d/l] [select] |

Comment onSample ProbabilitiesDownloadCode