athomason has asked for the wisdom of the Perl Monks concerning the following question:
Fellow Monks- I need to select a random number from 0 to n, with the caveat that each number should not be selected with equally probability. I have an array (say, @freq, defined from 0..n) containing non-normalized relative frequencies of each index. In other words, the probability of selecting i should come out to be $freq[i]/$sum_over_freq. The routine is in the middle of a very time-critical loop (I'll eventually need to port this to Fortran on a 128-proc RS/6000, in fact), so speed is the primary concern. I've coded up a few ugly ways to do this, but what I have is all too slow. @freq is approximately static throughout the critical loop, so it might be favorable to generate a lookup within the loop that encases it. Any ideas on how best to implement this?
Re: Biased random number selection
by ferrency (Deacon) on Jul 12, 2000 at 00:57 UTC
|
If you're choosing between n different values, it's very
easy to make an order(n) routine to do this, as someone
else has already shown.
But, you can do better. If you pre-build an array of partial
frequency sums, then you have a sorted array, and you can
do a binary search on it for the correct number. This
would be faster, If n is relatively large, And if the frequency
table doesn't change very much.
# Warning: Untested code, almost definitely has off by one errors!
# Given a frequency table in @freq, build a summed frequency table.
my $sum;
@sf = map {$sum += $_} @freq;
# Now we choose a random number in the range of total frequency sum
my $target = rand($sum);
# To find out what the Answer is (weighted freq from 0 to n),
# we now find n such that $sf[n-1]< $target < $sf[n]
# We keep track of the numbers we know $n is between
my $a = 0; my $b = @sf; my $n;
# repeat until we narrow down to one answer
while ($a < $b) {
$n = int(($a + $b) / 2);
if ($sf[$n] < $target) {
$a = $n;
} elsif ($sf[$n-1] > $target) {
$b = $n;
} else {
# $target < $sf[n] and $target > $sf[n-1],
# so we have an answer.
last;
}
}
print "I found $n\n";
Like I said, this is not going to work as-is,
but hopefully you get the idea. There is a greater
overhead in the loop than doing a linear search, and a
greater setup cost to build the array of summed frequencies.
But the loop runs in order (log n) time
instead of order (n) time, so for sufficiently large n,
this can be a great improvement over a straight linear
search (use benchmarking to make sure it's a Win).
If you have a known frequency distribution (ie the
frequencies are all almost equal, or there are a lot of
low frequencies and a few high ones), you may be able to
get Even Faster results if you use some heuristics to choose a better
method than "look halfway between my known upper and lower bounds for n."
For example, if your frequencies are all pretty equal, you can get a
better idea of a guess for $n by seeing how far away
$sf[$n] is from $target, and
moving $n an appropriate distance, instead of just going
halfway.
Hope this helps. Does anyone know of a good binary search module?
Alan
| [reply] [d/l] [select] |
|
Very cool. The summed frequency array was one of the ideas I implemented, though I shot myself in the foot after that part. I was so determined to squeeze every ounce of speed out that I constructed an array of length $sf[n], with $freq[i] elements with value i for each i. The lookup is O(1), which beats the pants off O(logn) and O(n). Unfortunately, my n actually is fairly large (~10,000), and I ran out of memory pretty quickly (even with 2GB/proc). The O(n) linear search over @sf was going to be my next attempt, but I'll certainly compare it to the binary search.
| [reply] [d/l] [select] |
|
Yep, overall it's a typical speed/space tradeoff. Another benefit of the linear or binary traversal solutions is, your frequencies needn't be integers (well, assuming rand() works correctly with non-integers, that is).
| [reply] |
Re: Biased random number selection
by btrott (Parson) on Jul 12, 2000 at 00:21 UTC
|
This is the algorithm I've seen posted elsewhere (when
I did a search a long time ago in the perl newsgroups), and that
I've used myself:
my($tw, $winner) = (0, undef);
my @numbers = ( ... );
my @freq = ( ... );
for my $idx (0..$#numbers) {
$tw += $freq[ $idx ];
$winner = $idx if rand($tw) < $freq[ $idx ];
}
return $winner;
There's also an example in the Perl Cookbook, apparently,
if you have that. | [reply] [d/l] |
Re: Biased random number selection
by gnat (Beadle) on Jul 12, 2000 at 01:12 UTC
|
Here's the code from the Perl Cookbook:
If you have a list of weights and values you want to randomly pick
from, follow this two step process: first, turn the weights into
a probability distribution with weight_to_dist below, then
use the distribution to randomly pick a value with weighted_rand:
# weight_to_dist: takes a hash mapping key to weight and returns
# a hash mapping key to probability
sub weight_to_dist {
my %weights = @_;
my %dist = ();
my $total = 0;
my ($key, $weight);
local $_;
foreach (values %weights) {
$total += $_;
}
while ( ($key, $weight) = each %weights ) {
$dist{$key} = $weight/$total;
}
return %dist;
}
# weighted_rand: takes a hash mapping key to probability, and
# returns the corresponding element
sub weighted_rand {
my %dist = @_;
my ($key, $weight);
while (1) { # to avoid floating point inac
+curacies
my $rand = rand;
while ( ($key, $weight) = each %dist ) {
return $key if ($rand -= $weight) < 0;
}
}
}
I hope this helps. I'm not sure how closely it approximates
your problem. Let me know if I can help more.
Nat | [reply] [d/l] |
Re: Biased random number selection
by Anonymous Monk on Jul 12, 2000 at 06:14 UTC
|
There's an algorithm in the cookbook that deals with this problem, and
that's already quoted here.
But, if you are willing to spend the memory, then you might be able to
to do it faster. The following algorithm assumes that @freq contains
non-negative integers (the algorithm can trivially be extended to
non-negative rational numbers). It also assumes you have enough memory
for an array of size $N, where
$N = do {my $s = 0; $s += $_ for @freq; $s},
and that the frequency doesn't change.
After preproccessing, the sampling can be done in constant time.
my @set = map {($_) x $freq $_} 0 .. $#freq;
# Draw a weighted random number.
sub draw {$set rand @set} # Now, where did the brackets go???????
-- Abigail
| [reply] |
RE: Biased random number selection
by merlyn (Sage) on Jul 12, 2000 at 13:13 UTC
|
| [reply] |
Re: Biased random number selection
by ferrency (Deacon) on Jul 12, 2000 at 20:03 UTC
|
I just thought of one more potential optimization if you're going to use a linear search through the list of frequencies.
If your frequencies are very disparate (some values are much more likely to occur than others), then searching through the most likely results first would yield results faster, on average.
To do this, you can sort the frequency list by decreasing frequency, and then search linearly through the list for your target value.
In the frequency list you'd need to store n as well as frequency(n),
because the index into the frequency list will no longer be equal to n once you sort the list.
Alan | [reply] |
|
| [reply] |
|
|