Biased random number selection

athomason has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Biased random number selection by ferrency (Deacon) on Jul 12, 2000 at 00:57 UTC
If you're choosing between n different values, it's very easy to make an order(n) routine to do this, as someone else has already shown. But, you can do better. If you pre-build an array of partial frequency sums, then you have a sorted array, and you can do a binary search on it for the correct number. This would be faster, If n is relatively large, And if the frequency table doesn't change very much. # Warning: Untested code, almost definitely has off by one errors! # Given a frequency table in @freq, build a summed frequency table. my $sum; @sf = map {$sum += $_} @freq; # Now we choose a random number in the range of total frequency sum my $target = rand($sum); # To find out what the Answer is (weighted freq from 0 to n), # we now find n such that $sf[n-1]< $target < $sf[n] # We keep track of the numbers we know $n is between my $a = 0; my $b = @sf; my $n; # repeat until we narrow down to one answer while ($a < $b) { $n = int(($a + $b) / 2); if ($sf[$n] < $target) { $a = $n; } elsif ($sf[$n-1] > $target) { $b = $n; } else { # $target < $sf[n] and $target > $sf[n-1], # so we have an answer. last; } } print "I found $n\n"; [download] Like I said, this is not going to work as-is, but hopefully you get the idea. There is a greater overhead in the loop than doing a linear search, and a greater setup cost to build the array of summed frequencies. But the loop runs in order (log n) time instead of order (n) time, so for sufficiently large n, this can be a great improvement over a straight linear search (use benchmarking to make sure it's a Win). If you have a known frequency distribution (ie the frequencies are all almost equal, or there are a lot of low frequencies and a few high ones), you may be able to get Even Faster results if you use some heuristics to choose a better method than "look halfway between my known upper and lower bounds for n." For example, if your frequencies are all pretty equal, you can get a better idea of a guess for $n by seeing how far away `$sf[$n]` is from `$target`, and moving $n an appropriate distance, instead of just going halfway. Hope this helps. Does anyone know of a good binary search module? Alan	[reply] [d/l] [select]
RE: Re: Biased random number selection by athomason (Curate) on Jul 12, 2000 at 01:08 UTC
Very cool. The summed frequency array was one of the ideas I implemented, though I shot myself in the foot after that part. I was so determined to squeeze every ounce of speed out that I constructed an array of length `$sf[n]`, with `$freq[i]` elements with value `i` for each `i`. The lookup is O(1), which beats the pants off O(logn) and O(n). Unfortunately, my `n` actually is fairly large (~10,000), and I ran out of memory pretty quickly (even with 2GB/proc). The O(n) linear search over `@sf` was going to be my next attempt, but I'll certainly compare it to the binary search.	[reply] [d/l] [select]
RE: RE: Re: Biased random number selection by ferrency (Deacon) on Jul 12, 2000 at 01:37 UTC
Yep, overall it's a typical speed/space tradeoff. Another benefit of the linear or binary traversal solutions is, your frequencies needn't be integers (well, assuming rand() works correctly with non-integers, that is).	[reply]
Re: Biased random number selection by btrott (Parson) on Jul 12, 2000 at 00:21 UTC
This is the algorithm I've seen posted elsewhere (when I did a search a long time ago in the perl newsgroups), and that I've used myself: `my($tw, $winner) = (0, undef); my @numbers = ( ... ); my @freq = ( ... ); for my $idx (0..$#numbers) { $tw += $freq[ $idx ]; $winner = $idx if rand($tw) < $freq[ $idx ]; } return $winner;` [download] There's also an example in the Perl Cookbook, apparently, if you have that.	[reply] [d/l]
Re: Biased random number selection by gnat (Beadle) on Jul 12, 2000 at 01:12 UTC
Here's the code from the Perl Cookbook: If you have a list of weights and values you want to randomly pick from, follow this two step process: first, turn the weights into a probability distribution with `weight_to_dist` below, then use the distribution to randomly pick a value with `weighted_rand`: # weight_to_dist: takes a hash mapping key to weight and returns # a hash mapping key to probability sub weight_to_dist { my %weights = @_; my %dist = (); my $total = 0; my ($key, $weight); local $_; foreach (values %weights) { $total += $_; } while ( ($key, $weight) = each %weights ) { $dist{$key} = $weight/$total; } return %dist; } # weighted_rand: takes a hash mapping key to probability, and # returns the corresponding element sub weighted_rand { my %dist = @_; my ($key, $weight); while (1) { # to avoid floating point inac +curacies my $rand = rand; while ( ($key, $weight) = each %dist ) { return $key if ($rand -= $weight) < 0; } } } [download] I hope this helps. I'm not sure how closely it approximates your problem. Let me know if I can help more. Nat	[reply] [d/l]
Re: Biased random number selection by Anonymous Monk on Jul 12, 2000 at 06:14 UTC
There's an algorithm in the cookbook that deals with this problem, and that's already quoted here. But, if you are willing to spend the memory, then you might be able to to do it faster. The following algorithm assumes that @freq contains non-negative integers (the algorithm can trivially be extended to non-negative rational numbers). It also assumes you have enough memory for an array of size $N, where `$N = do {my $s = 0; $s += $_ for @freq; $s}`, and that the frequency doesn't change. After preproccessing, the sampling can be done in constant time. my @set = map {($_) x $freq $_} 0 .. $#freq; # Draw a weighted random number. sub draw {$set rand @set} # Now, where did the brackets go??????? -- Abigail	[reply]
RE: Biased random number selection by merlyn (Sage) on Jul 12, 2000 at 13:13 UTC
I did one of those for an early WebTechniques Perl column. The algorithm is different from the one that the Cookbook uses, but achieves the same results. -- Randal L. Schwartz, Perl hacker	[reply]
Re: Biased random number selection by ferrency (Deacon) on Jul 12, 2000 at 20:03 UTC
I just thought of one more potential optimization if you're going to use a linear search through the list of frequencies. If your frequencies are very disparate (some values are much more likely to occur than others), then searching through the most likely results first would yield results faster, on average. To do this, you can sort the frequency list by decreasing frequency, and then search linearly through the list for your target value. In the frequency list you'd need to store n as well as frequency(n), because the index into the frequency list will no longer be equal to n once you sort the list. Alan	[reply]
RE: Re: Biased random number selection by merlyn (Sage) on Jul 12, 2000 at 20:44 UTC
If your frequencies are very disparate (some values are much more likely to occur than others), then searching through the most likely results first would yield results faster, on average. To do this, you can sort the frequency list by decreasing frequency, and then search linearly through the list for your target value. My gut says that the cost of sorting will far outweigh the cost of finding your desired item, unless you could cache the result of the sort so it's done once, not once per operation. My gut has been known to be wrong before though. {grin} -- Randal L. Schwartz, Perl hacker	[reply]