http://qs321.pair.com?node_id=242835


in reply to Weighted random numbers generator

After typing this up, I see Zaxo already posted the same idea. Ah well, here it is anyway.
#!/usr/bin/perl
use strict;
use warnings;

# Assign a weight to each item. In this example
# pigs get twice the weight of dogs or cows
my %weight = ( "dog" => 1, "cow" => 1, "pig" => 2 );

# Create an array with a number of elements
# equal to the weights of the items
my @bucket;
foreach my $animal (keys %weight) {
    push @bucket, ($animal) x $weight{$animal};
}

# @bucket now looks something like this (hash key order
# is not guaranteed, so the actual order may differ):
#
# $bucket[0] = "dog"
# $bucket[1] = "cow"
# $bucket[2] = "pig"
# $bucket[3] = "pig"
#
# "pig" has twice as many slots as "dog" or "cow"

# The choose_weighted() subroutine now just has to
# pick a random element from the array.
sub choose_weighted {
    my $bucket = shift;
    return $bucket->[rand(@$bucket)];
}

# Test code to demonstrate it works
my %count;
for (1..1000) {
    my $animal = choose_weighted(\@bucket);
    $count{$animal}++;
}

foreach my $animal (keys %count) {
    print "$animal: $count{$animal}\n";
}

-Matt

Re: Re: Weighted random numbers generator
by antirice (Priest) on Mar 13, 2003 at 22:53 UTC
    I think this solution is pretty clever in getting the selection down to O(1). But how practical is it? Suppose I do something like:

    my %weight = ( dog=>100000000, cat=>120000000, pig=>40000000000 );
    Who would do that besides evil people (such as myself >:-))? Fair enough, but what about the example group? The sub is passed a reference to an array containing (1, 1.25, 3.6, 2). Because the repetition operator truncates its count to an integer, the 1.25 would get only 1 slot and the 3.6 would get only 3. Of course, you could scale every weight by a common multiplier so that they all become integers (20 works here), but what about weights with very long decimals? Suppose a weight comes from a calculation involving pi, and you need a reasonably precise value, so you use 3.14159265358979. Scale that up to an integer and have fun generating an array that large :).

    If you can round the decimals, this certainly becomes less problematic. But what if there are 1000 items weighted in this fashion? You could still end up with an array so large that perl runs out of memory. And when you really do need the higher precision, rounding is not an option and your solution becomes impractical.
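    To make the truncation concrete, here is a minimal sketch using the example weights. The helper name slots_for and the scale factor of 20 are illustrative assumptions, not part of either posted solution; it only shows how many slots each weight would receive and how large the bucket gets once the weights are scaled to integers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Weights from the example group mentioned above.
my @weights = (1, 1.25, 3.6, 2);

# Count how many slots a weight gets when used directly as a repeat count.
# The x operator converts its count to an integer, dropping the fraction.
# (slots_for is a hypothetical helper, named here for illustration.)
sub slots_for {
    my ($weight) = @_;
    my @slots = ('item') x $weight;
    return scalar @slots;
}

# Assumption: a scale factor of 20 makes these particular weights integral
# (1.25 = 5/4 and 3.6 = 18/5, and 20 is the LCM of 4 and 5).
my $scale  = 20;
my @scaled = map { int($_ * $scale + 0.5) } @weights;   # round to nearest integer

# Total number of array elements the bucket approach would need.
my $total_slots = 0;
$total_slots += $_ for @scaled;

printf "weight %-5s gets %d slot(s) unscaled\n", $_, slots_for($_) for @weights;
print  "scaled weights: @scaled (bucket size $total_slots)\n";
```

    Even for four tidy weights the bucket grows from 4 slots to well over a hundred, and each extra decimal digit of precision multiplies the scale factor by roughly ten.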

    I'm not saying that your solution is wrong. I'm merely pointing out that it could become very expensive in terms of memory and that I would probably opt for the binary search solutions others are proposing.
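    For comparison, a binary search over running totals might look roughly like the sketch below. This is not the code from the other replies; the sub names build_cumulative, find_index and choose_weighted_bsearch are assumptions made for illustration. Memory use is proportional to the number of items rather than to the sum of the weights, so huge or fractional weights cost nothing extra, at the price of O(log n) per pick instead of O(1).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build two parallel arrays: item names and running totals of their weights.
sub build_cumulative {
    my ($weight) = @_;                 # hash ref: item => positive weight
    my (@items, @cumulative);
    my $total = 0;
    for my $item (sort keys %$weight) {
        $total += $weight->{$item};
        push @items,      $item;
        push @cumulative, $total;
    }
    return (\@items, \@cumulative);
}

# Return the index of the first running total strictly greater than $r.
sub find_index {
    my ($cumulative, $r) = @_;
    my ($lo, $hi) = (0, $#$cumulative);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($cumulative->[$mid] > $r) { $hi = $mid }
        else                          { $lo = $mid + 1 }
    }
    return $lo;
}

# Pick one item with probability proportional to its weight.
sub choose_weighted_bsearch {
    my ($items, $cumulative) = @_;
    my $r = rand($cumulative->[-1]);   # 0 <= $r < total weight
    return $items->[ find_index($cumulative, $r) ];
}

# The "evil" weights from above; fractional weights work the same way.
my %weight = ( dog => 100_000_000, cat => 120_000_000, pig => 40_000_000_000 );
my ($items, $cumulative) = build_cumulative(\%weight);

my %count;
$count{ choose_weighted_bsearch($items, $cumulative) }++ for 1 .. 1000;
print "$_: $count{$_}\n" for sort keys %count;
```

    With these weights almost every pick should be "pig", since it holds more than 99% of the total weight.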

    Updated: Fixed a typo. I need to learn how to spell :-/

    antirice    
    The first rule of Perl club is - use Perl
    The ith rule of Perl club is - follow rule i - 1 for i > 1