ibm1620 has asked for the wisdom of the Perl Monks concerning the following question:
I have a hash that maps a set of things to their "popularity". In this example, there are ten animals, each of which has received a certain number of votes.
{ alligator => 100, bear => 90, cat => 80, ... jellyfish => 10 }
total_votes = 550
I want to be able to randomly select from this set with three different biases: favoring the popular, impartial, and favoring the unpopular.
The impartial case is simple; choose the Nth item where 0 <= N <= 9.
To favor the popular, you choose a random number N from 0 to 549, then traverse the set of animals, subtracting each one's popularity from N until N goes negative, and that's the animal you choose.
But I can't figure out how to favor those with the least popularity. (Admittedly, this is more of an algorithm question than a Perl question, and I'll gladly take it to a different forum if it's inappropriate for PM.)
Here's my demo code;
#!/usr/bin/env perl
use 5.010;
use warnings;
use strict;
# Initialize %votes: map animals to their number of votes.
my @animals
= qw/alligator bear cat dog elephant fox giraffe hippo iguana jell
+yfish/;
my $votes = 100; # start with 100 votes for first word
my $decr = int( $votes / (@animals) ); # evenly decrease # of votes
my $sum = 0; # grand total of votes for all words
my %votes = map
{my $n = $votes; $sum += $n; $votes -= $decr; ( $_, $n )}
@animals;
# Display animals and votes
print '===> ';
printf "%s:%3d ", $_, $votes{$_} for sort keys %votes;
print " Total votes=$sum";
# Get random selections
my %popular;
my %impartial;
my %unpopular;
for ( 1 .. $sum ) {
$popular{ popular() }++; # should tend toward the popular
$impartial{ impartial() }++; # should be equally distributed
$unpopular{ unpopular() }++; # should tend toward the unpopular
}
# Display distribution of votes from each method
print "\nPOP: ";
printf "%s:%3d ", $_, $popular{$_}//0
for sort keys %votes;
print "\nIMP: ";
printf "%s:%3d ", $_, $impartial{$_}//0
for sort keys %votes;
print "\nUNP: ";
printf "%s:%3d ", $_, $unpopular{$_}//0
for sort keys %votes;
say '';
# Method 1: choose randomly with preference for higher vote-getters
sub popular {
my $rand = int( rand $sum );
for my $key (
# sort { ($votes{$b} <=> $votes{$a}) } #unnecessary
keys %votes ) {
return $key if ( $rand -= $votes{$key} ) <= 0;
}
die 'popular bug';
}
# Method 2: choose randomly without regard for popularity
sub impartial {
return ( keys %votes )[ rand keys %votes ];
}
# Method 3: choose randomly with preference for lower vote-getters
sub unpopular {
return '?';
}
Re: Randomly choosing from a set of alternatives with varying popularity
by hv (Prior) on Mar 28, 2022 at 03:28 UTC
|
One approach for favouring the unpopular answers would be to weight by the reciprocal of the votes, so jellyfish would be 10 times likelier to show up than alligator. Another would be to replace $v votes by $max + $min - $v (which would have the same jellyfish to alligator ratio, but would weight the middle choices differently).
Note that the reciprocal approach can use the same algorithm as 'popular()' if you remove the unnecessary int(...) from its first line. The min/max approach would work with or without that change.
| [reply] [d/l] [select] |
|
use strict;
use warnings;
use Data::Dump qw/pp dd/;
my %vote = ( alligator => 100, bear => 90, cat => 80, jellyfish => 10
+);
my $N =0;
$N += $_ for values %vote;
# --- invert by division, proportion of non-voters
my $N2 = 0;
my %vote2 = map {
my $v2 = 1/$vote{$_};
$N2 += $v2;
$_ => $v2
} keys %vote;
# --- invert by subtraction, number of non-voters
my $N3 = 0;
my %vote3 = map {
my $v3 = $N-$vote{$_};
$N3 += $v3;
$_ => $v3
} keys %vote;
my (%dist1,%dist2,%dist3);
my $e=5;
$dist1{pick($N,\%vote)}++ for 1..10**$e;
$dist2{pick($N2,\%vote2)}++ for 1..10**$e;
$dist3{pick($N3,\%vote3)}++ for 1..10**$e;
pp \%dist1,\%dist2,\%dist3;
sub pick {
my ($N,$h_vote)=@_;
my @deb;
my $r =rand($N);
push @deb,$r;
scalar keys %$h_vote; # reset each
while ( my ($k,$v) = each %$h_vote) {
$r -= $v;
push @deb,[$r,$k,$v];
return $k if $r <=0;
}
die "ERROR", pp \@deb;
}
OUTPUT: (
{ alligator => 35891, bear => 31807, cat => 28792, jellyfish => 3510
+ },
{ alligator => 7497, bear => 8300, cat => 9481, jellyfish => 74722 }
+,
{ alligator => 21416, bear => 22542, cat => 23870, jellyfish => 3217
+2 },
)
On a tangent: had to debug why each failed me. Needed reset.
| [reply] [d/l] [select] |
|
The reciprocal approach definitely gives me what I had in mind, more so than computing the non-voters. That's what I'll use. Thanks.
(Incidentally, should your comment above read "invert by division, proportion of voters", or am I missing something?)
| [reply] |
|
|
Thank you - I had a hunch reciprocals might be the way to reverse popularity, but wasn't sure the 'popular()' algorithm would work with them. (Was never entirely comfortable with floating point arithmetic :-)
| [reply] |
|
Well yeah, me neither: you know where you are with integers. :)
If it reaches the stage that the maths becomes important, you would certainly want to take a lot more care. One way to do that would be to avoid the simplistic "walk a list of (floating-point) weights" algorithm, and instead work to make each reciprocal exact: for each of the options in turn, give it an exact 1/votes chance to get chosen, and if it fails proceed to the next. On reaching the end of the list without a choice, start again (which doesn't necessarily have to be in the same order).
Getting an exact "1/votes" chance requires understanding a specific invariant of the RNG, the number of bits that it generates - exposed by perl via use Config; print $Config{randbits}. When something has 3 votes, for example, an RNG that generates 16 bits will slightly favour the value 0 (mod 3) by the ratio 21846:21845:21845. So for each option you would calculate $valid_lim = $votes * int(2**RNGbits / $votes) == 65535 in advance, and then re-roll any random value while ($rand = int(rand(2**$RNGbits))) >= $valid_lim before doing the modulus check choose_me() if ($rand % $votes) == 0, to remove that bias.
That means doing more work (and using up more of the available entropy), so you won't want to do this unless the vote requires this sort of critical accuracy.
| [reply] [d/l] [select] |
|
Re: Randomly choosing from a set of alternatives with varying popularity
by jwkrahn (Abbot) on Mar 28, 2022 at 02:55 UTC
|
I'm sure that there is a better way to do this, but just off the top of my head:
use List::Util qw/ shuffle max /;
my %animals = ( alligator => 100, bear => 90, cat => 80, jellyfish =>
+10 );
my @popular_list = shuffle map { ( $_ ) x $animals{ $_ } } keys %anima
+ls;
my $popular_pick = $popular_list[ rand @popular_list ];
my @impartial_list = shuffle keys %animals;
my $impartial_pick = $impartial_list[ rand @impartial_list ];
my $max_val = 1 + max values %animals;
my @unpopular_list = shuffle map { ( $_ ) x ( $max_val - $animals{ $_
+} ) } keys %animals;
my $unpopular_pick = $unpopular_list[ rand @unpopular_list ];
| [reply] [d/l] |
Re: Randomly choosing from a set of alternatives with varying popularity
by bliako (Monsignor) on Mar 28, 2022 at 20:42 UTC
|
what you describe, in general, falls within the: "drawing random samples from a random distribution (RD)". Except that you additionally want to favour certain outcomes more. That can be achieved by transforming your RD. e.g. skew it, squash it, lift the tails, etc., prior to drawing.
Drawing samples from an RD requires you to build the cumulative RD and then land randomly on any 'y'. Corresponding 'x' will be the random sample you need. You have a discrete RD, a histogram. So, build a cumulative histogram, transform the histogram to give better odds to unpopular items and then draw the sample.
I would keep the 2 tasks separately.
3day edit: in the 2nd paragraph swapped 'x' and 'y'. So basically, draw a random number from 0 to 1 this will be the 'y', the spot on the y-axis. The corresponding 'x' will be a random sample wrt this RD.
bw, bliako
| [reply] |
Re: Randomly choosing from a set of alternatives with varying popularity
by salva (Canon) on Mar 29, 2022 at 07:48 UTC
|
But I can't figure out how to favor those with the least popularity
There are several ways to do that. Which one is the most appropriate is very dependent on your final problem.
Anyway, I have not seen anybody propose an exponential solution, so let me do it. The idea is that every subject start with the same score, 1.0, for instance, and every time one of them gets a vote, you reduce its score multiplying it by a factor c such that 0 < c < 1.
As in your particular case, you have already done a draw, you can calculate the current scores as $score{$k} = $c ** $votes{$k}.
The only issue remaining is how to pick c. The key here is how much you want to penalize the popular ones.
| [reply] [d/l] [select] |
|
Another interesting idea. I modified LanX's code to try this for several different values of c:
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use Data::Dump qw/pp dd/;
my %vote = ( alligator => 100, bear => 90, cat => 80, jellyfish => 10
+);
my $N =0;
$N += $_ for values %vote;
say "Vote";
show(\%vote, 1);
# --- invert by exponentiation
for my $C ( 0.9, 0.95, 0.96, 0.97, 0.98, 0.99) {
my $N4=0;
my %vote4 = map {
my $v4 = $C ** $vote{$_};
$N4 += $v4;
$_ => $v4;
} keys %vote;
my %dist4;
my $e=5;
$dist4{pick($N4,\%vote4)}++ for 1..10**$e;
say "C=$C";
show( \%dist4, -1);
}
sub show { # show counts and percentages
my $href = shift;
my $dir = shift; # sort direction 1=ascending, -1=descending
my $tot=0;
$tot += $_ for values %$href;
for my $k (sort {$dir * ($a cmp $b)} keys %$href) {
printf "%s => %5d (%5.2f%%) ",
$k, $href->{$k}, 100*($href->{$k}/$tot);
}
say '';
}
sub pick {
my ($N,$h_vote)=@_;
my @deb;
my $r =rand($N);
push @deb,$r;
scalar keys %$h_vote; # reset each
while ( my ($k,$v) = each %$h_vote) {
$r -= $v;
push @deb,[$r,$k,$v];
return $k if $r <=0;
}
die "ERROR", pp \@deb;
}
Output:
Vote
alligator => 100 (35.71%) bear => 90 (32.14%) cat => 80 (28.
+57%) jellyfish => 10 ( 3.57%)
C=0.9
jellyfish => 99902 (99.90%) cat => 65 ( 0.07%) bear => 24 ( 0.
+02%) alligator => 9 ( 0.01%)
C=0.95
jellyfish => 94748 (94.75%) cat => 2712 ( 2.71%) bear => 1599 ( 1.
+60%) alligator => 941 ( 0.94%)
C=0.96
jellyfish => 89102 (89.10%) cat => 5269 ( 5.27%) bear => 3440 ( 3.
+44%) alligator => 2189 ( 2.19%)
C=0.97
jellyfish => 78766 (78.77%) cat => 9190 ( 9.19%) bear => 6939 ( 6.
+94%) alligator => 5105 ( 5.10%)
C=0.98
jellyfish => 62428 (62.43%) cat => 15076 (15.08%) bear => 12415 (12.
+41%) alligator => 10081 (10.08%)
C=0.99
jellyfish => 42944 (42.94%) cat => 20843 (20.84%) bear => 18984 (18.
+98%) alligator => 17229 (17.23%)
| [reply] [d/l] [select] |
Re: Randomly choosing from a set of alternatives with varying popularity
by siberia-man (Friar) on Mar 29, 2022 at 09:13 UTC
|
This question reminded me my old note I kept for case. It doesn't cover the actual topic but can be (or cannot be) considered as some hint.
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);
# Lower weight means higher precedence
my %weight_map = (
alligator => 100,
bear => 90,
cat => 80,
jellyfish => 10,
zebra => 1,
);
my $weight_lowest = max(values %weight_map) + 1;
my @z0 = qw/ bear wolf jellyfish zebra dog fox cat /;
my @z1 = sort { $a cmp $b } @z0;
my @z2 = sort {
( $weight_map{$a} // $weight_lowest )
<=>
( $weight_map{$b} // $weight_lowest )
or
$a cmp $b;
} @z0;
print <<RESULT;
unsorted : [@z0]
sorted : [@z1]
weighted : [@z2]
RESULT
And the execution results to:
unsorted : [bear wolf jellyfish zebra dog fox cat]
sorted : [bear cat dog fox jellyfish wolf zebra]
weighted : [zebra jellyfish cat bear dog fox wolf]
| [reply] [d/l] [select] |
Re: Randomly choosing from a set of alternatives with varying popularity
by LanX (Saint) on Mar 28, 2022 at 01:23 UTC
|
> To favor the popular, you choose a random number N from 0 to 549, then traverse the set of animals, subtracting each one's popularity from N until N goes negative, and that's the animal you choose.
I don't think that's a good idea. More than 80% of your random numbers will be bigger than 100, which means none will be picked.
This also explains why you have problems inverting this "solution".
Furthermore a linear search in the hash won't be truly random, because the hash order is fixed within the duration of a run. This will result in a disproportionate representation of the early entries.
Solution is easy:
sort the N entries and chose a modified random function which favors one end of the range, like $i = int $N * rand() ** $e and $j = $N - 1 - $i for index in your sorted list. The way you chose $e > 1 will decide about the degree of bias ( $e=1 means no bias)
Just test the distribution of the indices:
DB<41> %h=();$e=3; $N=10; $h{ int $N * rand() **$e }++ for 1..10000
DB<42> x \%h
0 HASH(0x2f25fe8)
0 => 4670
1 => 1215
2 => 838
3 => 686
4 => 550
5 => 484
6 => 407
7 => 388
8 => 426
9 => 336
DB<43>
In this example the first 10% are more than 10 times more likely to be picked than the last 10%.
Of course YMMV about the distribution you prefer, but the basic idea is the same.
| [reply] [d/l] [select] |
|
I suspect you've misunderstood the 'popular' algorithm: 550 is the sum of the votes, so if (for example) the hash is traversed in order of decreasing popularity, rolling 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish. As far as I know this is pretty much the standard approach (up to optimizations) for picking from a set with weights.
| [reply] |
|
> rolling 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish.
Indeed, I missed the fact that N is an accumulated sum where all votes so far are subtracted. That wasn't obvious from the description and the code wasn't concise.
> As far as I know this is pretty much the standard approach
I'm curious to know how my other objection is handled. The order in a Perl hash is random, but fixed during a run. So this algorithm will be biased to some results.
Do you have a link to this "standard approach" discussing the distribution of picks?
My guess is the list must be sorted in ascending order. Or to be more precise that the choice over ascending or descending decides over popular vs unpopular.
> 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish.
FWIW, the results you give can only be reproduced with a descending sort.
> > > { alligator => 100, bear => 90, cat => 80, ... jellyfish => 10 }
(well almost s/179/189/ )
NB: the OP has uncommented the sort in his code.
> > > # sort { ($votes{$b} <=> $votes{$a}) } #unnecessary
| [reply] [d/l] [select] |
|
|
|
|
|