Randomly choosing from a set of alternatives with varying popularity

ibm1620 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Randomly choosing from a set of alternatives with varying popularity by hv (Prior) on Mar 28, 2022 at 03:28 UTC
One approach for favouring the unpopular answers would be to weight by the reciprocal of the votes, so jellyfish would be 10 times likelier to show up than alligator. Another would be to replace `$v` votes by `$max + $min - $v` (which would have the same jellyfish to alligator ratio, but would weight the middle choices differently). Note that the reciprocal approach can use the same algorithm as 'popular()' if you remove the unnecessary `int(...)` from its first line. The min/max approach would work with or without that change.	[reply] [d/l] [select]
Re^2: Randomly choosing from a set of alternatives with varying popularity by LanX (Saint) on Mar 28, 2022 at 11:32 UTC
> weight by the reciprocal of the votes FWIW, here my attempts with different reciprocal functions and evaluation of distribution. See also Re^3: Randomly choosing from a set of alternatives with varying popularity for interpretation. use strict; use warnings; use Data::Dump qw/pp dd/; my %vote = ( alligator => 100, bear => 90, cat => 80, jellyfish => 10 +); my $N =0; $N += $_ for values %vote; # --- invert by division, proportion of non-voters my $N2 = 0; my %vote2 = map { my $v2 = 1/$vote{$_}; $N2 += $v2; $_ => $v2 } keys %vote; # --- invert by subtraction, number of non-voters my $N3 = 0; my %vote3 = map { my $v3 = $N-$vote{$_}; $N3 += $v3; $_ => $v3 } keys %vote; my (%dist1,%dist2,%dist3); my $e=5; $dist1{pick($N,\%vote)}++ for 1..10$e; $dist2{pick($N2,\%vote2)}++ for 1..10$e; $dist3{pick($N3,\%vote3)}++ for 1..10**$e; pp \%dist1,\%dist2,\%dist3; sub pick { my ($N,$h_vote)=@_; my @deb; my $r =rand($N); push @deb,$r; scalar keys %$h_vote; # reset each while ( my ($k,$v) = each %$h_vote) { $r -= $v; push @deb,[$r,$k,$v]; return $k if $r <=0; } die "ERROR", pp \@deb; } [download] OUTPUT: `( { alligator => 35891, bear => 31807, cat => 28792, jellyfish => 3510 + }, { alligator => 7497, bear => 8300, cat => 9481, jellyfish => 74722 } +, { alligator => 21416, bear => 22542, cat => 23870, jellyfish => 3217 +2 }, )` [download] On a tangent: had to debug why each failed me. Needed reset. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^3: Randomly choosing from a set of alternatives with varying popularity by ibm1620 (Hermit) on Mar 29, 2022 at 00:39 UTC
The reciprocal approach definitely gives me what I had in mind, more so than computing the non-voters. That's what I'll use. Thanks. (Incidentally, should your comment above read "invert by division, proportion of voters", or am I missing something?)	[reply]
Re^4: Randomly choosing from a set of alternatives with varying popularity by LanX (Saint) on Mar 29, 2022 at 09:42 UTC
Re^2: Randomly choosing from a set of alternatives with varying popularity by ibm1620 (Hermit) on Mar 29, 2022 at 22:43 UTC
Thank you - I had a hunch reciprocals might be the way to reverse popularity, but wasn't sure the 'popular()' algorithm would work with them. (Was never entirely comfortable with floating point arithmetic :-)	[reply]
Re^3: Randomly choosing from a set of alternatives with varying popularity by hv (Prior) on Mar 30, 2022 at 01:57 UTC
Well yeah, me neither: you know where you are with integers. :) If it reaches the stage that the maths becomes important, you would certainly want to take a lot more care. One way to do that would be to avoid the simplistic "walk a list of (floating-point) weights" algorithm, and instead work to make each reciprocal exact: for each of the options in turn, give it an exact 1/votes chance to get chosen, and if it fails proceed to the next. On reaching the end of the list without a choice, start again (which doesn't necessarily have to be in the same order). Getting an exact "1/votes" chance requires understanding a specific invariant of the RNG, the number of bits that it generates - exposed by perl via `use Config; print $Config{randbits}`. When something has 3 votes, for example, an RNG that generates 16 bits will slightly favour the value 0 (mod 3) by the ratio 21846:21845:21845. So for each option you would calculate `$valid_lim = $votes * int(2RNGbits / $votes) == 65535` in advance, and then re-roll any random value while `($rand = int(rand(2$RNGbits))) >= $valid_lim` before doing the modulus check `choose_me() if ($rand % $votes) == 0`, to remove that bias. That means doing more work (and using up more of the available entropy), so you won't want to do this unless the vote requires this sort of critical accuracy.	[reply] [d/l] [select]
Re^4: Randomly choosing from a set of alternatives with varying popularity by ibm1620 (Hermit) on Mar 30, 2022 at 22:06 UTC
Re: Randomly choosing from a set of alternatives with varying popularity by jwkrahn (Abbot) on Mar 28, 2022 at 02:55 UTC
I'm sure that there is a better way to do this, but just off the top of my head: use List::Util qw/ shuffle max /; my %animals = ( alligator => 100, bear => 90, cat => 80, jellyfish => +10 ); my @popular_list = shuffle map { ( $_ ) x $animals{ $_ } } keys %anima +ls; my $popular_pick = $popular_list[ rand @popular_list ]; my @impartial_list = shuffle keys %animals; my $impartial_pick = $impartial_list[ rand @impartial_list ]; my $max_val = 1 + max values %animals; my @unpopular_list = shuffle map { ( $_ ) x ( $max_val - $animals{ $_ +} ) } keys %animals; my $unpopular_pick = $unpopular_list[ rand @unpopular_list ]; [download]	[reply] [d/l]
Re: Randomly choosing from a set of alternatives with varying popularity by bliako (Monsignor) on Mar 28, 2022 at 20:42 UTC
what you describe, in general, falls within the: "drawing random samples from a random distribution (RD)". Except that you additionally want to favour certain outcomes more. That can be achieved by transforming your RD. e.g. skew it, squash it, lift the tails, etc., prior to drawing. Drawing samples from an RD requires you to build the cumulative RD and then land randomly on any 'y'. Corresponding 'x' will be the random sample you need. You have a discrete RD, a histogram. So, build a cumulative histogram, transform the histogram to give better odds to unpopular items and then draw the sample. I would keep the 2 tasks separately. 3day edit: in the 2nd paragraph swapped 'x' and 'y'. So basically, draw a random number from 0 to 1 this will be the 'y', the spot on the y-axis. The corresponding 'x' will be a random sample wrt this RD. bw, bliako	[reply]
Re: Randomly choosing from a set of alternatives with varying popularity by salva (Canon) on Mar 29, 2022 at 07:48 UTC
But I can't figure out how to favor those with the least popularity There are several ways to do that. Which one is the most appropriate is very dependent on your final problem. Anyway, I have not seen anybody propose an exponential solution, so let me do it. The idea is that every subject start with the same score, 1.0, for instance, and every time one of them gets a vote, you reduce its score multiplying it by a factor `c` such that `0 < c < 1`. As in your particular case, you have already done a draw, you can calculate the current scores as `$score{$k} = $c ** $votes{$k}`. The only issue remaining is how to pick `c`. The key here is how much you want to penalize the popular ones.	[reply] [d/l] [select]
Re^2: Randomly choosing from a set of alternatives with varying popularity by ibm1620 (Hermit) on Mar 29, 2022 at 23:59 UTC
Another interesting idea. I modified LanX's code to try this for several different values of c: #!/usr/bin/env perl use 5.010; use strict; use warnings; use Data::Dump qw/pp dd/; my %vote = ( alligator => 100, bear => 90, cat => 80, jellyfish => 10 +); my $N =0; $N += $_ for values %vote; say "Vote"; show(\%vote, 1); # --- invert by exponentiation for my $C ( 0.9, 0.95, 0.96, 0.97, 0.98, 0.99) { my $N4=0; my %vote4 = map { my $v4 = $C $vote{$_}; $N4 += $v4; $_ => $v4; } keys %vote; my %dist4; my $e=5; $dist4{pick($N4,\%vote4)}++ for 1..10$e; say "C=$C"; show( \%dist4, -1); } sub show { # show counts and percentages my $href = shift; my $dir = shift; # sort direction 1=ascending, -1=descending my $tot=0; $tot += $_ for values %$href; for my $k (sort {$dir * ($a cmp $b)} keys %$href) { printf "%s => %5d (%5.2f%%) ", $k, $href->{$k}, 100*($href->{$k}/$tot); } say ''; } sub pick { my ($N,$h_vote)=@_; my @deb; my $r =rand($N); push @deb,$r; scalar keys %$h_vote; # reset each while ( my ($k,$v) = each %$h_vote) { $r -= $v; push @deb,[$r,$k,$v]; return $k if $r <=0; } die "ERROR", pp \@deb; } [download] Output: Vote alligator => 100 (35.71%) bear => 90 (32.14%) cat => 80 (28. +57%) jellyfish => 10 ( 3.57%) C=0.9 jellyfish => 99902 (99.90%) cat => 65 ( 0.07%) bear => 24 ( 0. +02%) alligator => 9 ( 0.01%) C=0.95 jellyfish => 94748 (94.75%) cat => 2712 ( 2.71%) bear => 1599 ( 1. +60%) alligator => 941 ( 0.94%) C=0.96 jellyfish => 89102 (89.10%) cat => 5269 ( 5.27%) bear => 3440 ( 3. +44%) alligator => 2189 ( 2.19%) C=0.97 jellyfish => 78766 (78.77%) cat => 9190 ( 9.19%) bear => 6939 ( 6. +94%) alligator => 5105 ( 5.10%) C=0.98 jellyfish => 62428 (62.43%) cat => 15076 (15.08%) bear => 12415 (12. +41%) alligator => 10081 (10.08%) C=0.99 jellyfish => 42944 (42.94%) cat => 20843 (20.84%) bear => 18984 (18. +98%) alligator => 17229 (17.23%) [download]	[reply] [d/l] [select]
Re: Randomly choosing from a set of alternatives with varying popularity by siberia-man (Friar) on Mar 29, 2022 at 09:13 UTC
This question reminded me my old note I kept for case. It doesn't cover the actual topic but can be (or cannot be) considered as some hint. #!/usr/bin/env perl use strict; use warnings; use List::Util qw(max); # Lower weight means higher precedence my %weight_map = ( alligator => 100, bear => 90, cat => 80, jellyfish => 10, zebra => 1, ); my $weight_lowest = max(values %weight_map) + 1; my @z0 = qw/ bear wolf jellyfish zebra dog fox cat /; my @z1 = sort { $a cmp $b } @z0; my @z2 = sort { ( $weight_map{$a} // $weight_lowest ) <=> ( $weight_map{$b} // $weight_lowest ) or $a cmp $b; } @z0; print <<RESULT; unsorted : [@z0] sorted : [@z1] weighted : [@z2] RESULT [download] And the execution results to: `unsorted : [bear wolf jellyfish zebra dog fox cat] sorted : [bear cat dog fox jellyfish wolf zebra] weighted : [zebra jellyfish cat bear dog fox wolf]` [download]	[reply] [d/l] [select]
Re: Randomly choosing from a set of alternatives with varying popularity by LanX (Saint) on Mar 28, 2022 at 01:23 UTC
> To favor the popular, you choose a random number N from 0 to 549, then traverse the set of animals, subtracting each one's popularity from N until N goes negative, and that's the animal you choose. I don't think that's a good idea. More than 80% of your random numbers will be bigger than 100, which means none will be picked. This also explains why you have problems inverting this "solution". Furthermore a linear search in the hash won't be truly random, because the hash order is fixed within the duration of a run. This will result in a disproportionate representation of the early entries. Solution is easy: `sort` the N entries and chose a modified random function which favors one end of the range, like `$i = int $N * rand() ** $e` and `$j = $N - 1 - $i` for index in your sorted list. The way you chose `$e > 1` will decide about the degree of bias ( `$e=1` means no bias) Just test the distribution of the indices: `DB<41> %h=();$e=3; $N=10; $h{ int $N * rand() **$e }++ for 1..10000 DB<42> x \%h 0 HASH(0x2f25fe8) 0 => 4670 1 => 1215 2 => 838 3 => 686 4 => 550 5 => 484 6 => 407 7 => 388 8 => 426 9 => 336 DB<43>` [download] In this example the first 10% are more than 10 times more likely to be picked than the last 10%. Of course YMMV about the distribution you prefer, but the basic idea is the same. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^2: Randomly choosing from a set of alternatives with varying popularity by hv (Prior) on Mar 28, 2022 at 03:35 UTC
I suspect you've misunderstood the 'popular' algorithm: 550 is the sum of the votes, so if (for example) the hash is traversed in order of decreasing popularity, rolling 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish. As far as I know this is pretty much the standard approach (up to optimizations) for picking from a set with weights.	[reply]
Re^3: Randomly choosing from a set of alternatives with varying popularity by LanX (Saint) on Mar 28, 2022 at 08:59 UTC
> rolling 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish. Indeed, I missed the fact that N is an accumulated sum where all votes so far are subtracted. That wasn't obvious from the description and the code wasn't concise. > As far as I know this is pretty much the standard approach I'm curious to know how my other objection is handled. The order in a Perl hash is random, but fixed during a run. So this algorithm will be biased to some results. Do you have a link to this "standard approach" discussing the distribution of picks? My guess is the list must be sorted in ascending order. Or to be more precise that the choice over ascending or descending decides over popular vs unpopular. > 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish. FWIW, the results you give can only be reproduced with a descending sort. > > > `{ alligator => 100, bear => 90, cat => 80, ... jellyfish => 10 }` (well almost `s/179/189/` ) NB: the OP has uncommented the `sort` in his code. > > > `# sort { ($votes{$b} <=> $votes{$a}) } #unnecessary` Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^4: Randomly choosing from a set of alternatives with varying popularity by LanX (Saint) on Mar 28, 2022 at 10:16 UTC
Re^5: Randomly choosing from a set of alternatives with varying popularity by ibm1620 (Hermit) on Mar 28, 2022 at 13:45 UTC
Some notes below your chosen depth have not been shown here


Problems? Is your data what you think it is?
	PerlMonks