Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Randomly choosing from a set of alternatives with varying popularity

by ibm1620 (Hermit)
on Mar 28, 2022 at 00:47 UTC ( [id://11142436]=perlquestion: print w/replies, xml ) Need Help??

ibm1620 has asked for the wisdom of the Perl Monks concerning the following question:

I have a hash that maps a set of things to their "popularity". In this example, there are ten animals, each of which has received a certain number of votes.
{ alligator => 100, bear => 90, cat => 80, ... jellyfish => 10 } total_votes = 550
I want to be able to randomly select from this set with three different biases: favoring the popular, impartial, and favoring the unpopular.

The impartial case is simple; choose the Nth item where 0 <= N <= 9.

To favor the popular, you choose a random number N from 0 to 549, then traverse the set of animals, subtracting each one's popularity from N until N goes negative, and that's the animal you choose.

But I can't figure out how to favor those with the least popularity. (Admittedly, this is more of an algorithm question than a Perl question, and I'll gladly take it to a different forum if it's inappropriate for PM.)

Here's my demo code;

#!/usr/bin/env perl use 5.010; use warnings; use strict; # Initialize %votes: map animals to their number of votes. my @animals = qw/alligator bear cat dog elephant fox giraffe hippo iguana jell +yfish/; my $votes = 100; # start with 100 votes for first word my $decr = int( $votes / (@animals) ); # evenly decrease # of votes my $sum = 0; # grand total of votes for all words my %votes = map {my $n = $votes; $sum += $n; $votes -= $decr; ( $_, $n )} @animals; # Display animals and votes print '===> '; printf "%s:%3d ", $_, $votes{$_} for sort keys %votes; print " Total votes=$sum"; # Get random selections my %popular; my %impartial; my %unpopular; for ( 1 .. $sum ) { $popular{ popular() }++; # should tend toward the popular $impartial{ impartial() }++; # should be equally distributed $unpopular{ unpopular() }++; # should tend toward the unpopular } # Display distribution of votes from each method print "\nPOP: "; printf "%s:%3d ", $_, $popular{$_}//0 for sort keys %votes; print "\nIMP: "; printf "%s:%3d ", $_, $impartial{$_}//0 for sort keys %votes; print "\nUNP: "; printf "%s:%3d ", $_, $unpopular{$_}//0 for sort keys %votes; say ''; # Method 1: choose randomly with preference for higher vote-getters sub popular { my $rand = int( rand $sum ); for my $key ( # sort { ($votes{$b} <=> $votes{$a}) } #unnecessary keys %votes ) { return $key if ( $rand -= $votes{$key} ) <= 0; } die 'popular bug'; } # Method 2: choose randomly without regard for popularity sub impartial { return ( keys %votes )[ rand keys %votes ]; } # Method 3: choose randomly with preference for lower vote-getters sub unpopular { return '?'; }

Replies are listed 'Best First'.
Re: Randomly choosing from a set of alternatives with varying popularity
by hv (Prior) on Mar 28, 2022 at 03:28 UTC

    One approach for favouring the unpopular answers would be to weight by the reciprocal of the votes, so jellyfish would be 10 times likelier to show up than alligator. Another would be to replace $v votes by $max + $min - $v (which would have the same jellyfish to alligator ratio, but would weight the middle choices differently).

    Note that the reciprocal approach can use the same algorithm as 'popular()' if you remove the unnecessary int(...) from its first line. The min/max approach would work with or without that change.

      > weight by the reciprocal of the votes

      FWIW, here my attempts with different reciprocal functions and evaluation of distribution. See also Re^3: Randomly choosing from a set of alternatives with varying popularity for interpretation.

      use strict; use warnings; use Data::Dump qw/pp dd/; my %vote = ( alligator => 100, bear => 90, cat => 80, jellyfish => 10 +); my $N =0; $N += $_ for values %vote; # --- invert by division, proportion of non-voters my $N2 = 0; my %vote2 = map { my $v2 = 1/$vote{$_}; $N2 += $v2; $_ => $v2 } keys %vote; # --- invert by subtraction, number of non-voters my $N3 = 0; my %vote3 = map { my $v3 = $N-$vote{$_}; $N3 += $v3; $_ => $v3 } keys %vote; my (%dist1,%dist2,%dist3); my $e=5; $dist1{pick($N,\%vote)}++ for 1..10**$e; $dist2{pick($N2,\%vote2)}++ for 1..10**$e; $dist3{pick($N3,\%vote3)}++ for 1..10**$e; pp \%dist1,\%dist2,\%dist3; sub pick { my ($N,$h_vote)=@_; my @deb; my $r =rand($N); push @deb,$r; scalar keys %$h_vote; # reset each while ( my ($k,$v) = each %$h_vote) { $r -= $v; push @deb,[$r,$k,$v]; return $k if $r <=0; } die "ERROR", pp \@deb; }
      OUTPUT:
      ( { alligator => 35891, bear => 31807, cat => 28792, jellyfish => 3510 + }, { alligator => 7497, bear => 8300, cat => 9481, jellyfish => 74722 } +, { alligator => 21416, bear => 22542, cat => 23870, jellyfish => 3217 +2 }, )

      On a tangent: had to debug why each failed me. Needed reset.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        The reciprocal approach definitely gives me what I had in mind, more so than computing the non-voters. That's what I'll use. Thanks.

        (Incidentally, should your comment above read "invert by division, proportion of voters", or am I missing something?)

      Thank you - I had a hunch reciprocals might be the way to reverse popularity, but wasn't sure the 'popular()' algorithm would work with them. (Was never entirely comfortable with floating point arithmetic :-)

        Well yeah, me neither: you know where you are with integers. :)

        If it reaches the stage that the maths becomes important, you would certainly want to take a lot more care. One way to do that would be to avoid the simplistic "walk a list of (floating-point) weights" algorithm, and instead work to make each reciprocal exact: for each of the options in turn, give it an exact 1/votes chance to get chosen, and if it fails proceed to the next. On reaching the end of the list without a choice, start again (which doesn't necessarily have to be in the same order).

        Getting an exact "1/votes" chance requires understanding a specific invariant of the RNG, the number of bits that it generates - exposed by perl via use Config; print $Config{randbits}. When something has 3 votes, for example, an RNG that generates 16 bits will slightly favour the value 0 (mod 3) by the ratio 21846:21845:21845. So for each option you would calculate $valid_lim = $votes * int(2**RNGbits / $votes) == 65535 in advance, and then re-roll any random value while ($rand = int(rand(2**$RNGbits))) >= $valid_lim before doing the modulus check choose_me() if ($rand % $votes) == 0, to remove that bias.

        That means doing more work (and using up more of the available entropy), so you won't want to do this unless the vote requires this sort of critical accuracy.

Re: Randomly choosing from a set of alternatives with varying popularity
by jwkrahn (Abbot) on Mar 28, 2022 at 02:55 UTC

    I'm sure that there is a better way to do this, but just off the top of my head:

    use List::Util qw/ shuffle max /; my %animals = ( alligator => 100, bear => 90, cat => 80, jellyfish => +10 ); my @popular_list = shuffle map { ( $_ ) x $animals{ $_ } } keys %anima +ls; my $popular_pick = $popular_list[ rand @popular_list ]; my @impartial_list = shuffle keys %animals; my $impartial_pick = $impartial_list[ rand @impartial_list ]; my $max_val = 1 + max values %animals; my @unpopular_list = shuffle map { ( $_ ) x ( $max_val - $animals{ $_ +} ) } keys %animals; my $unpopular_pick = $unpopular_list[ rand @unpopular_list ];
Re: Randomly choosing from a set of alternatives with varying popularity
by bliako (Monsignor) on Mar 28, 2022 at 20:42 UTC

    what you describe, in general, falls within the: "drawing random samples from a random distribution (RD)". Except that you additionally want to favour certain outcomes more. That can be achieved by transforming your RD. e.g. skew it, squash it, lift the tails, etc., prior to drawing.

    Drawing samples from an RD requires you to build the cumulative RD and then land randomly on any 'y'. Corresponding 'x' will be the random sample you need. You have a discrete RD, a histogram. So, build a cumulative histogram, transform the histogram to give better odds to unpopular items and then draw the sample.

    I would keep the 2 tasks separately.

    3day edit: in the 2nd paragraph swapped 'x' and 'y'. So basically, draw a random number from 0 to 1 this will be the 'y', the spot on the y-axis. The corresponding 'x' will be a random sample wrt this RD.

    bw, bliako

Re: Randomly choosing from a set of alternatives with varying popularity
by salva (Canon) on Mar 29, 2022 at 07:48 UTC
    But I can't figure out how to favor those with the least popularity

    There are several ways to do that. Which one is the most appropriate is very dependent on your final problem.

    Anyway, I have not seen anybody propose an exponential solution, so let me do it. The idea is that every subject start with the same score, 1.0, for instance, and every time one of them gets a vote, you reduce its score multiplying it by a factor c such that 0 < c < 1.

    As in your particular case, you have already done a draw, you can calculate the current scores as $score{$k} = $c ** $votes{$k}.

    The only issue remaining is how to pick c. The key here is how much you want to penalize the popular ones.

      Another interesting idea. I modified LanX's code to try this for several different values of c:
      #!/usr/bin/env perl use 5.010; use strict; use warnings; use Data::Dump qw/pp dd/; my %vote = ( alligator => 100, bear => 90, cat => 80, jellyfish => 10 +); my $N =0; $N += $_ for values %vote; say "Vote"; show(\%vote, 1); # --- invert by exponentiation for my $C ( 0.9, 0.95, 0.96, 0.97, 0.98, 0.99) { my $N4=0; my %vote4 = map { my $v4 = $C ** $vote{$_}; $N4 += $v4; $_ => $v4; } keys %vote; my %dist4; my $e=5; $dist4{pick($N4,\%vote4)}++ for 1..10**$e; say "C=$C"; show( \%dist4, -1); } sub show { # show counts and percentages my $href = shift; my $dir = shift; # sort direction 1=ascending, -1=descending my $tot=0; $tot += $_ for values %$href; for my $k (sort {$dir * ($a cmp $b)} keys %$href) { printf "%s => %5d (%5.2f%%) ", $k, $href->{$k}, 100*($href->{$k}/$tot); } say ''; } sub pick { my ($N,$h_vote)=@_; my @deb; my $r =rand($N); push @deb,$r; scalar keys %$h_vote; # reset each while ( my ($k,$v) = each %$h_vote) { $r -= $v; push @deb,[$r,$k,$v]; return $k if $r <=0; } die "ERROR", pp \@deb; }
      Output:
      Vote alligator => 100 (35.71%) bear => 90 (32.14%) cat => 80 (28. +57%) jellyfish => 10 ( 3.57%) C=0.9 jellyfish => 99902 (99.90%) cat => 65 ( 0.07%) bear => 24 ( 0. +02%) alligator => 9 ( 0.01%) C=0.95 jellyfish => 94748 (94.75%) cat => 2712 ( 2.71%) bear => 1599 ( 1. +60%) alligator => 941 ( 0.94%) C=0.96 jellyfish => 89102 (89.10%) cat => 5269 ( 5.27%) bear => 3440 ( 3. +44%) alligator => 2189 ( 2.19%) C=0.97 jellyfish => 78766 (78.77%) cat => 9190 ( 9.19%) bear => 6939 ( 6. +94%) alligator => 5105 ( 5.10%) C=0.98 jellyfish => 62428 (62.43%) cat => 15076 (15.08%) bear => 12415 (12. +41%) alligator => 10081 (10.08%) C=0.99 jellyfish => 42944 (42.94%) cat => 20843 (20.84%) bear => 18984 (18. +98%) alligator => 17229 (17.23%)
Re: Randomly choosing from a set of alternatives with varying popularity
by siberia-man (Friar) on Mar 29, 2022 at 09:13 UTC
    This question reminded me my old note I kept for case. It doesn't cover the actual topic but can be (or cannot be) considered as some hint.
    #!/usr/bin/env perl use strict; use warnings; use List::Util qw(max); # Lower weight means higher precedence my %weight_map = ( alligator => 100, bear => 90, cat => 80, jellyfish => 10, zebra => 1, ); my $weight_lowest = max(values %weight_map) + 1; my @z0 = qw/ bear wolf jellyfish zebra dog fox cat /; my @z1 = sort { $a cmp $b } @z0; my @z2 = sort { ( $weight_map{$a} // $weight_lowest ) <=> ( $weight_map{$b} // $weight_lowest ) or $a cmp $b; } @z0; print <<RESULT; unsorted : [@z0] sorted : [@z1] weighted : [@z2] RESULT
    And the execution results to:
    unsorted : [bear wolf jellyfish zebra dog fox cat] sorted : [bear cat dog fox jellyfish wolf zebra] weighted : [zebra jellyfish cat bear dog fox wolf]
Re: Randomly choosing from a set of alternatives with varying popularity
by LanX (Saint) on Mar 28, 2022 at 01:23 UTC
    > To favor the popular, you choose a random number N from 0 to 549, then traverse the set of animals, subtracting each one's popularity from N until N goes negative, and that's the animal you choose.

    I don't think that's a good idea. More than 80% of your random numbers will be bigger than 100, which means none will be picked.

    This also explains why you have problems inverting this "solution".

    Furthermore a linear search in the hash won't be truly random, because the hash order is fixed within the duration of a run. This will result in a disproportionate representation of the early entries.

    Solution is easy:

    sort the N entries and chose a modified random function which favors one end of the range, like $i = int $N * rand() ** $e and $j = $N - 1 - $i for index in your sorted list. The way you chose $e > 1 will decide about the degree of bias ( $e=1 means no bias)

    Just test the distribution of the indices:

    DB<41> %h=();$e=3; $N=10; $h{ int $N * rand() **$e }++ for 1..10000 DB<42> x \%h 0 HASH(0x2f25fe8) 0 => 4670 1 => 1215 2 => 838 3 => 686 4 => 550 5 => 484 6 => 407 7 => 388 8 => 426 9 => 336 DB<43>

    In this example the first 10% are more than 10 times more likely to be picked than the last 10%.

    Of course YMMV about the distribution you prefer, but the basic idea is the same.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I suspect you've misunderstood the 'popular' algorithm: 550 is the sum of the votes, so if (for example) the hash is traversed in order of decreasing popularity, rolling 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish. As far as I know this is pretty much the standard approach (up to optimizations) for picking from a set with weights.

        > rolling 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish.

        Indeed, I missed the fact that N is an accumulated sum where all votes so far are subtracted. That wasn't obvious from the description and the code wasn't concise.

        > As far as I know this is pretty much the standard approach

        I'm curious to know how my other objection is handled. The order in a Perl hash is random, but fixed during a run. So this algorithm will be biased to some results.

        Do you have a link to this "standard approach" discussing the distribution of picks?

        My guess is the list must be sorted in ascending order. Or to be more precise that the choice over ascending or descending decides over popular vs unpopular.

        > 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish.

        FWIW, the results you give can only be reproduced with a descending sort.

        > > > { alligator => 100, bear => 90, cat => 80, ... jellyfish => 10 }

        (well almost s/179/189/ )

        NB: the OP has uncommented the sort in his code.

        > > > # sort { ($votes{$b} <=> $votes{$a}) } #unnecessary

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11142436]
Approved by LanX
Front-paged by kcott
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (3)
As of 2024-04-26 00:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found