http://qs321.pair.com?node_id=210437


in reply to Re: random elements from array - new twist
in thread random elements from array - new twist

You have a bad shuffle. It does not give equal odds of all combinations coming up. It trivially cannot, if your array has n elements then all of your probabilities have to be multiples of 1/n**n which does not evenly divide n! - the number of possible permutations you are choosing between.

How to do it right is a FAQ.

  • Comment on Re: Re: random elements from array - new twist

Replies are listed 'Best First'.
Re: Re: Re: random elements from array - new twist
by BrowserUk (Pope) on Nov 05, 2002 at 18:23 UTC

    FWIW. To the best of my ability to test it, and walking on the shoulders of those better versed in statistics than I, my implementation of Fisher-Yates algorithm compares favourably with that given in the FAQ.

    Perhaps you would point out the flaws in my testing method?

    Test code

    C:\test>199981 -size=100 -iter=100000 Benchmark: running FAQ_FY, shuffl, each for at least 1 CPU seconds... FAQ_FY: 2 wallclock secs ( 1.14 usr + 0.00 sys = 1.14 CPU) @ 39 +2.64/s (n=448) shuffl: 1 wallclock secs ( 1.08 usr + 0.00 sys = 1.08 CPU) @ 44 +2.70/s (n=479) Rate FAQ_FY shuffl FAQ_FY 393/s -- -11% shuffl 443/s 13% -- permutation | FAQ_FY | shuffl ---------------------------------- A B C D: | 4140 | 4221 A B D C: | 4231 | 4211 A C B D: | 4205 | 4151 A C D B: | 4247 | 4257 A D B C: | 4076 | 4275 A D C B: | 4107 | 4244 B A C D: | 4207 | 4165 B A D C: | 4159 | 4240 B C A D: | 3941 | 4187 B C D A: | 4136 | 4116 B D A C: | 4158 | 4126 B D C A: | 4185 | 4111 C A B D: | 4126 | 4231 C A D B: | 4224 | 4122 C B A D: | 4135 | 4139 C B D A: | 4123 | 4097 C D A B: | 4179 | 4001 C D B A: | 4228 | 4175 D A B C: | 4295 | 4057 D A C B: | 4264 | 4131 D B A C: | 4206 | 4169 D B C A: | 4138 | 4239 D C A B: | 4167 | 4208 D C B A: | 4123 | 4127 ------------------------------------------------------ Std. Dev. | 72.382 | 67.643 C:\test>

    Nah! You're thinking of Simon Templar, originally played (on UKTV) by Roger Moore and later by Ian Ogilvy
      Your algorithm is not a Fisher-Yates shuffle. shuffl chooses a random number between 1 and the number of elements (inclusive) each time. A Fisher-Yates shuffle chooses from a smaller set each iteration to avoid performing a biased shuffle.

      Using the dataset A, B, C and D there are 24 possible outcomes (4!), each one with an equal probability of 1/24. The algorithm used for sub shuffl selects 4 random numbers each with a possible value of 1, 2, 3 or 4, or a total of 44 (256) possible outcomes, each with a probability of 1/256 (some of the final permutations are duplicated). 256 is not evenly divisible by 24, so some of the outcomes must be more likely than others.

      Update: Oops. BrowserUK's shuffle is indeed a Fisher-Yates shuffle. $a = $_ + rand @{$r} - $_ is parsed as $a = $_ + (rand @{$r} - $_). I assumed it would be parsed as $a = $_ + (rand @{$r}) - $_

        Look again!

        Update: Maybe this will clarify things a little?

        #! perl -sw use strict; sub shuffl (\@) { my $r=pop; for (0..$#{$r}) { my $size = scalar @{$r}; my $choices = $size - $_; printf 'iteration: %2d Size: %2d rand(%2d) range: %2d .. % +2d %s', $_, $size, $choices, $_, $cho +ices + $_, $/; $a = $_ + rand @{$r} - $_; @$r[$_, $a] = @$r[$a, $_]; } } my @array = (1 ..15); shuffl @array; __END__ C:\test>210557 iteration: 0 Size: 15 rand(15) range: 0 .. 15 iteration: 1 Size: 15 rand(14) range: 1 .. 15 iteration: 2 Size: 15 rand(13) range: 2 .. 15 iteration: 3 Size: 15 rand(12) range: 3 .. 15 iteration: 4 Size: 15 rand(11) range: 4 .. 15 iteration: 5 Size: 15 rand(10) range: 5 .. 15 iteration: 6 Size: 15 rand( 9) range: 6 .. 15 iteration: 7 Size: 15 rand( 8) range: 7 .. 15 iteration: 8 Size: 15 rand( 7) range: 8 .. 15 iteration: 9 Size: 15 rand( 6) range: 9 .. 15 iteration: 10 Size: 15 rand( 5) range: 10 .. 15 iteration: 11 Size: 15 rand( 4) range: 11 .. 15 iteration: 12 Size: 15 rand( 3) range: 12 .. 15 iteration: 13 Size: 15 rand( 2) range: 13 .. 15 iteration: 14 Size: 15 rand( 1) range: 14 .. 15 C:\test>

        Nah! You're thinking of Simon Templar, originally played (on UKTV) by Roger Moore and later by Ian Ogilvy
      "MAKEUP!", er, I mean, "Did somebody say 'Benchmark'?":
      use vars '@r'; sub supffl (\@) { local *r = pop; my $sz = @r; my $i = -1; while ($sz) { $a = ++$i + rand $sz--; @r[$i, $a] = @r[$a, $i]; } } =pod $ supshfl.pl Benchmark:running FAQ_FY, shuffl, supffl, each for at least 2 CPU sec FAQ_FY: 2 wc secs(2.19 usr + 0.00 sys = 2.19 CPU) @ 19.63/s (n=43) shuffl: 2 wc secs(2.13 usr + 0.00 sys = 2.13 CPU) @ 21.60/s (n=46) supffl: 2 wc secs(2.16 usr + 0.00 sys = 2.16 CPU) @ 23.15/s (n=50) Rate FAQ_FY shuffl supffl FAQ_FY 19.6/s -- -9% -15% shuffl 21.6/s 10% -- -7% supffl 23.1/s 18% 7% -- permutation | FAQ_FY | shuffl | supffl ------------------------------------------- ------------------------------------------- Std. Dev. | 6.404 | 6.397 | 5.746 =cut
      Though 20% hardly seems worth it.

      Basically, it's their efficiently checking for  $i == $j which slows down FAQ_FY!    mumble, mumble . . . premature optimization . . . mumble, mumble . . .
      sub FAQ_FY { my $array = shift; my $i; for ($i = @$array; --$i; ) { my $j = int rand ($i+1); # next if $i == $j; @$array[$i,$j] = @$array[$j,$i]; } } =pod $ supshfl.pl Benchmark:running FAQ_FY, shuffl, supffl, each for at least 1 CPU sec FAQ_FY: 1 wc secs(1.05 usr + 0.00 sys = 1.05 CPU) @ 21.90/s (n=23) shuffl: 1 wc secs(1.07 usr + 0.00 sys = 1.07 CPU) @ 21.50/s (n=23) supffl: 1 wc secs(1.04 usr + 0.00 sys = 1.04 CPU) @ 23.08/s (n=24) Rate shuffl FAQ_FY supffl shuffl 21.5/s -- -2% -7% FAQ_FY 21.9/s 2% -- -5% supffl 23.1/s 7% 5% -- permutation | FAQ_FY | shuffl | supffl -------------------------------------------- -------------------------------------------- Std. Dev. | 7.305 | 6.105 | 5.998 =cut

        p
      You are right...and accidentally demonstrated why you should break things out a little more and put in parens. I am a decent Perl programmer, and that code is unmaintainable.

      Just because Perl lets you do something doesn't mean that you should.

        You have a good point. In truth, I wrote that function several months back when I first started exploring perl's syntax. When the question came up, I just grepped for the line and c&p'd it into the answer as a means of demonstrating the concept. I wasn't trying to sell or promote that particular implementation of shuffle.

        I am kind of embarrassed to have had to defend it.


        Nah! You're thinking of Simon Templar, originally played (on UKTV) by Roger Moore and later by Ian Ogilvy