Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re: Fisher-Yates theory

by Abigail-II (Bishop)
on Jul 24, 2003 at 08:04 UTC ( [id://277468]=note: print w/replies, xml ) Need Help??


in reply to Re: Fisher-Yates theory
in thread Fisher-Yates theory

I'm not convinced that the Fisher-Yates shuffle actually has very much theory with regard to random order involved at all.

Well, you are wrong. The first publication of the Fisher-Yates shuffle dates from 1938. It's example 12 in the book Statistical Tables, whose authors are R.A. Fisher and F.Yates.

It was also discussed by R. Durstenfeld, in 1964. Communications of the ACM, volume 7, pp 420.

And of course Donald Knuth mentions it, in Volume 2 of The Art of Computer Programming. In the third edition, it's algorithm P, section 3.4.2 on page 145.

Abigail

Replies are listed 'Best First'.
Wow...
by RMGir (Prior) on Jul 24, 2003 at 13:29 UTC
    ++

    If there was ever a node that deserves to make the Best Nodes, that's it...

    I bookmarked it so I can read that article from ACM's digital library.

    UPDATE: I read the paragraph in the ACM digital library, for algorithm 235. How did you ever find that citation? It is the algorithm, but he doesn't credit Fisher-Yates...
    --
    Mike

      You have to ask Knuth that question. I got both the Fisher-Yates, and the ACM citation from Knuth.

      Abigail

        Thanks, but the answer would probably come with great details, and comments on the typesetting of both the book and the article :)
        --
        Mike
Fisher-Yates theory... does this prove that it is invalid?
by MarkM (Curate) on Jul 25, 2003 at 01:00 UTC

    UPDATE: As pointed out by others, I made an error when translating the code. See their summaries for good explanations. Cheers, and thanks everyone. (tail between legs)

    In a previous article, I was challenged for doubting the effectiveness of the Fisher-Yates shuffle as described in perlfaq.

    Below, I have written code that exhausts all possible random sequences that could be used during a particular Fisher-Yates shuffle. Statistically, this should be valid, as before the shuffle begins, there is an equal chance that the random sequence generated could be 0 0 0 0 0 as 0 1 2 3 4 as 4 4 4 4 4. By exhaustively executing the Fisher-Yates shuffle, and calculating the total number of occurrences that each result set is produced, we can determine whether the Fisher-Yates shuffle has the side effect of weighting the results, or whether the shuffle is truly random, in that it should be approximately well spread out.

    my $depth = 5; my %results; sub recurse { if (@_ >= $depth) { my @deck = (1 .. $depth); shuffle(\@deck, [@_]); $results{join('', @deck)}++; } else { recurse(@_, $_) for 0 .. ($depth-1); } } sub shuffle { my($deck, $rand) = @_; my $i = @$deck; while ($i--) { my $j = shift @$rand; @$deck[$i,$j] = @$deck[$j,$i]; } } recurse; for (sort {$results{$b} <=> $results{$a}} keys %results) { printf "%10d %s\n", $results{$_}, $_; }

    With the above code, I was able to determine that with a deck size of 5, and an initial set of 1 2 3 4 5, there is three times the probability that the resulting set will be 3 1 2 5 4 than the probability that the resulting set will be 2 3 4 5 1. To me, this indicates that this theory is flawed.

    If anybody needs to prove to themselves that the test is exhaustive, print out "@$rand" in the shuffle subroutine.

    Please analyze the code carefully, pull out your school books, and see if I have made a mistake.

    Cheers,
    mark

      Hi MarkM,

      That algorithm shows the possible results of a biased shuffle, not a Fisher-Yates shuffle. The random sequence generated would not be 00000 to 44444, it would be 0000 to 4321 (a five digit shuffle requires 4 iterations - the faq goes 5, but the last never swaps - with each iteration shuffling one less item).

      The while loop in shuffle needs one less iteration, and a minor adjustment to recurse would look like this:

      #!/usr/bin/perl use strict; use warnings; my $depth = 4; my %results; sub recurse { if (@_ == $depth) { shift; #discard $num my @deck = (1 .. $depth); shuffle(\@deck, [@_]); $results{join('', @deck)}++; } else { my $num = shift || $depth - 1; # one less element each iteration recurse($num, @_, $_) for 0 .. $num--; } } sub shuffle { my($deck, $rand) = @_; my $i = @$deck; # uncomment the following line # print "@$rand\n"; # pre-decrement $i instead of post - the last would be a no-op in +this case while (--$i) { my $j = shift @$rand; @$deck[$i,$j] = @$deck[$j,$i]; } } recurse; for (sort {$results{$b} <=> $results{$a}} keys %results) { printf "%10d %s\n", $results{$_}, $_; }

      Here are the results of the modifications, using 4 elements instead of 5 (only 24 possible permutations instead of 120 - makes the node much more readable ;):

      Each possible permutation is shown exactly one time, for a possibility of being selected 1 out of 24 times (assuming a perfect rng).

      Makes sense???

      Update: I followed BrowserUK's link below and in that thread there is a statement that elegantly describes the problem with a biased shuffle (When the Best Solution Isn't), by blakem: "It maps 8 paths to 6 end states". In this case, it's 3125 (5**5) paths to 120 (5!) end states - assuming 5 elements to be shuffled.

      The problem is, your shuffle routine is not an implementation of a Fisher-Yates shuffle.

      This line

      my $j = shift @$rand;

      is no way equivalent to this line from the FAQ implementation

      my $j = int rand ($i+1);

      The latter picks a swap partner for the current value of $i, randomly between 0 and $i-1. I can't quite wrap my brain around what your code is doing here, but it isn't even vaguely equivalent.

      Therefore you are not testing a Fisher-Yates shuffle, but some shuffle algorithm of your own invention, which you succeed in proving isn't as good as the Fisher-Yates.

      You might find this post Re: When the Best Solution Isn't that does a statistical analysis of several shuffle routines, a Fisher-Yates amongst them, including frequency and standard deviation interesting.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

      Please analyze the code carefully, pull out your school books, and see if I have made a mistake.

      Yes you have. You're not emulating a Fisher-Yates shuffle :-)

      Consider the original code from perlfaq:

      sub fisher_yates_shuffle { my $deck = shift; # $deck is a reference to an array my $i = @$deck; while ($i--) { print $i, "\n"; my $j = int rand ($i+1); @$deck[$i,$j] = @$deck[$j,$i]; } }

      Note how $i is decremented on each iteration. Consider how that alters the sequence of possible indices.

      Once you take that into account you get the textbook behaviour.

      sub fixed_fisher_yates_shuffle { my ($deck, $rand) = @_; my $i = @$deck; while ($i--) { my $j = shift @$rand; @$deck[$i,$j] = @$deck[$j,$i]; } } use Set::CrossProduct; my $i = Set::CrossProduct->new([ [0..4], [0..3], [0..2], [0..1], [0] ] +); my %count; while (my $a = $i->get) { print "@$a : "; my @foo = (1,2,3,4,5); fixed_fisher_yates_shuffle(\@foo, $a); print "@foo\n"; $count{"@foo"}++; }; foreach my $key (sort keys %count) { print $key, " = ", $count{$key}, "\n"; };
Re: Re: Fisher-Yates theory
by halley (Prior) on Jul 24, 2003 at 13:18 UTC
    Again with the "you are wrong." Why the acerbic and argumentative tone? Up to this point, this appears to be a friendly conversation about their views on the theory, which was informative and interesting.

    If someone says they're not convinced, you can't disprove that. One may heap on additional evidence to attempt to convince them, but they may continue to be unconvinced for rational or irrational reasons.

    This site often discusses matters of faith. There's no right or wrong about being convinced. "You are wrong" demarks an absolutism, both in passive grammar and in connotation. I prefer a community which demonstrates respect for others' faith and others' opinion.

    Fisher-Yates may have been proven effective by way of analyzing the results for uniform potential entropy. It may have been proven by logical analysis of the probabilities on each successive swap. Sharing that evidence is helpful, but I ask politely for everyone to build a constructive community, not one which promotes controversy and arrogance.

    --
    [ e d @ h a l l e y . c c ]

      Again with the "you are wrong." Why the acerbic and argumentative tone?

      For an opposite view: IMO we dont have enough people like Abigail-II kicking around the monastery. Yeah sure he's inclined to be a curmudgondy git with minimalist sense of humour, but hes also a prolific poster who knows the subject matter extremely well. (We have both kinds, but rarely both at the same time ifyswim) If putting up with a bit of crunchyness is what is needed to have Abigail-IIs wisdom and knowledge added to the Monestary trove then so be it.

      This site often discusses matters of faith. There's no right or wrong about being convinced.

      Sure we do discuss matters of faith here quite often. But I personally see a big difference between arguing about which editor is best and whether there is a solid theoretical foundation to what is a mathematical process. Abigail-II knows this stuff as well or better than anybody else here who can be bothered to post. He cites his sources, and he speaks his mind. I fail to see how you can fault him on it. (And ive been in his gunsights getting my shit sorted so dont think that I havent found his manner on occasion to be harsh. But bottom line Id believe what he says over just about anybody else here. I might not agree with him on all things but hes one of the people I come to read.)

      Sharing that evidence is helpful, but I ask politely for everyone to build a constructive community, not one which promotes controversy and arrogance.

      I think you should develop slightly thicker skin and not fight other peoples battles for them. If MarkM felt his reply was OTT then im sure he would say. (And that argument doesnt apply to this reply at all as you brought up the community and hence gave me the right to respond.)

      Cheers, (And I mean that honestly and friendly)


      ---
      demerphq

      <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://277468]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2024-04-25 17:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found