Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Is this a fair shuffle?

by Roy Johnson (Monsignor)
on Apr 01, 2005 at 03:08 UTC ( [id://444070]=perlquestion: print w/replies, xml ) Need Help??

Roy Johnson has asked for the wisdom of the Perl Monks concerning the following question:

sort {rand 1 <=> .5} (1..20) # Update: was using {-1 + int rand 3}
Actually, I suspect it's not, just by running it a bunch of times. 20 ends up at or near the tail end most of the time, though it gets more apparently random if I use sort '_quicksort'.

Really, I'd like a succinct explanation of why it's not.

Update: with the changed random -- not having it return zeroes -- it looks somewhat more random.


Caution: Contents may have been coded under pressure.

Replies are listed 'Best First'.
Re: Is this a fair shuffle?
by blokhead (Monsignor) on Apr 01, 2005 at 04:19 UTC
    From perldoc -f sort:
    The comparison function is required to behave. If it returns inconsistent results (sometimes saying $x[1] is less than $x[2] and sometimes saying the opposite, for example) the results are not well-defined.
    And "not well-defined" should never be interpreted as "uniformly random." Uniformity is an extremely specific statistical property that in general does not just happen naturally.

    Something closer to what you have in mind is probably the following Schwartzian Transform:

    map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, rand 1 ] } @array
    This is a fair shuffle. Each element is equally likely to have the smallest corresponding random value (and therefore be first). Then each remaining element is equally likely to be the smallest among the remaining elements, and so on. This is the definition of a uniformly chosen permutation.

    The key difference from your example is that we assign the random values first, then sort. So here, the comparisons are self-consistent.

    Update: expanding more on the definition of a "well-behaved" comparison function: The text of the POD is slightly misleading. While sorting a single list, the sorting algorithm never needs to compare the same pair of elements twice. What "well-behaved" really means is that all the comparisons made indeed correspond to a linear ordering of the elements. For instance, when sorting (A,B,C), the random coin-flips might work out so that the comparisons return A<B, B<C, and C<A. This is not a linear order (not even a partial order), and this is exactly when the sorting algorithm's behavior is undefined.

    To be clear about the distinction between the two examples: In my ST example, all the randomness is fixed before the sorting algorithm is called. You are just sorting a list of random numbers numerically, and the numbers don't change in the middle of the sort, so there is no problem at all.

    blokhead

Re: Is this a fair shuffle?
by BrowserUk (Patriarch) on Apr 01, 2005 at 04:23 UTC

    No. For the proof, see Re: When the Best Solution Isn't.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?
      I guess I should have suspected that it had been done before. Thanks for the link. Good benchmark there. Incidentally, if you change the qshuf sub to be
      sub qshuf { sort { .5 <=> rand(1) } sort { .5 <=> rand(1) } sort { .5 <=> rand(1) } sort { .5 <=> rand(1) } @_; }
      The Std Dev drops to competitive levels, but the performance drops to last. Two or three calls to sort isn't sufficient. It was surprising to me that the time required for two chained sorts is so much more than for one.

      Caution: Contents may have been coded under pressure.

        I often wondered how using a random "constant" would change things but I never got around to trying it. It would also be good to add List::Util::shuffle to the benchmark. I assume it would beat my perl implementation hands down for small arrays, but mine is probably still quicker for large arrays because it operates in place?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco.
        Rule 1 has a caveat! -- Who broke the cabal?
Re: Is this a fair shuffle?
by dragonchild (Archbishop) on Apr 01, 2005 at 03:49 UTC
      Sometimes it's about understanding how things work and relate, rather than just being handed a tool that is made for the job. In this case, I was thinking (rightly or wrongly) about how pretty much all array functions are combine or reduce operations. Then I thought about sorting and shuffling and thought that they were inverses of each other, operating on the order of an array, so it seemed like sort should work to shuffle. If it doesn't, I want to know why not, so that I can adjust my thinking accordingly.

      Plus, I think it would be cool to have a built-in for shuffling. It even reads well if you want to make a randomizing sub: sort randomly LIST.


      Caution: Contents may have been coded under pressure.
        if you put it in this mathematical inverse sense, i would say that since your imaginary function of random->sorted is not one-to-one, so it can not have an inverse (ie, every input:random array has exactly one output:sorted array, but the inverse one input:sorted array could have many output:random arrays).

        this is just for thinking's sake, it definitely doesn't give you any answer
Re: Is this a fair shuffle?
by chas (Priest) on Apr 01, 2005 at 04:14 UTC
    Since the block doesn't reference $a and $b, I'm mystified about what effect it has. Can you explain? (maybe this will be the thing I learn today! I'm not doubting it...just don't understand.)
    chas
      Sort operates by doing pairwise comparisons of elements, applying the comparison function to them. The results of the function tell it how to order the two elements relative to each other. In this case, the results are random, independent of the values being compared, so it sorts into random order.

      Caution: Contents may have been coded under pressure.
        Thanks for both replies! I see now; somehow I was fixated on the fact that the sorting sub or block is supposed to use the values $a, $b, but for a "random" sort, the whole point is that the actual values of $a, $b in each case are supposed to have no (deterministic) effect on the resulting order.
        (By the way, to get a random order, my method would likely be to use rand (in some way) to select one of the 20 numbers, then use it again to select one of the remaining 19, etc. That may be essentially what someone has described in one of the replies already.)
        chas
      $a and $b are getting values like 1 and 2 and then 3 and 4, pairs from the list.

      The goal was to shuffle the numbers into a random order, so you don't necessarily want to use the values in the 1..20 range for a comparison.

      Instead, the code is generating a random number and comparing it with .5, for a coin-toss-like chance of one thing being sorted above or below the other thing.

Re: Is this a fair shuffle?
by shemp (Deacon) on Apr 01, 2005 at 21:45 UTC
    I think i can address why this shuffle isnt fair. (sort of)

    To simplify things, i am going to pretend that the built in sort uses a bubble sort (i honestly dont know what algorithm is used behind the scenes), but similar arguments can be applied to whatever sort() really uses.

    So, recall that the 'largest' element will be bubbled to the last element in the array after 1 iteration, the second largest after the second iteration, and so on.

    For this example, regardless of what happens prior to the final 2 elements being compared on the first iteration, when the final 2 elements are compared, there is a 50-50 chance that the final element will stay in place - provided rand() is fair.

    If the final element is bubbled back, then on the second iteration it has a 50-50 chance of staying in the second last position. And after the 50-50 of staying in the final position, that makes a 25% chance of staying in the second last place. And so on and so on.

    The chance of various other elements ending up in various places could be similarly calculated.


    That said, recall that i assumed a bubble sort. I leave it as an exercise to the reader to perform similar analysis on other sort algorithms

    (how cocky of me!)

      It's certainly true that the sort algorithm used would affect the fairness. For bubblesort, it would seem that the random probability of a swap would need to be adjusted. Possibly (N-1)/N would be right, or possibly it would have to change with each pass.

      But Perl uses mergesort, which doesn't find an element's place and then never look at it again. Or you can select quicksort, which actually does take an element and stick it somewhere in the middle, after which it never moves again.


      Caution: Contents may have been coded under pressure.
Re: Is this a fair shuffle?
by Thargor (Scribe) on Apr 01, 2005 at 18:16 UTC
    I have only been trying to learn perl for about 2 months but couldn't you just iterate through your array and generate a random number between 0-19 then give your current array address the value from the random array address. so something like
    my $random = int( rand(20)); foreach $array (@array){ $store = $array; $array = $array[$random]; $array[$random] = $store; $random = int( rand(20)); }
    The code is probably wrong because I am a noob with perl and programing in general but that is my idea, and I thought this post was extremely interesting at least considering my level of knowledge about perl.
      I appreciate your trying to be helpful, but I wasn't actually looking for a good way to do a shuffle (that problem has been well solved by List::Util 'shuffle'). I was wondering whether a novel (or what I thought was a novel) way of shuffling was fundamentally flawed. (It was.)

      For more on how to do shuffles right and wrong, see A bad shuffle. And welcome to the wonderful world of Perl.


      Caution: Contents may have been coded under pressure.

      Along the lines you mentioned, this works:

      my @shuffled = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, rand() ] } @unshuffled;

      But compared to other approaches it is slow and memory-expensive.

      This topic came up at a recent kw.pm meeting, and I played around and did some rudimentary benchmarks:

      http://kw.pm.org/wiki/index.cgi?ListShuffle

      2005-04-02 Janitored by Arunbear - replaced pre tags with code tags, to allow code extraction

        Hey even though I didn't start this question it is really interesting to me. Could you explain, point to somewhere I can read about it, or otherwise instruct me as to why my solution is slow. I see lots of posts with map and such in them and I have no clue about its function so I assume it will come up later in my reading of the llama book. And thankyou to any who take the time to give me some information on this subject.

        Sorry, I don't post here often and didn't preview as well as I should have. That should obviously have been:

        my @shuffled = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, rand() ] } @unshuffled;

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://444070]
Approved by moot
Front-paged by sweetblood
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (4)
As of 2024-04-25 12:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found