Re: Is this a fair shuffle?
by blokhead (Monsignor) on Apr 01, 2005 at 04:19 UTC
|
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, rand 1 ] } @array
This is a fair shuffle. Each element is equally likely to have the smallest corresponding random value (and therefore be first). Then each remaining element is equally likely to be the smallest among the remaining elements, and so on. This is the definition of a uniformly chosen permutation.
The key difference from your example is that we assign the random values first, then sort. So here, the comparisons are self-consistent.
Update: expanding more on the definition of a "well-behaved" comparison function: The text of the POD is slightly misleading. While sorting a single list, the sorting algorithm never needs to compare the same pair of elements twice. What "well-behaved" really means is that all the comparisons made indeed correspond to a linear ordering of the elements. For instance, when sorting (A,B,C), the random coin-flips might work out so that the comparisons return A<B, B<C, and C<A. This is not a linear order (not even a partial order), and this is exactly when the sorting algorithm's behavior is undefined.
To be clear about the distinction between the two examples: In my ST example, all the randomness is fixed before the sorting algorithm is called. You are just sorting a list of random numbers numerically, and the numbers don't change in the middle of the sort, so there is no problem at all.
| [reply] [d/l] [select] |
Re: Is this a fair shuffle?
by BrowserUk (Patriarch) on Apr 01, 2005 at 04:23 UTC
|
No. For the proof, see Re: When the Best Solution Isn't.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.
Rule 1 has a caveat! -- Who broke the cabal?
| [reply] [d/l] |
|
I guess I should have suspected that it had been done before. Thanks for the link. Good benchmark there. Incidentally, if you change the qshuf sub to be
sub qshuf {
sort { .5 <=> rand(1) }
sort { .5 <=> rand(1) }
sort { .5 <=> rand(1) }
sort { .5 <=> rand(1) } @_;
}
The Std Dev drops to competitive levels, but the performance drops to last. Two or three calls to sort isn't sufficient. It was surprising to me that the time required for two chained sorts is so much more than for one.
Caution: Contents may have been coded under pressure.
| [reply] [d/l] |
|
I often wondered how using a random "constant" would change things but I never got around to trying it. It would also be good to add List::Util::shuffle to the benchmark. I assume it would beat my perl implementation hands down for small arrays, but mine is probably still quicker for large arrays because it operates in place?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.
Rule 1 has a caveat! -- Who broke the cabal?
| [reply] |
|
|
|
|
|
|
Re: Is this a fair shuffle?
by dragonchild (Archbishop) on Apr 01, 2005 at 03:49 UTC
|
What is the current fascination with fair shuffles? Why not just use List::Util's shuffle()?
| [reply] |
|
| [reply] [d/l] |
|
if you put it in this mathematical inverse sense, i would say that since your imaginary function of random->sorted is not one-to-one, so it can not have an inverse (ie, every input:random array has exactly one output:sorted array, but the inverse one input:sorted array could have many output:random arrays).
this is just for thinking's sake, it definitely doesn't give you any answer
| [reply] |
Re: Is this a fair shuffle?
by chas (Priest) on Apr 01, 2005 at 04:14 UTC
|
Since the block doesn't reference $a and $b, I'm mystified
about what effect it has. Can you explain? (maybe this will
be the thing I learn today! I'm not doubting it...just
don't understand.)
chas | [reply] |
|
| [reply] |
|
Thanks for both replies! I see now; somehow I was fixated on the fact that the sorting sub or block is supposed to use the values $a, $b, but for a "random" sort, the whole point is that the actual values of $a, $b in each case are supposed to have no (deterministic) effect on the resulting order.
(By the way, to get a random order, my method would likely
be to use rand (in some way) to select one of the 20 numbers, then use it again to select one of the remaining 19, etc. That may be essentially what someone has described in one of the replies already.)
chas
| [reply] |
|
$a and $b are getting values like 1 and 2 and then 3 and 4, pairs from the list.
The goal was to shuffle the numbers into a random order, so you don't necessarily want to use the values in the 1..20 range for a comparison.
Instead, the code is generating a random number and comparing it with .5, for a coin-toss-like chance of one thing being sorted above or below the other thing.
| [reply] [d/l] [select] |
Re: Is this a fair shuffle?
by shemp (Deacon) on Apr 01, 2005 at 21:45 UTC
|
I think i can address why this shuffle isnt fair. (sort of)
To simplify things, i am going to pretend that the built in sort uses a bubble sort (i honestly dont know what algorithm is used behind the scenes), but similar arguments can be applied to whatever sort() really uses.
So, recall that the 'largest' element will be bubbled to the last element in the array after 1 iteration, the second largest after the second iteration, and so on.
For this example, regardless of what happens prior to the final 2 elements being compared on the first iteration, when the final 2 elements are compared, there is a 50-50 chance that the final element will stay in place - provided rand() is fair.
If the final element is bubbled back, then on the second iteration it has a 50-50 chance of staying in the second last position. And after the 50-50 of staying in the final position, that makes a 25% chance of staying in the second last place. And so on and so on.
The chance of various other elements ending up in various places could be similarly calculated.
That said, recall that i assumed a bubble sort. I leave it as an exercise to the reader to perform similar analysis on other sort algorithms
(how cocky of me!)
| [reply] |
|
It's certainly true that the sort algorithm used would affect the fairness. For bubblesort, it would seem that the random probability of a swap would need to be adjusted. Possibly (N-1)/N would be right, or possibly it would have to change with each pass.
But Perl uses mergesort, which doesn't find an element's place and then never look at it again. Or you can select quicksort, which actually does take an element and stick it somewhere in the middle, after which it never moves again.
Caution: Contents may have been coded under pressure.
| [reply] |
Re: Is this a fair shuffle?
by Thargor (Scribe) on Apr 01, 2005 at 18:16 UTC
|
I have only been trying to learn perl for about 2 months but couldn't you just iterate through your array and generate a random number between 0-19 then give your current array address the value from the random array address. so something like
my $random = int( rand(20));
foreach $array (@array){
$store = $array;
$array = $array[$random];
$array[$random] = $store;
$random = int( rand(20));
}
The code is probably wrong because I am a noob with perl and programing in general but that is my idea, and I thought this post was extremely interesting at least considering my level of knowledge about perl. | [reply] [d/l] |
|
| [reply] |
|
my @shuffled = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, rand() ] }
@unshuffled;
But compared to other approaches it is slow and memory-expensive.
This topic came up at a recent kw.pm meeting, and I played around and did some rudimentary benchmarks:
http://kw.pm.org/wiki/index.cgi?ListShuffle
2005-04-02 Janitored by Arunbear - replaced pre tags with code tags, to allow code extraction | [reply] [d/l] |
|
Hey even though I didn't start this question it is really interesting to me. Could you explain, point to somewhere I can read about it, or otherwise instruct me as to why my solution is slow. I see lots of posts with map and such in them and I have no clue about its function so I assume it will come up later in my reading of the llama book. And thankyou to any who take the time to give me some information on this subject.
| [reply] |
|
|
my @shuffled = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, rand() ] }
@unshuffled;
| [reply] [d/l] |