Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight

Re: Random Derangement Of An Array

by blokhead (Monsignor)
on Jul 06, 2008 at 10:58 UTC ( #695805=note: print w/replies, xml ) Need Help??

in reply to Random Derangement Of An Array

Here's the approach I wanted to code up, but I've been sufficiently distracted.

The number of derangements is d(n) = (n-1)( d(n-1) + d(n-2) ), and there is a combinatorial proof of this at the wikipedia article. That is, there are (n-1) ways to build an n-derangement out of a (n-1)-derangement, and (n-1) ways to build an n-derangement out of a (n-2)-derangment. Furthermore, these correspond uniquely to all the ways to build an n-derangement.

So here is a way to randomly generate an n-derangement:

  • Recursively generate either a (n-1)-derangement or (n-2)-derangement, with probabilities relative to d(n-1) and d(n-2). The base cases are d(1)=0 (no ways to generate a 1-derangement) and d(2)=1 (only 1 choice for a 2-derangement).
  • Randomly pick one of the (n-1) ways to generate an n-derangement from the one you have, according to the combinatorial proof.
Unfortunately, I don't have any time to code this up, and the combinatorial proof is not written well. For the case of making an n-derangement out of an (n-1)-derangement, you simply add n to the end, and then swap the last position and a randomly chosen other position. I couldn't quite understand exactly what the wikipedia article was getting at for the other case, though, and it's been too long since I've thought of such things.

Maybe some curious monk can work this into usable code. But this approach is basically recursive: take a random starting derangement and choose a random way to augment it into a larger one. The end result will certainly be randomly distributed, provided you handle the (n-1) and (n-2) cases with the appropriate relative probabilities..

All I managed to get into code was a simple routine to count d(n). It uses a different combinatorial identity that is more amenable to simple computation. It seems like you'd need this to get the relative probabilities for the (n-1) and (n-2) cases to work out:

sub num_d { my ($n) = @_; return 1-$n if $n < 2; my $d = 0; $d += (-1)**$_ + ($_-1)*$d for 2 .. $n; return $d; }


Replies are listed 'Best First'.
Re^2: Random Derangement Of An Array
by dcturner (Initiate) on Dec 04, 2008 at 19:54 UTC
    Here's the approach I wanted to code up, but I've been sufficiently distracted.

    After much brain-racking I have been unable to come up with a less silly approach than simply encoding the combinatorial proof that d_(n+1) = n (d_n + d_(n-1)) in Perl, which was essentially your idea.

    All of the other ideas in this thread so far produce a biased distribution; the 'obvious' modification to the Fisher-Yates shuffle algorithm looked promising but misses some derangements entirely.

    It might be straightforward to do it purely iteratively too, but my brain works recursively so my programs do too!

    #!/usr/bin/perl use strict; sub d_l_rec { # Calculates the pair (d_n, d_{n-1}) # # Why calculate the pair? It's O(n) instead of O(2^n) # for the 'obvious' recursive algorithm. # # Why recursive? No need to do it iteratively. my ($n) = @_; return 1 if $n < 1; return (0, 1) if $n == 1; my ($d1, $d2) = d_l_rec($n-1); my $d = ($n-1) * ($d1 + $d2); return ($d, $d1); } sub random_local_derangement { # Returns a randomly-chosen local derangement of # (0..($n-1)). A local derangement is a derangement # *except* that the last place may be a fixed point. # # It's 'local' in the sense that, given $n people and a # hat of $n tickets you can generate a local derangement # by each person in turn pulling tickets out of the hat # until they get one that isn't theirs, ensuring that # (local to their draw) the permutation looks like it's # going to be a derangement. This is how 'Secret Santa' # derangements often seem to be organised, and this # process sometimes leaves the last person with their # own ticket. # # Unfortunately the 'Secret Santa' approach definitely # does not give uniform probabilities to each outcome. # This function, on the other hand, does. [At least, # it's definitely uniform for $n <= 12 and merely a very # good approximation for larger $n.] my ($n) = @_; if ($n == 0) { return []; } my ($i, $threshold); # A local $n-derangement is either a full 'total' # $n-derangement or else it is a ($n-1)-derangement with # a fixed point added at the end. We must choose between # these options with appropriate probability weighting. # Note that this means that l_k = d_k + d_{k-1} where # l_k is the number of local k-derangements and d_k is # the number of total k-derangements. # # Note that d_{12} < 2^31 < d_{13} so we have to be # careful of overflow for $n > 13. Fortunately there's a # good approximation that can be used. if ($n <= 12) { # Calculate d_{$n} and d_{$n-1} and a random value # $i in [0, d_{$n} + d_{$n-1}) to decide which of # the two options to use. my ($dn, $dn1) = d_l_rec($n); $i = int(rand($dn+$dn1)); $threshold = $dn1; } else { # If $n is large then d_$n/d_{$n-1} is very close to # $n. Therefore it'll do to pick a random $i in the # range [0, $n] and see if it is 0 or not. $i = int(rand($n+1)); $threshold = 1; # I think this is ok, but it just might contain an # off-by-one error. The upshot of such an error # would be a degree of bias in the results that is # going to be hard to detect - you may have to run # it literally trillions of times to pick up a # statistically significant result. } if ($i < $threshold) { # Case 1 - pick a properly local derangement my $d = random_derangement($n-1); push @$d, $n-1; return $d; } else { # Case 2 - pick a total derangement my $d = random_derangement($n); return $d; } } sub random_derangement { # Returns a randomly-chosen (total) derangement of # (0..($n-1)), uniformly-chosen amongst all possible # derangements. my ($n) = @_; if ($n == 0) { return []; } # There are (n-1) l_{n-1} of them, so pick a (uniformly) # random local ($n-1)-derangement and a random $m in the # range [0, $n-1). my $ld = random_local_derangement($n-1); my $m = int(rand($n-1)); # If L_k is the set of all local k-derangements and D_k # is the set of all total k-derangements then the code # below encodes the proof that (n-1) l_{n-1} = d_n in a # bijection between [0, $n-1) x L_{n-1} and D_{n}. # # Since the pair ($m, $ld) are chosen uniformly, this # shows that the resulting derangement is also uniformly # chosen. if ($n-2 == $ld->[$n-2]) { # $ld is properly local. Therefore the desired # derangement swaps the $m'th and last places and # uses $ld to derange the other places. my $j = $n-1; while ($j--) { my $k = $j < $m ? $j : $j-1; $ld->[$j] = $ld->[$k] < $m ? $ld->[$k] : $ld->[$k]+1; } $ld->[$n-1] = $m; $ld->[$m] = $n-1; return $ld; } else { # $ld is total. Therefore put the $m'th entry at the # end and put $n-1 in the $m'th place. $ld->[$n-1] = $ld->[$m]; $ld->[$m] = $n-1; return $ld; } } sub check_derangement { # Check that we have really generated a derangement my ($n, $d) = @_; my $s = join ', ', @$d; die "Wrong length: $s ($n)" unless ($n == @$d); for (my $i = 0; $i < $n; $i++) { die "Not a derangement: $s" if ($i == $d->[$i]); die "Illegal value: $d->[$i] in $s" if ($d->[$i] < 0 || $d->[$i] >= $n); } eval { my @check_unique = sort { $a <=> $b || undef } @$d; }; die "Uniqueness check failed: $s" if $@; } my $n = $ARGV[0]; my %f = (); my $c = 0; for (my $i = 0; $i < 1e6; $i++) { my $d = random_derangement($n); check_derangement($n, $d); my $s = join ',', @$d; $f{$s} += 1; $c += 1; } for my $key (sort {$f{$a} <=> $f{$b}} keys %f) { printf "%s: %0.2f%% (expected %+0.2f%%)\n", $key, 100.0*($f{$key}/ +$c), 100.0*(($f{$key}/$c)-(1.0/(scalar keys %f))); } printf "Total %d (%d runs)\n", (scalar keys %f), $c;

    This produces output like the following

    $ ./ 4 2,3,0,1: 11.08% (expected -0.03%) 1,0,3,2: 11.09% (expected -0.02%) 2,3,1,0: 11.10% (expected -0.01%) 1,3,0,2: 11.11% (expected -0.01%) 2,0,3,1: 11.11% (expected -0.00%) 1,2,3,0: 11.12% (expected +0.01%) 3,0,1,2: 11.12% (expected +0.01%) 3,2,1,0: 11.13% (expected +0.02%) 3,2,0,1: 11.14% (expected +0.03%) Total 9 (1000000 runs)

    Clearly that's not far off uniform!

      Here is a recent article describing a simpler algorithm for sampling derangements. I also found slides for the presentation. Since the paper is so recent, I guess this means that a small modification of Fisher-Yates is unlikely to generate derangements, since someone would have already come up with it by now. Still, their algorithm is in-place and has better expected running time than retrying Fisher-Yates until you get a derangement.

      Here is a Perl implementation I whipped up. It is slightly odd because I followed their lead and used array indexing from 1.

      sub rand_derangement { my $n = shift; return if $n == 1; ## no derangements of size 1 ## precompute $D[n] == number of derangements of size n my @D = (1,0); push @D, $#D * ($D[-1] + $D[-2]) while $#D < $n; my @A = (undef, 1 .. $n); my @mark = (1, (0) x $n); my ($i, $u) = ($n, $n); while ($u > 1) { if (! $mark[$i]) { my $j = 0; $j = 1 + int rand($i-2) while $mark[$j]; @A[$i,$j] = @A[$j,$i]; if ( rand(1) < ($u-1) * $D[$u-2] / $D[$u] ) { $mark[$j] = 1; $u--; } $u--; } $i--; } return @A[1..$n]; }


        Here is a recent article describing a simpler algorithm for sampling derangements...

        Ah yes, that's nice. I think there is a slight buglet in the given code:

        $j = 1 + int rand($i-2) while $mark[$j];

        should read

        $j = 1 + int rand($i-1) while $mark[$j];

        should it not? Also it could hit an overflow bug for $n > 12, because d_n gets quite large quite quickly. Even with 64-bit ints you overflow for $n > 20. The same approximation works for your method as for mine: for $n > 12 the value (n-1)d_{n-2} / d_n is very close to 1/n.

        The iterative version of the recursive program that I wrote above is as follows. It's a bit convoluted because the recursion d_n = (n-1) (d_{n-1} + d_{n-2}) is second-order, so you have to work a bit harder than for a first-order recursion. It's also in-place and requires only a constant amount of auxiliary storage. Plus it certainly terminates, whereas the rejection method merely almost certainly terminates :) On the minus side, it is a bit quadratic (although with low probability). Side-by-side, they run pretty similarly it seems.

        sub random_derangement { my ($n) = @_; my @d = (1, 0, 1, 2, 9, 44, 265, 1854, 14833, 133496, 1334961, 14684570, 176214841); my @t; my $i = $n; while ($i--) { my $m = int(rand($i)); $t[$i] = $m; if ($i <= 12) { $i -= 1 if (int(rand($d[$i]+$d[$i-1])) < $d[$i-1]); } else { $i -= 1 if (int(rand($i+1)) == 0); } } for ($i = 0; $i < $n; $i++) { if (defined($t[$i])) { my $m = $t[$i]; $t[$i] = $t[$m]; $t[$m] = $i; } else { my $j = $i+1; my $m = $t[$i+1]; while ($j--) { my $k = $j < $m ? $j : $j-1; $t[$j] = $t[$k] < $m ? $t[$k] : $t[$k]+1; } $t[$i+1] = $m; $t[$m] = $i+1; $i += 1; } } return \@t; }

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://695805]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (7)
As of 2021-10-22 15:03 GMT
Find Nodes?
    Voting Booth?
    My first memorable Perl project was:

    Results (85 votes). Check out past polls.