Perl: the Markov chain saw PerlMonks

### Re^2: Random Derangement Of An Array

by dcturner (Initiate)
 on Dec 04, 2008 at 19:54 UTC ( #728077=note: print w/replies, xml ) Need Help??

in reply to Re: Random Derangement Of An Array
in thread Random Derangement Of An Array

Here's the approach I wanted to code up, but I've been sufficiently distracted.

After much brain-racking I have been unable to come up with a less silly approach than simply encoding the combinatorial proof that d_(n+1) = n (d_n + d_(n-1)) in Perl, which was essentially your idea.

All of the other ideas in this thread so far produce a biased distribution; the 'obvious' modification to the Fisher-Yates shuffle algorithm looked promising but misses some derangements entirely.

It might be straightforward to do it purely iteratively too, but my brain works recursively so my programs do too!

#!/usr/bin/perl use strict; sub d_l_rec { # Calculates the pair (d_n, d_{n-1}) # # Why calculate the pair? It's O(n) instead of O(2^n) # for the 'obvious' recursive algorithm. # # Why recursive? No need to do it iteratively. my (\$n) = @_; return 1 if \$n < 1; return (0, 1) if \$n == 1; my (\$d1, \$d2) = d_l_rec(\$n-1); my \$d = (\$n-1) * (\$d1 + \$d2); return (\$d, \$d1); } sub random_local_derangement { # Returns a randomly-chosen local derangement of # (0..(\$n-1)). A local derangement is a derangement # *except* that the last place may be a fixed point. # # It's 'local' in the sense that, given \$n people and a # hat of \$n tickets you can generate a local derangement # by each person in turn pulling tickets out of the hat # until they get one that isn't theirs, ensuring that # (local to their draw) the permutation looks like it's # going to be a derangement. This is how 'Secret Santa' # derangements often seem to be organised, and this # process sometimes leaves the last person with their # own ticket. # # Unfortunately the 'Secret Santa' approach definitely # does not give uniform probabilities to each outcome. # This function, on the other hand, does. [At least, # it's definitely uniform for \$n <= 12 and merely a very # good approximation for larger \$n.] my (\$n) = @_; if (\$n == 0) { return []; } my (\$i, \$threshold); # A local \$n-derangement is either a full 'total' # \$n-derangement or else it is a (\$n-1)-derangement with # a fixed point added at the end. We must choose between # these options with appropriate probability weighting. # Note that this means that l_k = d_k + d_{k-1} where # l_k is the number of local k-derangements and d_k is # the number of total k-derangements. # # Note that d_{12} < 2^31 < d_{13} so we have to be # careful of overflow for \$n > 13. Fortunately there's a # good approximation that can be used. if (\$n <= 12) { # Calculate d_{\$n} and d_{\$n-1} and a random value # \$i in [0, d_{\$n} + d_{\$n-1}) to decide which of # the two options to use. my (\$dn, \$dn1) = d_l_rec(\$n); \$i = int(rand(\$dn+\$dn1)); \$threshold = \$dn1; } else { # If \$n is large then d_\$n/d_{\$n-1} is very close to # \$n. Therefore it'll do to pick a random \$i in the # range [0, \$n] and see if it is 0 or not. \$i = int(rand(\$n+1)); \$threshold = 1; # I think this is ok, but it just might contain an # off-by-one error. The upshot of such an error # would be a degree of bias in the results that is # going to be hard to detect - you may have to run # it literally trillions of times to pick up a # statistically significant result. } if (\$i < \$threshold) { # Case 1 - pick a properly local derangement my \$d = random_derangement(\$n-1); push @\$d, \$n-1; return \$d; } else { # Case 2 - pick a total derangement my \$d = random_derangement(\$n); return \$d; } } sub random_derangement { # Returns a randomly-chosen (total) derangement of # (0..(\$n-1)), uniformly-chosen amongst all possible # derangements. my (\$n) = @_; if (\$n == 0) { return []; } # There are (n-1) l_{n-1} of them, so pick a (uniformly) # random local (\$n-1)-derangement and a random \$m in the # range [0, \$n-1). my \$ld = random_local_derangement(\$n-1); my \$m = int(rand(\$n-1)); # If L_k is the set of all local k-derangements and D_k # is the set of all total k-derangements then the code # below encodes the proof that (n-1) l_{n-1} = d_n in a # bijection between [0, \$n-1) x L_{n-1} and D_{n}. # # Since the pair (\$m, \$ld) are chosen uniformly, this # shows that the resulting derangement is also uniformly # chosen. if (\$n-2 == \$ld->[\$n-2]) { # \$ld is properly local. Therefore the desired # derangement swaps the \$m'th and last places and # uses \$ld to derange the other places. my \$j = \$n-1; while (\$j--) { my \$k = \$j < \$m ? \$j : \$j-1; \$ld->[\$j] = \$ld->[\$k] < \$m ? \$ld->[\$k] : \$ld->[\$k]+1; } \$ld->[\$n-1] = \$m; \$ld->[\$m] = \$n-1; return \$ld; } else { # \$ld is total. Therefore put the \$m'th entry at the # end and put \$n-1 in the \$m'th place. \$ld->[\$n-1] = \$ld->[\$m]; \$ld->[\$m] = \$n-1; return \$ld; } } sub check_derangement { # Check that we have really generated a derangement my (\$n, \$d) = @_; my \$s = join ', ', @\$d; die "Wrong length: \$s (\$n)" unless (\$n == @\$d); for (my \$i = 0; \$i < \$n; \$i++) { die "Not a derangement: \$s" if (\$i == \$d->[\$i]); die "Illegal value: \$d->[\$i] in \$s" if (\$d->[\$i] < 0 || \$d->[\$i] >= \$n); } eval { my @check_unique = sort { \$a <=> \$b || undef } @\$d; }; die "Uniqueness check failed: \$s" if \$@; } my \$n = \$ARGV[0]; my %f = (); my \$c = 0; for (my \$i = 0; \$i < 1e6; \$i++) { my \$d = random_derangement(\$n); check_derangement(\$n, \$d); my \$s = join ',', @\$d; \$f{\$s} += 1; \$c += 1; } for my \$key (sort {\$f{\$a} <=> \$f{\$b}} keys %f) { printf "%s: %0.2f%% (expected %+0.2f%%)\n", \$key, 100.0*(\$f{\$key}/ +\$c), 100.0*((\$f{\$key}/\$c)-(1.0/(scalar keys %f))); } printf "Total %d (%d runs)\n", (scalar keys %f), \$c;

This produces output like the following

\$ ./derangements.pl 4 2,3,0,1: 11.08% (expected -0.03%) 1,0,3,2: 11.09% (expected -0.02%) 2,3,1,0: 11.10% (expected -0.01%) 1,3,0,2: 11.11% (expected -0.01%) 2,0,3,1: 11.11% (expected -0.00%) 1,2,3,0: 11.12% (expected +0.01%) 3,0,1,2: 11.12% (expected +0.01%) 3,2,1,0: 11.13% (expected +0.02%) 3,2,0,1: 11.14% (expected +0.03%) Total 9 (1000000 runs)

Clearly that's not far off uniform!

Replies are listed 'Best First'.
Re^3: Random Derangement Of An Array
by blokhead (Monsignor) on Dec 07, 2008 at 20:38 UTC
Here is a recent article describing a simpler algorithm for sampling derangements. I also found slides for the presentation. Since the paper is so recent, I guess this means that a small modification of Fisher-Yates is unlikely to generate derangements, since someone would have already come up with it by now. Still, their algorithm is in-place and has better expected running time than retrying Fisher-Yates until you get a derangement.

Here is a Perl implementation I whipped up. It is slightly odd because I followed their lead and used array indexing from 1.

sub rand_derangement { my \$n = shift; return if \$n == 1; ## no derangements of size 1 ## precompute \$D[n] == number of derangements of size n my @D = (1,0); push @D, \$#D * (\$D[-1] + \$D[-2]) while \$#D < \$n; my @A = (undef, 1 .. \$n); my @mark = (1, (0) x \$n); my (\$i, \$u) = (\$n, \$n); while (\$u > 1) { if (! \$mark[\$i]) { my \$j = 0; \$j = 1 + int rand(\$i-2) while \$mark[\$j]; @A[\$i,\$j] = @A[\$j,\$i]; if ( rand(1) < (\$u-1) * \$D[\$u-2] / \$D[\$u] ) { \$mark[\$j] = 1; \$u--; } \$u--; } \$i--; } return @A[1..\$n]; }

Here is a recent article describing a simpler algorithm for sampling derangements...

Ah yes, that's nice. I think there is a slight buglet in the given code:

\$j = 1 + int rand(\$i-2) while \$mark[\$j];

\$j = 1 + int rand(\$i-1) while \$mark[\$j];

should it not? Also it could hit an overflow bug for \$n > 12, because d_n gets quite large quite quickly. Even with 64-bit ints you overflow for \$n > 20. The same approximation works for your method as for mine: for \$n > 12 the value (n-1)d_{n-2} / d_n is very close to 1/n.

The iterative version of the recursive program that I wrote above is as follows. It's a bit convoluted because the recursion d_n = (n-1) (d_{n-1} + d_{n-2}) is second-order, so you have to work a bit harder than for a first-order recursion. It's also in-place and requires only a constant amount of auxiliary storage. Plus it certainly terminates, whereas the rejection method merely almost certainly terminates :) On the minus side, it is a bit quadratic (although with low probability). Side-by-side, they run pretty similarly it seems.

sub random_derangement { my (\$n) = @_; my @d = (1, 0, 1, 2, 9, 44, 265, 1854, 14833, 133496, 1334961, 14684570, 176214841); my @t; my \$i = \$n; while (\$i--) { my \$m = int(rand(\$i)); \$t[\$i] = \$m; if (\$i <= 12) { \$i -= 1 if (int(rand(\$d[\$i]+\$d[\$i-1])) < \$d[\$i-1]); } else { \$i -= 1 if (int(rand(\$i+1)) == 0); } } for (\$i = 0; \$i < \$n; \$i++) { if (defined(\$t[\$i])) { my \$m = \$t[\$i]; \$t[\$i] = \$t[\$m]; \$t[\$m] = \$i; } else { my \$j = \$i+1; my \$m = \$t[\$i+1]; while (\$j--) { my \$k = \$j < \$m ? \$j : \$j-1; \$t[\$j] = \$t[\$k] < \$m ? \$t[\$k] : \$t[\$k]+1; } \$t[\$i+1] = \$m; \$t[\$m] = \$i+1; \$i += 1; } } return \@t; }

Create A New User
Node Status?
node history
Node Type: note [id://728077]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (2)
As of 2021-01-22 07:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
The STEM quote I most wish I'd made is:

Results (238 votes). Check out past polls.

Notices?