"MAKEUP!", er, I mean, "Did somebody say 'Benchmark'?":
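use vars '@r';
sub supffl (\@) {
    # The (\@) prototype passes the caller's array by reference;
    # aliasing the glob makes @r that very array, so we shuffle in place.
    local *r = pop;
    my $sz = @r;    # number of elements still to be placed
    my $i = -1;
    while ($sz) {
        # Pick a slot in $i .. $#r. $a is the package global; its fractional
        # value is truncated to an integer when used as a subscript.
        $a = ++$i + rand $sz--;
        @r[$i, $a] = @r[$a, $i];
    }
}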
=pod
$ supshfl.pl
Benchmark:running FAQ_FY, shuffl, supffl, each for at least 2 CPU sec
FAQ_FY: 2 wc secs(2.19 usr + 0.00 sys = 2.19 CPU) @ 19.63/s (n=43)
shuffl: 2 wc secs(2.13 usr + 0.00 sys = 2.13 CPU) @ 21.60/s (n=46)
supffl: 2 wc secs(2.16 usr + 0.00 sys = 2.16 CPU) @ 23.15/s (n=50)
         Rate FAQ_FY shuffl supffl
FAQ_FY 19.6/s     --    -9%   -15%
shuffl 21.6/s    10%     --    -7%
supffl 23.1/s    18%     7%     --
permutation | FAQ_FY | shuffl | supffl
-------------------------------------------
-------------------------------------------
Std. Dev. | 6.404 | 6.397 | 5.746
=cut
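The Std. Dev. row in the output above looks like a uniformity check: shuffle a small list many times, tally how often each permutation turns up, and take the standard deviation of the tallies. The script that produced it isn't shown, so the following is only a rough sketch of how such a check might be done. The names fy_shuffle, permutation_counts and std_dev, the 3-element list and the 6000 trials are all my own assumptions, not the original code.

```perl
use strict;
use warnings;

# Hypothetical stand-in shuffle (a plain Fisher-Yates on an array ref).
# Swap in whichever shuffle you actually want to examine.
sub fy_shuffle {
    my ($array) = @_;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);
        @$array[$i, $j] = @$array[$j, $i];
    }
}

# Shuffle a fresh copy of @$items $trials times and tally each resulting order.
sub permutation_counts {
    my ($shuffle, $items, $trials) = @_;
    my %count;
    for (1 .. $trials) {
        my @copy = @$items;
        $shuffle->(\@copy);
        $count{"@copy"}++;
    }
    return \%count;
}

# Population standard deviation of a list of numbers.
sub std_dev {
    my @x = @_;
    my $n = @x;
    my $sum = 0;
    $sum += $_ for @x;
    my $mean = $sum / $n;
    my $sq = 0;
    $sq += ($_ - $mean) ** 2 for @x;
    return sqrt($sq / $n);
}

# Assumed parameters: 3 items (6 permutations) and 6000 trials.
# Note: a permutation that never occurs gets no row and is not counted in the deviation.
my $counts = permutation_counts(\&fy_shuffle, [1 .. 3], 6000);
printf "%-11s | %d\n", $_, $counts->{$_} for sort keys %$counts;
printf "%-11s | %.3f\n", 'Std. Dev.', std_dev(values %$counts);
```

A lower deviation just means the tallies came out more even on that particular run; with random input the figure wobbles from run to run, so small differences between columns probably don't mean much.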
Though a speed gain of roughly 18% over FAQ_FY hardly seems worth it.
Basically, it's their "efficient" check for $i == $j that slows FAQ_FY down! Mumble, mumble . . . premature optimization . . . mumble, mumble . . . Here it is with that check commented out:
sub FAQ_FY {
    my $array = shift;
    my $i;
    for ($i = @$array; --$i; ) {
        my $j = int rand ($i+1);
        # next if $i == $j;    # the FAQ's check, disabled for this run
        @$array[$i,$j] = @$array[$j,$i];
    }
}
=pod
$ supshfl.pl
Benchmark:running FAQ_FY, shuffl, supffl, each for at least 1 CPU sec
FAQ_FY: 1 wc secs(1.05 usr + 0.00 sys = 1.05 CPU) @ 21.90/s (n=23)
shuffl: 1 wc secs(1.07 usr + 0.00 sys = 1.07 CPU) @ 21.50/s (n=23)
supffl: 1 wc secs(1.04 usr + 0.00 sys = 1.04 CPU) @ 23.08/s (n=24)
         Rate shuffl FAQ_FY supffl
shuffl 21.5/s     --    -2%    -7%
FAQ_FY 21.9/s     2%     --    -5%
supffl 23.1/s     7%     5%     --
permutation | FAQ_FY | shuffl | supffl
--------------------------------------------
--------------------------------------------
Std. Dev. | 7.305 | 6.105 | 5.998
=cut
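If you want to see the effect of the $i == $j check in isolation, something like the following might do. It is only a sketch: it uses cmpthese from the standard Benchmark module, and the names fy_with_check and fy_no_check, the 1000-element list and the empty-array guard are my own assumptions rather than anything from the script that produced the numbers above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# FAQ-style shuffle with the $i == $j short-circuit left in.
sub fy_with_check {
    my $array = shift;
    return unless @$array > 1;    # added guard: the bare loop misbehaves on an empty array
    for (my $i = @$array; --$i; ) {
        my $j = int rand($i + 1);
        next if $i == $j;
        @$array[$i, $j] = @$array[$j, $i];
    }
}

# The same loop with the check removed.
sub fy_no_check {
    my $array = shift;
    return unless @$array > 1;    # same added guard
    for (my $i = @$array; --$i; ) {
        my $j = int rand($i + 1);
        @$array[$i, $j] = @$array[$j, $i];
    }
}

my @data = (1 .. 1000);    # assumed size; the original list length isn't given

# A negative count asks Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, {
    with_check => sub { my @copy = @data; fy_with_check(\@copy) },
    no_check   => sub { my @copy = @data; fy_no_check(\@copy) },
});
```

Each timed sub copies @data first so both variants start from the same order; the copy costs the same on both sides, so it should only dilute the difference, not bias it.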