Re^2: About List::Util's pure Perl shuffle()

As long as we're (micro)optimizing, ...

I agree, that is a micro-optimisation :), but if you look at the bigger picture, your version is anything from 45% to 350% faster than the List::Util version--it's not so micro.

Plus, your solution inspired me to come up with this, I think, rather surprising version. It's not only faster again, sometimes as much as another 25% over the earlier gains, it's also arguably the most readable:

sub bukNew ($) {
    my( $ref ) = @_;
    my @x = 0 .. $#$ref;
    @{ $ref }[ map splice( @x, rand @x, 1 ), @x ];
}

10 strings length 1.25892541179417
            Rate    naive listutil blokhead      buk   bukNew
naive    22172/s       --     -41%     -69%     -69%     -73%
listutil 37867/s      71%       --     -47%     -47%     -54%
blokhead 71022/s     220%      88%       --      -1%     -14%
buk      72090/s     225%      90%       2%       --     -12%
bukNew   82136/s     270%     117%      16%      14%       --

10 strings length 10
            Rate    naive listutil blokhead      buk   bukNew
naive    20685/s       --     -44%     -70%     -71%     -75%
listutil 36824/s      78%       --     -47%     -49%     -55%
blokhead 70065/s     239%      90%       --      -3%     -14%
buk      72090/s     249%      96%       3%       --     -12%
bukNew   81717/s     295%     122%      17%      13%       --

10 strings length 1000
            Rate    naive listutil blokhead      buk   bukNew
naive     1193/s       --     -60%     -98%     -98%     -99%
listutil  2981/s     150%       --     -96%     -96%     -96%
blokhead 69931/s    5764%    2246%       --      -2%     -15%
buk      71723/s    5914%    2306%       3%       --     -12%
bukNew   81873/s    6765%    2647%      17%      14%       --

100 strings length 1.25892541179417
           Rate    naive listutil      buk blokhead   bukNew
naive    2190/s       --     -45%     -72%     -73%     -78%
listutil 3989/s      82%       --     -49%     -51%     -60%
buk      7848/s     258%      97%       --      -4%     -21%
blokhead 8144/s     272%     104%       4%       --     -18%
bukNew   9925/s     353%     149%      26%      22%       --

100 strings length 10
           Rate    naive listutil      buk blokhead   bukNew
naive    2092/s       --     -46%     -73%     -74%     -79%
listutil 3868/s      85%       --     -50%     -52%     -61%
buk      7811/s     273%     102%       --      -4%     -21%
blokhead 8094/s     287%     109%       4%       --     -18%
bukNew   9873/s     372%     155%      26%      22%       --

100 strings length 1000
           Rate    naive listutil      buk blokhead   bukNew
naive    96.3/s       --     -58%     -99%     -99%     -99%
listutil  229/s     138%       --     -97%     -97%     -98%
buk      7811/s    8009%    3314%       --      -4%     -21%
blokhead 8094/s    8303%    3437%       4%       --     -18%
bukNew   9869/s   10145%    4213%      26%      22%       --

1000 strings length 1.25892541179417
          Rate    naive listutil      buk blokhead   bukNew
naive    208/s       --     -41%     -74%     -75%     -76%
listutil 350/s      69%       --     -56%     -58%     -59%
buk      789/s     280%     125%       --      -5%      -7%
blokhead 827/s     298%     136%       5%       --      -3%
bukNew   850/s     310%     143%       8%       3%       --

1000 strings length 10
          Rate    naive listutil      buk blokhead   bukNew
naive    187/s       --     -42%     -76%     -77%     -78%
listutil 319/s      71%       --     -59%     -61%     -62%
buk      777/s     316%     143%       --      -5%      -7%
blokhead 815/s     337%     155%       5%       --      -2%
bukNew   834/s     347%     161%       7%       2%       --

1000 strings length 1000
           Rate    naive listutil      buk blokhead   bukNew
naive    8.47/s       --     -62%     -99%     -99%     -99%
listutil 22.2/s     162%       --     -97%     -97%     -97%
buk       785/s    9164%    3437%       --      -3%      -8%
blokhead  810/s    9470%    3554%       3%       --      -5%
bukNew    850/s    9939%    3733%       8%       5%       --
[download]

It's advantage tails off as the number of strings rise, but when you start seeing numbers like 20x, 30x and even 40x faster than the List::Util PP version, it seems possible that someone might benefit.

'sides, I think of optimising as a lot like obfu, a fun intellectual challange. Plus there is the upside that the results can be useful.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

[reply]
[d/l]

sub bukNew ($) { my( $ref ) = @_; my @x = 0 .. $#$ref; @{ $ref }[ map splice( @x, rand @x, 1 ), @x ]; }
[download]

More and more interesting. But then, since all the other subs are (@), why not simply the following?

sub bukNew (@) {
    my @x = 0 .. $#_;
    @_[ map splice( @x, rand @x, 1 ), @x ];
}
[download]

However some random tests with various string and list lengths seem to show that it could be considerably slower than buk(), by even as much as 40% or so. Thus... are you sure your benchmark is not flawed?

[reply]
[d/l]
[select]

More and more interesting. But then, since all the other subs are (@), why not simply the following?

blazar++. Just checking if anyone is still paying attention :)

Since bukNew() doesn't need to copy the input list--because it shuffles, and therefore mutates, a list of indices generated internally, rather than mutating a copy of the input--it seemes silly to pass arrays in by value forcing their duplication. Most uses of shuffle() are applied to pre-existing arrays. That's why we make a copy of them, or create a list of aliases to them in most versions--simply to avoid the shuffle from modifying the external arrays. So, instead of

my @array = ...;
...
my @shuffled = shuffle @array;
[download]

we can avoid some copying by using

my @array = ...
...
my @shuffled = shuffle \@array;
[download]

And when we want to shuffle a list, instead of

my @shuffled = shuffle ... some expression generating a list ...;
[download]

we use

my @shuffled = shuffle [ ... some expression generating a list ... ];
[download]

Inspired by blokhead's version, I went back to basics and tried benchmarking a simple version that took a reference rather thn a list:bukNew() with quite surprising results.

Of course, the same lesson can now be retrofitted to those other routines that don't need to replicate the input list for their operation. In particular, blokhead's version!

And here are the headline results of doing that (as blokhead_ref):

Notice that the length of the strings is not a factor for blokhead(_ref) or bukNew!

10 strings length 10
                Rate     naive listutil blokhead_ref       buk blokhea
+d   bukNew
naive        20559/s        --     -44%         -69%      -70%     -70
+%     -74%
listutil     36857/s       79%       --         -44%      -47%     -47
+%     -53%
blokhead_ref 66375/s      223%      80%           --       -4%      -5
+%     -16%
buk          69370/s      237%      88%           5%        --      -0
+%     -12%
blokhead     69635/s      239%      89%           5%        0%       -
+-     -12%
bukNew       79050/s      285%     114%          19%       14%      14
+%       --

10 strings length 1000
                Rate     naive listutil blokhead_ref blokhead       bu
+k   bukNew
naive         1266/s        --     -60%         -98%     -98%      -98
+%     -98%
listutil      3164/s      150%       --         -95%     -95%      -95
+%     -96%
blokhead_ref 67220/s     5209%    2024%           --      -1%       -3
+%     -16%
blokhead     67878/s     5261%    2045%           1%       --       -2
+%     -15%
buk          69084/s     5356%    2083%           3%       2%        -
+-     -14%
bukNew       80288/s     6241%    2437%          19%      18%       16
+%       --

100 strings length 1000
               Rate     naive listutil       buk blokhead blokhead_ref
+    bukNew
naive        93.7/s        --     -58%      -99%     -99%         -99%
+      -99%
listutil      225/s      141%       --      -97%     -97%         -97%
+      -98%
buk          7540/s     7946%    3244%        --      -5%          -7%
+      -23%
blokhead     7919/s     8349%    3412%        5%       --          -2%
+      -19%
blokhead_ref 8105/s     8547%    3494%        7%       2%           --
+      -18%
bukNew       9831/s    10390%    4260%       30%      24%          21%
+        --


1000 strings length 1000
               Rate     naive listutil       buk blokhead blokhead_ref
+    bukNew
naive        8.16/s        --     -63%      -99%     -99%         -99%
+      -99%
listutil     22.1/s      170%       --      -97%     -97%         -97%
+      -97%
buk           737/s     8932%    3240%        --      -8%         -10%
+      -11%
blokhead      804/s     9750%    3542%        9%       --          -2%
+       -3%
blokhead_ref  819/s     9927%    3608%       11%       2%           --
+       -1%
bukNew        826/s    10023%    3643%       12%       3%           1%
+        --
[download]

The significant thing is that blokhead_ref moves ahead of blokhead very quickly as the number of elements increases. Avoiding copying the array pays dividends very quickly.

However some random tests with various string and list lengths seem to show that it could be considerably slower than buk(), by even as much as 40% or so. Thus... are you sure your benchmark is not flawed?

If you look back over the thread, you'll see that there have been various instances where different people have seen different results from benchmarking apparently the same code.

Some of the differences are explained by whether the data being shuffled consists of all integer data (as in your original benchmark (0..1000 etc.), or string data as used by most people after ikegami noted the difference it makes. When an SV points to an IV, copying that SV is a faster operation than when it points to a PV. With the latter, a second memory allocation and memcpy operation have to be performed to copy the string data pointed at by the PV. If the SV contains integer (and probably float?), and has never been used in a string context, then the number will never have been ascii-ized and the PV will be null. That makes for significantly less work to copy it.

That doesn't explain all the anomolies seen above though I think? Anyway, it's quite possible my benchmark is flawed--it certainly wouldn't be the first time :)--so here is the code. Tell me what you think?

The benchmark code:


Syntactic Confectionery Delight
	PerlMonks