Re: The High Price of Golf, and A Surprise

in reply to The High Price of Golf, and A Surprise

You should include cmp in your benchmark. I think you'll find sort { $a cmp $b } slower than the plain built-in sort, which does the same string comparison without a block. That speed difference would be the overhead of the callback sub.

I expect that cmp in general is a far slower operation than <=>. The latter boils down to roughly one CPU instruction (plus the overhead of the interpreter); the former is a slow library call for many strings, especially if the strings you compare are identical, because then every character in the strings has to be examined.
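To look at the two operators apart from sort, a rough sketch like the following times them in isolation. The data values and the labels (num_sship, cmp_same, cmp_diff) are made up for illustration and are not part of the original benchmark. The per-call overhead of the anonymous subs is included in every timing, so treat the output as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data, not taken from the original benchmark.
my ($n1, $n2) = (12345.678, 12345.679);
my $same1     = 'x' x 1000;
my $same2     = 'x' x 1000;           # equal content: cmp must scan to the end
my $diff      = 'y' . ('x' x 999);    # differs at the very first character

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    num_sship => sub { my $r = $n1    <=> $n2    },
    cmp_same  => sub { my $r = $same1 cmp $same2 },
    cmp_diff  => sub { my $r = $same1 cmp $diff  },
});
```

If the reasoning above holds, cmp_same should come out slowest and the gap should grow with the string length, but that is something to measure rather than assume.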

And I think what you found is that the callback overhead for sort is still smaller than the speed difference between these two ops.

Enough blah blah. I've added them to your benchmark code, so it now looks like:

cmpthese(
    $count,
    {
      f_owtdi   => sub { @rfl = sort { $a - $b }   @nfl },
      i_owtdi   => sub { @rin = sort { $a - $b }   @nin },
      f_sship   => sub { @rfl = sort { $a <=> $b } @nfl },
      i_sship   => sub { @rin = sort { $a <=> $b } @nin },
      f_alpha   => sub { @sfl = sort               @afl },
      i_alpha   => sub { @sin = sort               @ain },
      f_cmp     => sub { @sfl = sort { $a cmp $b } @afl },
      i_cmp     => sub { @sin = sort { $a cmp $b } @ain },
    });

I'll update with the results in a moment. This benchmark takes many minutes to run, and I don't want to skew the results by doing heavy stuff with my humble PC.

Update: I'm back. Here are the results:

Benchmark: timing 300 iterations of f_alpha, f_cmp, f_owtdi, f_sship, i_alpha, i_cmp, i_owtdi, i_sship...
   f_alpha: 26 wallclock secs (25.92 usr +  0.00 sys = 25.92 CPU) @ 11.57/s (n=300)
     f_cmp: 26 wallclock secs (25.87 usr +  0.00 sys = 25.87 CPU) @ 11.60/s (n=300)
   f_owtdi: 37 wallclock secs (36.58 usr +  0.00 sys = 36.58 CPU) @  8.20/s (n=300)
   f_sship: 17 wallclock secs (16.92 usr +  0.00 sys = 16.92 CPU) @ 17.73/s (n=300)
   i_alpha: 21 wallclock secs (20.76 usr +  0.00 sys = 20.76 CPU) @ 14.45/s (n=300)
     i_cmp: 20 wallclock secs (20.82 usr +  0.00 sys = 20.82 CPU) @ 14.41/s (n=300)
   i_owtdi: 35 wallclock secs (35.53 usr +  0.00 sys = 35.53 CPU) @  8.44/s (n=300)
   i_sship: 16 wallclock secs (15.87 usr +  0.00 sys = 15.87 CPU) @ 18.90/s (n=300)
          Rate f_owtdi i_owtdi f_alpha   f_cmp   i_cmp i_alpha f_sship i_sship
f_owtdi 8.20/s      --     -3%    -29%    -29%    -43%    -43%    -54%    -57%
i_owtdi 8.44/s      3%      --    -27%    -27%    -41%    -42%    -52%    -55%
f_alpha 11.6/s     41%     37%      --     -0%    -20%    -20%    -35%    -39%
f_cmp   11.6/s     41%     37%      0%      --    -20%    -20%    -35%    -39%
i_cmp   14.4/s     76%     71%     24%     24%      --     -0%    -19%    -24%
i_alpha 14.5/s     76%     71%     25%     25%      0%      --    -18%    -24%
f_sship 17.7/s    116%    110%     53%     53%     23%     23%      --     -6%
i_sship 18.9/s    130%    124%     63%     63%     31%     31%      7%      --

That's odd: there is NO speed difference between [fi]_cmp and [fi]_alpha. That means that the cost of the callback is negligible... unless Perl is optimizing the callback away?

Just to make sure, I've also swapped $a and $b in my callback sub. It doesn't make a difference.
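One way to probe the "optimizing the callback away" suspicion is to compare the bare block against comparators that perl presumably cannot collapse into a built-in comparison. As far as I know, perl special-cases the simplest blocks ({ $a cmp $b }, { $a <=> $b } and their swapped forms), which would also explain why swapping $a and $b changes nothing. The sketch below is an assumption-laden test of that idea: the data, its size, and the names by_cmp, named and padded are invented here and do not come from the original benchmark.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Placeholder data; size and format are arbitrary, not from the original post.
my @strings = map { sprintf '%08d', int rand 1_000_000 } 1 .. 5_000;

# A named comparator: sort should have to make a real sub call per comparison.
sub by_cmp { $a cmp $b }

my (@plain, @inline, @named, @padded);

cmpthese(-1, {
    plain  => sub { @plain  = sort                           @strings },
    inline => sub { @inline = sort { $a cmp $b }             @strings },
    named  => sub { @named  = sort by_cmp                    @strings },
    # Same comparison, but the extra statement means the block is no
    # longer the bare "$a cmp $b" shape.
    padded => sub { @padded = sort { my $r = $a cmp $b; $r } @strings },
});
```

If plain and inline stay level while named and padded fall behind, the callback cost is real and the bare block is simply not being run as a callback. That would also fit the slow [fi]_owtdi numbers above, since { $a - $b } is not one of the recognized shapes.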

Conclusion: the speed gain you get by using a numerical sort is entirely due to the speed difference between the ops <=> and cmp.
