I would have been interested to see timing comparisons of the two different built-in sorts that are available, quick and merge, versus the random and pathological data sets you're using. See the sort pragma.
Now I have done it. First, this is the code (for the quick sort case):
use warnings;
use sort '_qsort';
use Benchmark qw(timethese);
my @x;
my $max = 40000;
$x[$_] = int rand(2e5) foreach (0..$max);
my @w = 1..$max;
my @z = reverse 1..$max;
sub sort_random {
my @v = sort {$a <=> $b} (@x);
}
sub sort_sorted {
my @v = sort {$a <=> $b} (@w);
}
sub sort_reversed {
my @v = sort {$a <=> $b} (@z);
}
timethese (800, {
'random' => \&sort_random,
'sorted' => \&sort_sorted,
'reverse' => \&sort_reversed,
} );
Of course for benchmarking significantly the much more efficient built-in sort functions, I used a larger array and more iterations. So don't compare the benchmark data produced here with the previous ones. OK, the result with the quick sort activated:
$ perl bench_sort2.pl
Benchmark: timing 800 iterations of random, reverse, sorted...
random: 14 wallclock secs (14.48 usr + 0.00 sys = 14.48 CPU) @ 55
+.26/s (n=800)
reverse: 14 wallclock secs (13.57 usr + 0.00 sys = 13.57 CPU) @ 58
+.94/s (n=800)
sorted: 14 wallclock secs (13.57 usr + 0.00 sys = 13.57 CPU) @ 58
+.94/s (n=800)
So no significant difference between the three cases. I think that the quick sort implementation now randomizes the order of the input data before actually starting the sort, in order to evacuate most pathological behaviors, so that there are no significant differences between sorted and unsorted arrays. I am almost sure I've read that somewhere, but I can't remember where. I wish I had somewhere a Perl 5.6 implementation running, we could probably see the difference, since I think this randomization of the input order was probably introduced with Perl 5.8.
Now the same test with merge sort (commenting out the "use sort '_qsort';" line, no other change to the code).
$ perl bench_sort2.pl
Benchmark: timing 800 iterations of random, reverse, sorted...
random: 11 wallclock secs (11.31 usr + 0.00 sys = 11.31 CPU) @ 70
+.73/s (n=800)
reverse: 2 wallclock secs ( 1.98 usr + 0.00 sys = 1.98 CPU) @ 40
+3.84/s (n=800)
sorted: 2 wallclock secs ( 1.94 usr + 0.00 sys = 1.94 CPU) @ 41
+3.44/s (n=800)
With sorted and reversed sorted data, merge sort is doing much better than quick sort in general and than merge sort on random data. Alright, I will not try to make any further conclusion on these tests, I do not feel like going into the Perl interpreter C code to check what is really going on.
Update Apr 25, 2014, 09:45 UTC: I knew I had read it somewhere. The official documentation on the sort pragma (sort) states: "In Perl 5.8 and later, quicksort defends against quadratic behaviour by shuffling large arrays before sorting."
|