Re: Problems with custom sort

Replies are listed 'Best First'.
Re^2: Problems with custom sort by Aristotle (Chancellor) on Jun 30, 2002 at 22:35 UTC
That while the the original version calls the `size()` method more often, your transform also generates and dismantles a bunch of anonymous arrays. If the naive form only calls `size()` few times, which it does for an array as small as in your test case, then the transform's overhead evidently outweighs these calls. It is no surprise really. Bubble sort is the fastest way to sort a already nearly sorted array of not more than a few dozen elements (as a rule of thumb). All of these algorithms have to make a trade-off: either you pay a high cost up front and then pay very little for each individual element, or you pay little to nothing up front but buy each element at a high cost. The obvious implication is that when you only have few elements, buying them all at a high cost each totals to a smaller sum than the high up-front cost of a more advanced algorithm. Makeshifts last the longest.	[reply]
Re: Re^2: Problems with custom sort by rob_au (Abbot) on Jul 01, 2002 at 00:00 UTC
I would just add that the behaviour of sort is still dynamic with changes in internal workings listed in perldelta for 5.8.0rc2 - From this document ... `sort()` is also fully reentrant, in the sense that the sort function can itself call sort(). This did not work reliably in previous releases. `sort()` has been changed to use primarily mergesort internally as opposed to the earlier quicksort. For very small lists this may result in slightly slower sorting times, but in general the speedup should be at least 20%. Additional bonuses are that the worst case behaviour of `sort()` is now better (in computer science terms it now runs in time O(N log N), as opposed to quicksort's `Theta(N*2)` worst-case run time behaviour), and that `sort()` is now stable (meaning that elements with identical keys will stay ordered as they were before the sort). See the `sort` pragma for information. The story in more detail: suppose you want to serve yourself a little slice of Pi. @digits = ( 3,1,4,1,5,9 ); A numerical sort of the digits will yield (1,1,3,4,5,9), as expected. Which `1` comes first is hard to know, since one `1` looks pretty much like any other. You can regard this as totally trivial, or somewhat profound. However, if you just want to sort the even digits ahead of the odd ones, then what will sort { ($a % 2) <=> ($b % 2) } @digits; yield? The only even digit, `4`, will come first. But how about the odd numbers, which all compare equal? With the quicksort algorithm used to implement Perl 5.6 and earlier, the order of ties is left up to the sort. So, as you add more and more digits of Pi, the order in which the sorted even and odd digits appear will change. and, for sufficiently large slices of Pi, the quicksort algorithm in Perl 5.8 won't return the same results even if reinvoked with the same input. The justification for this rests with quicksort's worst case behavior. If you run sort { $a <=> $b } ( 1 .. $N , 1 .. $N ); (something you might approximate if you wanted to merge two sorted arrays using sort), doubling $N doesn't just double the quicksort time, it quadruples* it. Quicksort has a worst case run time that can grow like N*2, so-called quadratic* behaviour, and it can happen on patterns that may well arise in normal use. You won't notice this for small arrays, but you will notice it with larger arrays, and you may not live long enough for the sort to complete on arrays of a million elements. So the 5.8 quicksort scrambles large arrays before sorting them, as a statistical defence against quadratic behaviour. But that means if you sort the same large array twice, ties may be broken in different ways. Because of the unpredictability of tie-breaking order, and the quadratic worst-case behaviour, quicksort was almost replaced completely with a stable mergesort. Stable means that ties are broken to preserve the original order of appearance in the input array. So sort { ($a % 2) <=> ($b % 2) } (3,1,4,1,5,9); will yield (4,3,1,1,5,9), guaranteed. The even and odd numbers appear in the output in the same order they appeared in the input. Mergesort has worst case O(N log N) behaviour, the best value attainable. And, ironically, this mergesort does particularly well where quicksort goes quadratic: mergesort sorts (1..$N, 1..$N) in `O(N)` time. But quicksort was rescued at the last moment because it is faster than mergesort on certain inputs and platforms. For example, if you really don't care about the order of even and odd digits, quicksort will run in `O(N)` time; it's very good at sorting many repetitions of a small number of distinct elements. The quicksort divide and conquer strategy works well on platforms with relatively small, very fast, caches. Eventually, the problem gets whittled down to one that fits in the cache, from which point it benefits from the increased memory speed. Quicksort was rescued by implementing a sort pragma to control aspects of the sort. The stable subpragma forces stable behaviour, regardless of algorithm. The _quicksort and _mergesort subpragmas are heavy-handed ways to select the underlying implementation. The leading `_` is a reminder that these subpragmas may not survive beyond 5.8. More appropriate mechanisms for selecting the implementation exist, but they wouldn't have arrived in time to save quicksort.	[reply] [d/l] [select]


Clear questions and runnable code get the best and fastest answer
	PerlMonks