Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re: Problems with custom sort

by mephit (Scribe)
on Jun 30, 2002 at 21:16 UTC ( [id://178418]=note: print w/replies, xml ) Need Help??


in reply to Problems with custom sort

Hmm, first thing I thought of was a Schwartzian transform:
my @sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, $_->size()]} @$setarray;
Hmm, I just did a quick benchmark using an array of just three objects, and this method is slower than the original method. I'd have thought the original method would be slower due to many more method calls in the sort subroutine, vs. only three in this method. What's wrong with my thought process here?

--

There are 10 kinds of people -- those that understand binary, and those that don't.

Replies are listed 'Best First'.
Re^2: Problems with custom sort
by Aristotle (Chancellor) on Jun 30, 2002 at 22:35 UTC

    That while the the original version calls the size() method more often, your transform also generates and dismantles a bunch of anonymous arrays. If the naive form only calls size() few times, which it does for an array as small as in your test case, then the transform's overhead evidently outweighs these calls. It is no surprise really.

    Bubble sort is the fastest way to sort a already nearly sorted array of not more than a few dozen elements (as a rule of thumb).

    All of these algorithms have to make a trade-off: either you pay a high cost up front and then pay very little for each individual element, or you pay little to nothing up front but buy each element at a high cost. The obvious implication is that when you only have few elements, buying them all at a high cost each totals to a smaller sum than the high up-front cost of a more advanced algorithm.

    Makeshifts last the longest.

      I would just add that the behaviour of sort is still dynamic with changes in internal workings listed in perldelta for 5.8.0rc2 - From this document ...

      • sort() is also fully reentrant, in the sense that the sort function can itself call sort(). This did not work reliably in previous releases.
      • sort() has been changed to use primarily mergesort internally as opposed to the earlier quicksort. For very small lists this may result in slightly slower sorting times, but in general the speedup should be at least 20%. Additional bonuses are that the worst case behaviour of sort() is now better (in computer science terms it now runs in time O(N log N), as opposed to quicksort's Theta(N**2) worst-case run time behaviour), and that sort() is now stable (meaning that elements with identical keys will stay ordered as they were before the sort). See the sort pragma for information.

        The story in more detail: suppose you want to serve yourself a little slice of Pi.

            @digits = ( 3,1,4,1,5,9 );

        A numerical sort of the digits will yield (1,1,3,4,5,9), as expected. Which 1 comes first is hard to know, since one 1 looks pretty much like any other. You can regard this as totally trivial, or somewhat profound. However, if you just want to sort the even digits ahead of the odd ones, then what will

            sort { ($a % 2) <=> ($b % 2) } @digits;

        yield? The only even digit, 4, will come first. But how about the odd numbers, which all compare equal? With the quicksort algorithm used to implement Perl 5.6 and earlier, the order of ties is left up to the sort. So, as you add more and more digits of Pi, the order in which the sorted even and odd digits appear will change. and, for sufficiently large slices of Pi, the quicksort algorithm in Perl 5.8 won't return the same results even if reinvoked with the same input. The justification for this rests with quicksort's worst case behavior. If you run

           sort { $a <=> $b } ( 1 .. $N , 1 .. $N );

        (something you might approximate if you wanted to merge two sorted arrays using sort), doubling $N doesn't just double the quicksort time, it quadruples it. Quicksort has a worst case run time that can grow like N**2, so-called quadratic behaviour, and it can happen on patterns that may well arise in normal use. You won't notice this for small arrays, but you will notice it with larger arrays, and you may not live long enough for the sort to complete on arrays of a million elements. So the 5.8 quicksort scrambles large arrays before sorting them, as a statistical defence against quadratic behaviour. But that means if you sort the same large array twice, ties may be broken in different ways.

        Because of the unpredictability of tie-breaking order, and the quadratic worst-case behaviour, quicksort was almost replaced completely with a stable mergesort. Stable means that ties are broken to preserve the original order of appearance in the input array. So

            sort { ($a % 2) <=> ($b % 2) } (3,1,4,1,5,9);

        will yield (4,3,1,1,5,9), guaranteed. The even and odd numbers appear in the output in the same order they appeared in the input. Mergesort has worst case O(N log N) behaviour, the best value attainable. And, ironically, this mergesort does particularly well where quicksort goes quadratic: mergesort sorts (1..$N, 1..$N) in O(N) time. But quicksort was rescued at the last moment because it is faster than mergesort on certain inputs and platforms. For example, if you really don't care about the order of even and odd digits, quicksort will run in O(N) time; it's very good at sorting many repetitions of a small number of distinct elements. The quicksort divide and conquer strategy works well on platforms with relatively small, very fast, caches. Eventually, the problem gets whittled down to one that fits in the cache, from which point it benefits from the increased memory speed.

        Quicksort was rescued by implementing a sort pragma to control aspects of the sort. The stable subpragma forces stable behaviour, regardless of algorithm. The _quicksort and _mergesort subpragmas are heavy-handed ways to select the underlying implementation. The leading _ is a reminder that these subpragmas may not survive beyond 5.8. More appropriate mechanisms for selecting the implementation exist, but they wouldn't have arrived in time to save quicksort.

       

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://178418]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (3)
As of 2024-04-20 01:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found