Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: Heap sorting in perl

by adrianh (Chancellor)
on Apr 05, 2003 at 14:49 UTC ( [id://248296]=note: print w/replies, xml ) Need Help??


in reply to Heap sorting in perl

No comments on Heap not having ever used it.

However, unless you are dynamically adding and removing items to your dataset wouldn't something like:

sub smallest_n (&\@$) { my ($cmp, $arrayref, $n) = @_; return unless $n && @$arrayref; $n = @$arrayref if @$arrayref < $n; my @results = sort $cmp @$arrayref[0..$n-1]; local ($a, $b); $a = pop @results; for (my $i = $n; $i < @$arrayref; $i++) { $b = $arrayref->[$i]; if ($cmp->() == 1) { @results = sort $cmp (@results, $b); $a = pop @results; }; }; return (@results, $a); }; use Test::More tests => 1; use List::Util qw(shuffle); my @a = shuffle (1..100000); my @b = smallest_n {$a <=> $b} @a, 5; is_deeply [@b], [1,2,3,4,5];

be easier? The support for the heap datastructure will add a fair bit of overhead with your large dataset, so I wouldn't use one unless you need it.

Update: totally misready what blakem was proposing. D'oh.

Replies are listed 'Best First'.
Re: Re: Heap sorting in perl
by Anonymous Monk on Apr 05, 2003 at 15:26 UTC
    The reason that someone with a large dataset might want a heap is that they don't want to have very much of the dataset in memory at once. With a Heap you can add an element at a time until you have your limit, and from there on you can add one and remove the biggest each time.

    When you are done you can then just extract off all of the elements in the heap, and you have the smallest N of them from largest to smallest. Without excessive memory usage.

      Erm. No :-)

      You have to add all the items from the dataset to the heap before you can remove the N lowest (or highest, depending on the direction your grow your tree).

      Yes, you could have the heap as an out-of-memory structure. However, if the dataset is on disk you can just read it in element by element and use the algorithm I proposed. It's still going to be less expensive in time and space than creating a heap.

      Unless you are goin to be adding and removing entries from the data set and need to keep it ordered a heap is overkill for the problem as stated.

        I don't follow your reasoning.

        If you know when you start how many you will want in the end, what would possibly prevent you from adding and removing from your heap before you had all of the data in? Sure, the first "maximum" removed and thrown away might not be the biggest element in the whole set. But it wasn't in the smallest N, and that is all we care about. Who cares what order we throw away the rest in?? (As long as we throw it away before running out of memory!)

        As for overkill, using Perl to solve a problem that can be solved with the correct option to the Unix sort utility is overkill. But if you need a general purpose solution in Perl and you have a heap implementation there already, why not use it?

Re: Heap sorting in perl
by Abigail-II (Bishop) on Apr 06, 2003 at 21:04 UTC
    Your algorithm runs in Omega (N k log k), where N is the size of the set, and you are interested in the k smallest elements. Which is pretty lousy. With k for instance N / 100, your algorithm is worse than doing a bubble sort of the entire set, and getting the k smallest from the sorted set.

    Abigail

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://248296]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (5)
As of 2024-04-24 07:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found