Re: Heap sorting in perl

No comments on Heap, as I have never used it.

However, unless you are dynamically adding items to and removing items from your dataset, wouldn't something like:

sub smallest_n (&\@$) {
    my ($cmp, $arrayref, $n) = @_;
    return unless $n && @$arrayref;
    $n = @$arrayref if @$arrayref < $n;
    my @results = sort $cmp @$arrayref[0..$n-1];
    local ($a, $b);
    $a = pop @results;
    for (my $i = $n; $i < @$arrayref; $i++) {
        $b = $arrayref->[$i];
        # Any positive result means $a sorts after $b; comparators are not
        # required to return exactly 1.
        if ($cmp->() > 0) {
            @results = sort $cmp (@results, $b);
            $a = pop @results;
        };
    };
    return (@results, $a);
};

use Test::More tests => 1;
use List::Util qw(shuffle);
my @a = shuffle (1..100000);
my @b = smallest_n {$a <=> $b} @a, 5;
is_deeply [@b], [1,2,3,4,5];

be easier? ~~The support for the heap datastructure will add a fair bit of overhead with your large dataset, so I wouldn't use one unless you need it.~~

Update: totally misread what blakem was proposing. D'oh.

Re: Re: Heap sorting in perl by Anonymous Monk on Apr 05, 2003 at 15:26 UTC
The reason that someone with a large dataset might want a heap is that they don't want to have very much of the dataset in memory at once. With a Heap you can add an element at a time until you have your limit, and from there on you can add one and remove the biggest each time. When you are done you can then just extract off all of the elements in the heap, and you have the smallest N of them from largest to smallest. Without excessive memory usage.
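A minimal sketch of the bounded-heap idea described above, assuming numeric values that arrive one at a time from an iterator. The name `smallest_n_stream` and its calling convention are hypothetical, and the heap is hand-rolled here rather than taken from the Heap module, whose interface is not shown in this thread. For simplicity the result is sorted at the end instead of being extracted from the heap largest-first.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread or from any CPAN module):
# keep the $n smallest values produced by $next, holding at most $n
# values in memory. $next must return undef when the data runs out.
sub smallest_n_stream {
    my ($next, $n) = @_;
    return if $n <= 0;
    my @heap;    # binary max-heap: $heap[0] is the largest value kept
    while (defined(my $x = $next->())) {
        if (@heap < $n) {
            # Heap not full yet: append the value and sift it up.
            push @heap, $x;
            my $i = $#heap;
            while ($i > 0) {
                my $p = int(($i - 1) / 2);
                last if $heap[$p] >= $heap[$i];
                @heap[$p, $i] = @heap[$i, $p];
                $i = $p;
            }
        }
        elsif ($x < $heap[0]) {
            # The new value beats the current maximum: overwrite the
            # root (discarding the old maximum) and sift down.
            $heap[0] = $x;
            my $i = 0;
            while (1) {
                my ($l, $r, $big) = (2 * $i + 1, 2 * $i + 2, $i);
                $big = $l if $l < @heap && $heap[$l] > $heap[$big];
                $big = $r if $r < @heap && $heap[$r] > $heap[$big];
                last if $big == $i;
                @heap[$i, $big] = @heap[$big, $i];
                $i = $big;
            }
        }
        # Otherwise the value is not among the $n smallest; drop it.
    }
    return sort { $a <=> $b } @heap;
}

# Usage: the closure stands in for reading a large dataset piecemeal.
my @data  = (9, 3, 7, 1, 8, 2, 6);
my @small = smallest_n_stream(sub { shift @data }, 3);
print "@small\n";    # prints "1 2 3"
```

Each incoming value costs at most one sift of depth about log2(n), so memory stays bounded by n regardless of how much data passes through.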
Re^3: Heap sorting in perl by adrianh (Chancellor) on Apr 05, 2003 at 15:40 UTC
Erm. No :-) You have to add all the items from the dataset to the heap before you can remove the N lowest (or highest, depending on the direction you grow your tree). Yes, you could have the heap as an out-of-memory structure. However, if the dataset is on disk you can just read it in element by element and use the algorithm I proposed. It's still going to be less expensive in time and space than creating a heap. Unless you are going to be adding and removing entries from the data set and need to keep it ordered, a heap is overkill for the problem as stated.
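A rough sketch of reading the data element by element while keeping only N candidates, assuming one number per line on a filehandle. The name `smallest_n_from_fh` is hypothetical. Note that this is a variation on the algorithm posted earlier, not a copy of it: it places each new candidate with a binary search and `splice` instead of re-sorting the whole result list.

```perl
use strict;
use warnings;

# Hypothetical helper: read one number per line from $fh and return
# the $n smallest, never holding more than $n + 1 values at once.
sub smallest_n_from_fh {
    my ($fh, $n) = @_;
    return if $n <= 0;
    my @keep;    # sorted ascending
    while (defined(my $line = <$fh>)) {
        chomp $line;
        # Cheap rejection: not smaller than the largest value kept.
        next if @keep == $n && $line >= $keep[-1];
        # Binary search for the insertion point.
        my ($lo, $hi) = (0, scalar @keep);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            if ($keep[$mid] <= $line) { $lo = $mid + 1 }
            else                      { $hi = $mid }
        }
        splice @keep, $lo, 0, $line;
        pop @keep if @keep > $n;    # discard the new largest
    }
    return @keep;
}

# Usage with an in-memory filehandle standing in for a file on disk.
my $text = "9\n3\n7\n1\n8\n2\n6\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @small = smallest_n_from_fh($fh, 3);
print "@small\n";    # prints "1 2 3"
```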
Re: Re^3: Heap sorting in perl by Anonymous Monk on Apr 05, 2003 at 15:51 UTC
I don't follow your reasoning. If you know when you start how many you will want in the end, what would possibly prevent you from adding and removing from your heap before you had all of the data in? Sure, the first "maximum" removed and thrown away might not be the biggest element in the whole set. But it wasn't in the smallest N, and that is all we care about. Who cares what order we throw away the rest in?? (As long as we throw it away before running out of memory!) As for overkill, using Perl to solve a problem that can be solved with the correct option to the Unix sort utility is overkill. But if you need a general purpose solution in Perl and you have a heap implementation there already, why not use it?
Re^5: Heap sorting in perl by adrianh (Chancellor) on Apr 05, 2003 at 16:29 UTC
Re: Re^5: Heap sorting in perl by Anonymous Monk on Apr 05, 2003 at 17:02 UTC
Re: Heap sorting in perl by Abigail-II (Bishop) on Apr 06, 2003 at 21:04 UTC
Your algorithm runs in Omega (N k log k), where N is the size of the set, and you are interested in the k smallest elements. Which is pretty lousy. With k for instance N / 100, your algorithm is worse than doing a bubble sort of the entire set, and getting the k smallest from the sorted set. Abigail

