Re: A heap of medians: efficiency vs. speed

As author of Heap::Simple I'd like to add the following observations:

Make sure Heap::Simple::XS is used if you care about speed for this kind of operation.
Use the (default) scalar element type if that is good enough. It's the fastest of the available element types.
Use the max_count option if you only want an already known last m elements. This will save a lot of compares. In this case it also saves the extract loop.

With these three getting the median of a heap of 1e6 integers is close in speed. Sort is still faster though.

You can also add the dirty option. Heap::Simple::XS will then use direct numeric compares instead of calling perl's >. This breaks overload and ties though. That way the Heap::Simple solution is about twice as fast as sort. The break-even is somewhere in the ten thousands of elements.

The resulting code looks something like this:

use List::Util qw(shuffle);
use POSIX qw(ceil);
use Heap::Simple::XS;

my @list = shuffle(0..1e6); 
my $h = Heap::Simple::XS->new(max_count => ceil(@list/2),
                              dirty => 1);
$h->insert(@list);
print $h->extract_top;
[download]

This doesn't change the fact that for even quite big lists sort is just faster for this kind of problem. Heaps are mainly for when you need to know the smallest element at many points in time while the set of elements keeps changing. Or indeed for some cases where compare is very expensive, but even then there are better ways to do it with less compares if you just want the median.

PS: Heap::Simple already uses special code to make an insert of many elements at once faster than repeatedly inserting one element. You seem to hint at an even faster algorithm but I don't see how to use Tarjan's algorithm for that. Can you give me a reference ?

Comment on Re: A heap of medians: efficiency vs. speed Select or Download Code

Replies are listed 'Best First'.
Re^2: A heap of medians: efficiency vs. speed by kyle (Abbot) on Jun 12, 2008 at 20:30 UTC
Thanks for your comments! I'm sorry to say I don't know any more about building heaps faster than the little bit I said in the article. I got what I wrote about Tarjan's algorithm from the Wikipedia article, Heap (data structure), but I haven't looked into it myself at all. Sorry I can't be more help.	[reply]


Syntactic Confectionery Delight
	PerlMonks