Re^5: Heap sorting in perl

I don't follow your reasoning.

We were talking at cross purposes. I thought that you were proposing to heap sort the entire dataset, not the subset of smallest elements.

I admit that my using sort to order the smallest subset is a hack, but unless N is large or the comparison function is expensive, it will be faster than a pure-perl heap.

As for overkill, using Perl to solve a problem that can be solved with the correct option to the Unix sort utility is overkill. But if you need a general purpose solution in Perl and you have a heap implementation there already, why not use it?

I don't see how the Unix sort applies. We don't want to sort the whole dataset - we want the smallest N elements (and we also don't know if the dataset consists of line oriented data.)

Comment on Re^5: Heap sorting in perl Download Code

Replies are listed 'Best First'.
Re: Re^5: Heap sorting in perl by Anonymous Monk on Apr 05, 2003 at 17:02 UTC
I was proposing to solve the problem originally stated. I thought that my description of how to solve it using a heap was fairly clear on that point by telling you when to start removing elements while still adding them. I agree that until N is large (causing, eg, memory problems) or the comparison function is expensive, using sort will be faster. The heap solution outlined is helpful only for large N. For an expensive comparison function, it is helpful once you start pulling off elements to keep track of the largest that you have in your set. Anything which is larger than or equal to that one should not be added. This reduces the number of comparisons significantly. As for the sort utility, I had thought that it had an option to only return the largest N elements. But looking at it more carefully I don't see it. It must have been somewhere else that I saw that. My bad. :-(	[reply]
Re^7: Heap sorting in perl by adrianh (Chancellor) on Apr 05, 2003 at 20:12 UTC
I was proposing to solve the problem originally stated. I thought that my description of how to solve it using a heap was fairly clear on that point by telling you when to start removing elements while still adding them. I know. Just misinterpreted it. Sorry. One of my many moments of stupidity ;-) I agree that until N is large (causing, eg, memory problems) or the comparison function is expensive, using sort will be faster. The heap solution outlined is helpful only for large N. A good XS heap implementation would still win over sort with small N. It's just a C vs Perl issue speedwise.	[reply]


Don't ask to ask, just ask
	PerlMonks