Re^2: performance issues with grepping large sorted arrays

by Trizor (Pilgrim)
on Jul 23, 2007 at 23:34 UTC


in reply to Re: performance issues with grepping large sorted arrays
in thread performance issues with grepping large sorted arrays

Arrays don't require a linear search; see my post about binary search. It does require a sorted array, but if you've already got one...
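
For concreteness, a minimal sketch of such a binary search in Perl (illustrative names; this assumes string keys sorted with the default sort, so swap lt/gt for numeric comparisons on numeric data):

    # Binary search over a sorted array.
    # Returns the index of $target, or -1 if it is not present.
    sub binary_search {
        my ($aref, $target) = @_;
        my ($lo, $hi) = (0, $#$aref);
        while ($lo <= $hi) {
            my $mid = int(($lo + $hi) / 2);
            if    ($aref->[$mid] lt $target) { $lo = $mid + 1 }
            elsif ($aref->[$mid] gt $target) { $hi = $mid - 1 }
            else                             { return $mid }
        }
        return -1;    # not found
    }

    my $idx = binary_search(\@sorted, $key);    # ~lg(n) comparisons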

In terms of algorithmic complexity, short-circuiting has no impact on efficiency: the worst case is still O(n), and the average case is O(n/2), which simplifies to O(n), which is still greater than O(lg n).
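
To make the contrast concrete (stand-in data; the names are illustrative): grep always tests every element, whereas List::Util's first, or a plain loop with last, stops at the first hit. Both are O(n), but the short-circuiting version does roughly half the tests on an average successful lookup:

    use strict;
    use warnings;
    use List::Util qw(first);

    my @array = qw(apple banana cherry);    # stand-in data
    my $key   = 'banana';

    # grep keeps scanning after a match: always n tests.
    my $hit_grep = grep { $_ eq $key } @array;

    # first() stops at the first match: ~n/2 tests on an average hit.
    my $hit_first = defined( first { $_ eq $key } @array );

    # The same short circuit with an explicit loop:
    my $hit = 0;
    for my $elem (@array) {
        if ($elem eq $key) { $hit = 1; last }
    }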


Re^3: performance issues with grepping large sorted arrays
by almut (Canon) on Jul 24, 2007 at 00:16 UTC

    I absolutely agree that a binary search would be faster if you have a sorted array (though sorting the array in the first place would also take quite some time, and I was under the impression that the OP was only doing that in an attempt to speed things up).

    As to "the worst case is still O(n) and the average case is O(n/2) which simplifies to O(n)": I think in most real-life situations you're very unlikely to encounter the worst-case scenario (in the OP's case, it would mean that every $key being looked up would match the last element of the array, or would not be found at all). Sure, ultimately it's a statistical issue, but in the typical case there will be some improvement when short-circuiting the search. That's a moot point anyway, as there are even better alternatives... (like hash tables, with constant-time lookup on average).
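
    A sketch of that hash-table alternative (illustrative names; the hash slice builds the index in a single pass, after which each lookup is O(1) on average):

        # Build the lookup index once: O(n).
        my %index;
        @index{@array} = ();    # hash slice; only the keys matter

        # Each subsequent lookup is constant time on average.
        print "found\n" if exists $index{$key};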

      The point is: grep is O(n), the short-circuit for loop is O(n/2), and O(n) == O(n/2).
      -- gam3
      A picture is worth a thousand words, but takes 200K.
        I'd like to note that gam3 is describing the average-case behavior of the short-circuited linear search here, not the worst case.

        O(n/2) is as good as you can expect a short-circuited linear search to get over time. The absolute best case is, of course, O(1), if you happen to find what you want at the beginning of the list. The worst case is O(n). All of this simplifies to O(n).

        Binary search is much more efficient (save for very small lists, where the setup overhead dominates). It is O(log2(n)). For 800k elements, a full linear search is about 800k comparisons, a short-circuited linear search about 400k on average (800k worst case), and a binary search about 20 comparisons (log2(800000) ≈ 19.6). There's not much to argue here.

        In this particular thread, the sort plus 1000 searches is still going to be blown away by inverting the loop and not doing any searches at all, as sketched below. Such is the power of the hash lookup.
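
        A sketch of that inverted loop (hypothetical names @keys and @big_array): hash the 1000 keys once, then make a single pass over the 800k-element array:

            # Hash the keys being searched for: O(k).
            my %wanted = map { $_ => 1 } @keys;

            # One pass over the big array: O(n) total,
            # instead of O(k * n) for k repeated linear scans.
            my @matches;
            for my $elem (@big_array) {
                push @matches, $elem if $wanted{$elem};
            }
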
        The point is: ...

        If you think that this is The point, could you elaborate on its practical relevance with respect to the statement I originally (implicitly) made? That statement was simply that skipping unnecessary iterations/comparisons is faster (in the typical case) than doing a full iteration every time, as grep does. This holds even though the upper bound expressed by both algorithms' Big O classification is the same. I never made any claim about the method being O(n) or not. Also, no doubt a binary search on a sorted list would be faster...
