I'd like to note that gam3 is describing average-case behavior for the short-circuited linear search here, not worst-case.
O(n/2) is as good as you can expect short-circuited linear search to do on average. The absolute best case is, of course, O(1), if the element you want happens to be at the beginning of the list; the worst case is O(n). All of this simplifies to O(n).
Binary search is much more efficient (except for very small lists, where the setup overhead dominates). It is O(log2(n)). For 800k elements, plain linear search is about 800k comparisons, short-circuited linear search about 400k (worst case 800k), and binary search about 20 comparisons. There's not much to argue here.
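As a rough sanity check on those counts, here is a small Python sketch (the names and sample data are illustrative, not from the thread) that counts comparisons for a short-circuited linear scan and computes the binary-search bound for an 800k-element list:

```python
import math

n = 800_000
data = list(range(n))  # stand-in for the 800k-element sorted list

def linear_comparisons(target):
    # short-circuited linear scan: stop at the first match
    for count, x in enumerate(data, start=1):
        if x == target:
            return count
    return n

# A "typical" element near the middle costs about n/2 comparisons:
print(linear_comparisons(n // 2))    # roughly 400,000
# Binary search needs only about log2(n) comparisons:
print(math.ceil(math.log2(n)))       # 20
```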
In this particular thread, though, the sort-then-1000-searches approach is still going to be blown away by inverting the loop and not doing any searches at all. Such is the power of the hash lookup.
The point is: ...
If you think that this is The point, could you elaborate on its practical relevance with respect to the statement I originally (implicitly) made, which is just that skipping unnecessary iterations/comparisons is faster (in the typical case) than doing full iterations every time (like grep)? This holds even though the upper bound expressed in both algorithms' Big O classification is the same.

I never made any statement as to the method being O(N) or not. Also, no doubt that a binary search on a sorted list would be faster...
Indeed, if there's no other reason to have the array sorted than to do a binary search instead of a short-circuited linear search, there may not be much overall difference between the n/2 per search and the one-time n * log2(n) sort plus log2(n) per search.
In this case, we're talking about m searches, so O(m*n/2) vs. O( n * log2(n) + m * log2(n) ): (1000 * 400k) for short-circuited linear with no sort, or (800k * 20 + 1000 * 20) for sort and binary search. That's 400,000,000 vs. 16,020,000 typical. The worst case, though, could be 800,000,000 vs. 640,000,020,000 (if quicksort hits its pathological O(n^2) behavior).
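The arithmetic above can be checked directly; the worst-case figure for the sorted approach assumes quicksort degrading to its O(n^2) pathological case, as noted:

```python
n, m = 800_000, 1_000   # list size and number of searches, per the thread
log2n = 20              # ceil(log2(800_000))

linear_typical = m * (n // 2)           # short-circuited linear, no sort
sorted_typical = n * log2n + m * log2n  # quicksort + binary searches
linear_worst   = m * n                  # every search scans the whole list
sorted_worst   = n * n + m * log2n      # quicksort's pathological O(n^2) case

print(linear_typical, sorted_typical)   # 400000000 16020000
print(linear_worst, sorted_worst)       # 800000000 640000020000
```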
So between the two, the quicksort-plus-binary-search approach has the better typical time, while the short-circuited linear search with no sort has the more stable time. Pathological cases would hopefully be rare.
All of which is still, in this case, much ado about nothing. It should be 800,000 hash lookups for comparison and no searching at all. The method that does 0.002 times as many comparisons as one alternative and 0.05 times as many as the other is the clear winner.
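A minimal sketch of that loop inversion, using a Python set as the hash; the names and sample values here are illustrative assumptions, not from the thread:

```python
# Instead of searching the big list once per needle, build a hash of the
# 1,000 needles and make a single pass over the 800k-element list,
# doing one O(1) hash lookup per element: 800,000 lookups, no searching.
needles = set(range(1_000))        # the 1,000 values being searched for
haystack = list(range(800_000))    # the big list

found = [x for x in haystack if x in needles]
print(len(found))                  # 1000
```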