It's all a matter of perspective. In the OP's query, he had 18 items in his list. Taking your code, changing the $search_for to be $limit/2 to better approximate actual random values, and using an input of 17 (since you seem to add one), I get:

    $ perl ./x.pl 17
    Done generating list
    9 9
               Rate brutal binary
    brutal 449367/s     --   -17%
    binary 544651/s    21%     --
So the difference is ~20%. But what are we really talking about here? On my system, that difference really means dropping from the brute-force scan's 2.23 µs per lookup to the binary search's 1.84 µs (just the reciprocals of the Rate column above). A savings of 0.39 µs. Seriously, we're quibbling here? (Binary search should do REALLY well here, too, since its very first probe lands almost exactly on the target, I think.)
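
For anyone who wants to reproduce these numbers, this is roughly the shape of the harness I'm talking about. It's a minimal sketch reconstructed from the description above, not the parent's actual code; the list generation and the sub names are my own guesses:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $limit = shift || 17;                  # 17 requested; the list below adds one
    my @a = sort { $a <=> $b } map { int rand( $limit * 2 ) } 0 .. $limit;
    print "Done generating list\n";

    my $search_for = $limit / 2;              # roughly the middle of the value range

    # Brute force: walk the sorted list until the next element would go over.
    sub brutal {
        my $best;
        for my $x (@a) {
            last if $x > $search_for;
            $best = $x;
        }
        return $best;
    }

    # Binary search for the largest element <= $search_for.
    sub binary {
        my ( $lo, $hi ) = ( 0, $#a );
        my $best;
        while ( $lo <= $hi ) {
            my $mid = int( ( $lo + $hi ) / 2 );
            if ( $a[$mid] <= $search_for ) {
                $best = $a[$mid];
                $lo   = $mid + 1;
            }
            else {
                $hi = $mid - 1;
            }
        }
        return $best;
    }

    # Sanity check: both searches should report the same element.
    printf "%s %s\n", brutal() // 'none', binary() // 'none';

    cmpthese( -2, { brutal => \&brutal, binary => \&binary } );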

In your cases, I see a list of 100 (101, really, but again, let's not quibble over small details) dropping from 5.4 µs to 4.5 µs, a savings of 0.88 µs. At 1000, it's 48.6 µs vs 6.41 µs, a savings of 42.2 µs. Much bigger numbers. But, seriously, is it a concern?

At 10,000, it's 461 µs vs 7.75 µs, or a savings of 454 µs. We're still under a thousandth of a second here, folks. Even at 100,000 items in the (pre-)sorted list, we're comparing 5.92 ms vs 10.9 µs. Sure, that's a savings of nearly the whole thing, but that's still only saving 5.91 ms. Really, do we care?

Now, granted, you ran these tests skewed (looking for something 33% of the way in instead of half-way, otherwise the binary search wouldn't even have to do anything), but the reality is (and I think starbolin's point is) that doing this search really doesn't take much time. Worrying about this before you've profiled anything is simply premature optimisation, and you'd get a bigger bang for your buck (that buck being programmer time) if you spent it on something else. The reward for all the time spent coding, testing, and fixing a binary search just does not pay for itself, at least not if you're merely running it once on a small list (fewer than a million items, it seems).

Changing $search_for to my $search_for = $a[int($limit * (1/2 - 1/9))]; and rerunning at 1_000_000 entries shows a savings of about 66.7 ms on my machine. It's simply not worth it. Running one more time at 10 million, the savings is about 775 ms (775 ms for the brute-force scan minus 14.1 µs for the binary search). Now it's starting to be worth it, if it's being run interactively. If it's a long-running program that users expect to walk away from and come back to, then even that's no big deal.
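
For concreteness, that rerun amounts to swapping one line into the harness sketched above (with the new sizes passed on the command line); again, my reconstruction:

    my $search_for = $a[ int( $limit * ( 1/2 - 1/9 ) ) ];   # an element ~39% of the way into the list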

Of course, this may all be moot if it's code for perlmonks or slashdot or some other active site, but you should still profile before worrying about such small improvements. Chances are, your bottlenecks are still elsewhere.

Update: I should point out that I realise that benchmarking a static index for an O(n) vs O(lg n) comparison is inherently unfair, and an even distribution across the entire problem space would need to be concocted for truly accurate numbers. However, even if that resulted in numbers two or three times what I provide above (which would not be the case, since the upper limit is on the brute-force method, and that would not increase by much), it would not change the conclusion. 18 ms is not much more time than 6 ms (and the true number would probably still be under 8 ms at 100,000 on this machine).
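
If someone did want fairer numbers, the obvious fix is to pick a fresh random target inside each timed sub, so neither method gets to exploit one fixed, favourable index. Something along these lines would do (again only a sketch, reusing @a from the harness above; the rand() overhead is identical in both closures, so the comparison stays apples-to-apples):

    cmpthese( -2, {
        brutal => sub {
            my $t = $a[ int rand @a ];        # fresh target every call
            my $best;
            for my $x (@a) { last if $x > $t; $best = $x }
            return $best;
        },
        binary => sub {
            my $t = $a[ int rand @a ];
            my ( $lo, $hi, $best ) = ( 0, $#a );
            while ( $lo <= $hi ) {
                my $mid = int( ( $lo + $hi ) / 2 );
                if   ( $a[$mid] <= $t ) { $best = $a[$mid]; $lo = $mid + 1 }
                else                    { $hi = $mid - 1 }
            }
            return $best;
        },
    } );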


In reply to Re^3: find closest element of array without going over by Tanktalus
in thread find closest element of array without going over by sadfaceman
