Reinventing wheels based on bad benchmarks

by runrig (Abbot)
on Jan 09, 2003 at 21:50 UTC (#225657)

in reply to RE: WARNING t0mas wrote BAD CODE
in thread Odd file rename

my @Files = grep { /\.txt$/ && -f "$Dir$dirSep$_" } ...
...
-f && /\.txt$/ && ($fileCounter+=1);
You've made a big benchmarking error. You are comparing two very different, though seemingly equivalent, operations. In your first test, you do a relatively inexpensive operation (the regex match) first, which in most cases short-circuits the relatively expensive operation (the -f file test) that follows, assuming there are relatively few '.txt' files.

In the second test, you always perform the expensive operation (a file stat, or whatever the Windows equivalent is), which makes this an unfair comparison. Make the comparison fair and I think you'll be surprised.
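A fair comparison might look something like the sketch below. It is only an illustration: the temporary tree, the file counts, and the sub names count_regex_first and count_stat_first are made up for the example and are not from the original benchmark. Both subs do the same two tests and differ only in their order, and the tree is walked once before timing so that neither variant pays for a cold first pass.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Find;
use File::Temp qw(tempdir);

# Build a small throwaway tree so the sketch is self-contained
# (hypothetical data: 20 .txt files and 80 .dat files, half in a subdirectory).
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "mkdir: $!";
for my $i (1 .. 100) {
    my $ext  = $i <= 20 ? 'txt' : 'dat';
    my $path = $i % 2 ? "$dir/file$i.$ext" : "$dir/sub/file$i.$ext";
    open my $fh, '>', $path or die "open $path: $!";
    close $fh;
}

# Cheap test first: -f only runs for names that already match the regex.
sub count_regex_first {
    my ($top) = @_;
    my $n = 0;
    find(sub { $n++ if /\.txt$/ && -f $_ }, $top);
    return $n;
}

# Expensive test first: every entry is stat'ed, then matched.
sub count_stat_first {
    my ($top) = @_;
    my $n = 0;
    find(sub { $n++ if -f $_ && /\.txt$/ }, $top);
    return $n;
}

# Walk the tree once untimed so both variants start with a warm cache.
count_regex_first($dir);

timethese(200, {
    regex_first => sub { count_regex_first($dir) },
    stat_first  => sub { count_stat_first($dir) },
});
```

On a tree this small the timings mean very little; point the two subs at a real directory with a realistic share of '.txt' files before drawing conclusions.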

BTW, it would take more than the 4-6% improvement that you cite for me to reinvent this wheel anyway :-)

Replies are listed 'Best First'.
Re: Reinventing wheels based on bad benchmarks
by t0mas (Priest) on Jan 10, 2003 at 08:23 UTC
    I'm not surprised.

    If you read my post carefully you'll note that I wrote: "On Linux it depended on when the regexp was evaluated. If I put it before -f, it performed better than if I put it after."

    I experimented a lot with this issue before making that post, and a lot before making my original post, which caused so much debate.

    The file stat has no _significant_ effect on the benchmark. It does not have the major impact you describe.

    You can try it yourself!

    The impact it has is on the first run only, since the second run is read from some file cache and becomes very inexpensive.
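One way to check the first-run claim, sketched under assumptions: time each pass on its own instead of summing many iterations, and see whether only the first pass stands out. The sub name timed_pass and the default of the current directory are hypothetical choices for this example.

```perl
use strict;
use warnings;
use File::Find;
use Time::HiRes ();

# Time one traversal that stats every entry; returns (seconds, file count).
sub timed_pass {
    my ($top) = @_;
    my $files = 0;
    my $start = Time::HiRes::time();
    find(sub { $files++ if -f $_ }, $top);
    return (Time::HiRes::time() - $start, $files);
}

# Hypothetical default: walk the current directory unless one is given.
my $top = @ARGV ? $ARGV[0] : '.';
for my $pass (1 .. 3) {
    my ($secs, $files) = timed_pass($top);
    printf "pass %d: %.4f s, %d files\n", $pass, $secs, $files;
}
```

If the tree was already read recently, all three passes will look alike, so this only shows the effect on a directory the system has not touched for a while.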

    I re-ran the benchmark today (my current box is a Pentium 1000, Windows 2000 with perl v5.6.1 built for MSWin32-x86-multi-thread) and included a find sub with no -f at all (the test3), hitting 1250 files:

    test1: 42 wallclock secs ( 9.21 usr + 31.04 sys = 40.26 CPU)
    test2: 52 wallclock secs (13.48 usr + 34.29 sys = 47.77 CPU)
    test3: 51 wallclock secs (13.14 usr + 34.33 sys = 47.47 CPU)


    I think the decision to reinvent or not (in this case) depends on whether 4-6% is important or not. If a 4-6% speed gain makes your program meet the specifications and fail otherwise, what then?

    /brother t0mas
      "If 4-6% speed gain makes your program meet the specifications and fail otherwise, what then?"

      It's a tired argument/rebuttal, but if reinventing this wheel puts you inside the specifications, then chances are the time you spent reinventing would have been far better invested in some other part of the code that will likely gain you more than a mere 5% average improvement.

      Remember that crawling directories is a heavily I/O bound activity where optimizations in your code are unlikely to be able to make a great deal of difference.

      However, as a suggestion (I haven't benched it), try this:

      sub test4 {
          find({
              # preprocess must return the entries, or find() stops descending
              preprocess => sub { $fileCounter += grep /\.txt$/ && -f $_, @_; @_ },
              wanted     => sub {},
          }, shift);
      }
      (Actually, I'm thinking I'll go submit a patch so that find doesn't require a wanted in case a preprocess and/or postprocess is given.)
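A self-contained way to try the preprocess idea, as a sketch rather than a benchmarked result: the temporary tree and the sub name count_in_preprocess are invented for the example. It relies on File::Find's documented behaviour that preprocess receives the bare entry names of the directory being processed and is expected to return the list of names to continue with, so the sub hands @_ back after counting.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Count .txt files inside preprocess; wanted is a no-op.
sub count_in_preprocess {
    my ($top) = @_;
    my $count = 0;
    find({
        preprocess => sub {
            # Names are relative to the directory find() has chdir'ed into.
            $count += grep { /\.txt$/ && -f $_ } @_;
            return @_;    # hand the entries back so find() keeps descending
        },
        wanted => sub {},
    }, $top);
    return $count;
}

# Hypothetical test tree: two .txt files at the top, one in a subdirectory.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "mkdir: $!";
for my $path ("$dir/a.txt", "$dir/b.txt", "$dir/c.dat", "$dir/sub/d.txt") {
    open my $fh, '>', $path or die "open $path: $!";
    close $fh;
}

print count_in_preprocess($dir), "\n";
```

Whether this beats a plain wanted sub is untested here; it only shows that the count survives recursion when the entry list is passed through.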

      Makeshifts last the longest.

      I ran my tests on a fairly slow Win95 PC, and came out with File::Find being slightly faster. All in all, for a simple find like this, I might use File::Find::Rule anyway (and it would have even more overhead):
      my @files = File::Find::Rule->file->name('*.txt')->in($dir);
