
RE: WARNING t0mas wrote BAD CODE

by t0mas (Priest)
on Jun 15, 2000 at 02:54 UTC (#18209)

in reply to WARNING t0mas wrote BAD CODE
in thread Odd file rename

Hi merlyn. I did some benchmarking to see how opendir/readdir/closedir performed compared to File::Find.

    use Benchmark;
    use File::Find;

    my $dirSep = $^O eq "MSWin32" ? "\\" : "/";
    my $fileCounter = 0;

    $t0 = new Benchmark;
    &test1('Program');
    $t1 = new Benchmark;
    print $fileCounter . "\n";

    $fileCounter = 0;
    $t2 = new Benchmark;
    &test2('Program');
    $t3 = new Benchmark;
    print $fileCounter . "\n";

    print "test1: ", timestr(timediff($t1, $t0)), "\n";
    print "test2: ", timestr(timediff($t3, $t2)), "\n";

    sub test1 {
        my $Dir = shift;
        opendir(DIR, $Dir) || die "Can't opendir $Dir: $!";
        my @Files = grep { /\.txt$/ && -f "$Dir$dirSep$_" } readdir(DIR);
        rewinddir(DIR);
        my @Dirs = grep { /^[^.].*/ && -d "$Dir$dirSep$_" && ! -l "$Dir$dirSep$_" } readdir(DIR);
        closedir DIR;
        foreach (@Files) { $fileCounter += 1; }
        foreach $SubDir (@Dirs) { &test1(join($dirSep, $Dir, $SubDir)); }
    };

    sub test2 {
        my ($Dir) = shift;
        find(\&found, $Dir);
    }

    sub found {
        -f && /\.txt$/ && ($fileCounter += 1);
    }

The test-machine was a dual boot P233MHz/256RAM. The Program directory is on a FAT32 partition.

On Win2000 the results were:

test1: 52 wallclock secs (17.89 usr + 34.68 sys = 52.57 CPU)
test2: 50 wallclock secs (18.41 usr + 30.94 sys = 49.35 CPU)

On Redhat Linux 6.0 the results were:
test1: 14 wallclock secs ( 2.25 usr + 12.57 sys = 14.82 CPU)
test2: 14 wallclock secs ( 2.86 usr + 10.11 sys = 12.97 CPU)

Pretty even...

When I added a /\.txt/ && to &found and to the @Files = grep {} statement in &test1, the results changed a bit. The more complex the regexp, the worse File::Find performed on Win32.
On Win2000 the new results were:

test1: 30 wallclock secs (10.17 usr + 18.79 sys = 28.95 CPU)
test2: 32 wallclock secs (12.19 usr + 19.76 sys = 31.95 CPU)

On Linux it depended on when the regexp was evaluated. If I put it before -f, it performed better than if I put it after.

I believe that the choice of method depends on the problem faced. My "every-day" environment is Win32, and in my opinion File::Find has a simple and elegant syntax, but it performs worse than opendir/readdir/closedir in my main environment.
So until File::Find performs better on Win32, you can call me stupid and cargo-cultist, but I guess I'll stick to opendir/readdir/closedir.
Please note that I don't want to start a war about this, I just wanted to clarify my opinions. I do agree that the code I posted had some issues.

/brother t0mas

Reinventing wheels based on bad benchmarks
by runrig (Abbot) on Jan 09, 2003 at 21:50 UTC
        my @Files = grep { /\.txt$/ && -f "$Dir$dirSep$_" } ...
        ...
        -f && /\.txt$/ && ($fileCounter+=1);

    You've made a big benchmarking error. You are comparing two very different, though seemingly equivalent, operations. In your first test, you do a relatively inexpensive operation (the regexp match) first, which in most cases short-circuits the need to do the relatively expensive operation (the -f file test) second, assuming there are relatively few '.txt' files.

    In the second test, you always perform the expensive operation (a file stat, or whatever the Windows equivalent is), which makes this an unfair comparison. Make the comparison fair and I think you'll be surprised.
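
A minimal sketch of what a fairer comparison might look like, with the cheap regexp tested before -f in both variants. The names count_readdir and count_find and the temporary test tree are assumptions made for illustration, not code from the original benchmark, and the tree is far too small to give meaningful timings.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: count .txt files with opendir/readdir,
# testing the cheap regexp before the -f file test.
sub count_readdir {
    my ($dir) = @_;
    my $count = 0;
    opendir(my $dh, $dir) or die "Can't opendir $dir: $!";
    my @entries = grep { !/^\.\.?$/ } readdir($dh);   # skip . and ..
    closedir $dh;
    for my $entry (@entries) {
        my $path = File::Spec->catfile($dir, $entry);
        if (-d $path && !-l $path) {
            $count += count_readdir($path);           # recurse into real directories
        }
        elsif ($entry =~ /\.txt$/ && -f $path) {
            $count++;
        }
    }
    return $count;
}

# Hypothetical helper: the same count with File::Find,
# again with the regexp before -f.
sub count_find {
    my ($dir) = @_;
    my $count = 0;
    find(sub { $count++ if /\.txt$/ && -f $_ }, $dir);
    return $count;
}

# Build a tiny throwaway tree so the sketch runs anywhere.
my $root = tempdir(CLEANUP => 1);
mkdir File::Spec->catdir($root, 'sub') or die "Can't mkdir: $!";
for my $name ('a.txt', 'b.log', File::Spec->catfile('sub', 'c.txt')) {
    my $path = File::Spec->catfile($root, $name);
    open(my $fh, '>', $path) or die "Can't create $path: $!";
    close $fh;
}

print "readdir: ", count_readdir($root), "\n";
print "find:    ", count_find($root), "\n";

# Time both variants over the same tree.
timethese(100, {
    readdir => sub { count_readdir($root) },
    find    => sub { count_find($root) },
});
```

Unlike test1 in the original post, this readdir variant does not skip directories whose names start with a dot, so both variants visit the same set of files.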

    BTW, it would take more than the 4-6% improvement that you cite for me to reinvent this wheel anyway :-)

      I'm not surprised.

      If you read my post carefully you'll note that I wrote: "On Linux it depended on when the regexp was evaluated. If I put it before -f, it performed better than if I put it after."

      I experimented a lot with this issue before making that post, and before making my original post, which caused so much debate.

      The file stat has no _significant_ effect on the benchmark, and certainly not the major impact you describe.

      You can try it yourself!

      The impact it has is on the first run only, since the second run is read from some file cache and becomes very inexpensive.
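
One way to keep that first-run cost out of the numbers is to make an untimed warm-up pass before timing. This is only a sketch: the helper name time_warm and the small temporary directory are assumptions for illustration, and whether the file cache behaves this way depends on the OS and filesystem.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: run $code once untimed so directory and stat
# data can land in the OS file cache, then time $runs further runs.
sub time_warm {
    my ($runs, $code) = @_;
    $code->();                      # warm-up pass, not timed
    return timeit($runs, $code);    # returns a Benchmark object
}

# Small throwaway directory holding 20 empty .txt files.
my $dir = tempdir(CLEANUP => 1);
for my $n (1 .. 20) {
    open(my $fh, '>', "$dir/file$n.txt") or die "Can't create file: $!";
    close $fh;
}

my $count = 0;
my $walk  = sub {
    $count = 0;
    find(sub { $count++ if /\.txt$/ && -f $_ }, $dir);
};

my $t = time_warm(50, $walk);
print "files seen: $count\n";
print "warm runs:  ", timestr($t), "\n";
```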

      I re-ran the benchmark today (my current box is a Pentium 1000, Windows 2000 with perl v5.6.1 built for MSWin32-x86-multi-thread) and included a find sub with no -f at all (the test3), hitting 1250 files:

      test1: 42 wallclock secs ( 9.21 usr + 31.04 sys = 40.26 CPU)
      test2: 52 wallclock secs (13.48 usr + 34.29 sys = 47.77 CPU)
      test3: 51 wallclock secs (13.14 usr + 34.33 sys = 47.47 CPU)


      I think the decision to reinvent or not (in this case) depends on whether 4-6% is important or not. If a 4-6% speed gain makes your program meet the specifications and fail otherwise, what then?

      /brother t0mas
        If 4-6% speed gain makes your program meet the specifications and fail otherwise, what then?

        It's a tired argument/rebuttal, but if reinventing this wheel puts you inside the specifications, then chances are the time you spent reinventing would have been far better invested in some other part of the code that will likely gain you more than a mere 5% average improvement.

        Remember that crawling directories is a heavily I/O bound activity where optimizations in your code are unlikely to be able to make a great deal of difference.

        However, as a suggestion (I haven't benched it), try this:

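
            sub test4 {
                find({
                    # preprocess is expected to return the list of names to process,
                    # so count the .txt files and then pass the list through unchanged
                    preprocess => sub { $fileCounter += grep { /\.txt$/ && -f } @_; return @_ },
                    wanted     => sub {},
                }, shift);
            }
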
        (Actually, I'm thinking I'll go submit a patch so that find doesn't require a wanted in case a preprocess and/or postprocess is given.)

        Makeshifts last the longest.

        I ran my tests on a fairly slow Win95 PC, and came out with File::Find being slightly faster. All in all, for a simple find like this, I might use File::Find::Rule anyway (and it would have even more overhead):

            my @files = File::Find::Rule->file->name('*.txt')->in($dir);
