PerlMonks  

RE: Descending through directories

by t0mas (Priest)
on May 30, 2000 at 17:27 UTC


in reply to Descending through directories
in thread Getting a List of Files Via Glob

The code I posted was meant to demonstrate a way for a sub to call itself recursively; it isn't very useful in itself... ;-)
There is more than one way to do it. You could also rewinddir and readdir again (with a different grep) on the same handle.
But I agree that your solution is beautiful.
Maybe someone could benchmark some test cases.

/brother t0mas

RE: RE: Descending through directories
by Corion (Patriarch) on May 30, 2000 at 19:54 UTC
    I've never used rewinddir() :), and my post wasn't meant as an offence, sorry if I came across that way ...
      Did some benchmarking today. I really like knowing the most effective way to solve a certain problem, and when Corion posted his code above, I got curious. To Corion, I would like to say that this is no "I'm right - you're wrong" kind of thing. I've enjoyed your code (since I love and use eConsole) for a long time, and I really didn't know which of the ways was most effective, so please don't take this the wrong way.
      If anyone else has ideas about this, please give it a shot with your own code.
      I use directory traversing quite often, so I would really be glad to be able to use the most effective code in my programs.
      Here we go:
      use Benchmark;
      use File::Spec;
      use File::Find;

      $t0 = new Benchmark;
      &t1('C:\\Program');
      $t1 = new Benchmark;
      &t2('C:\\Program');
      $t2 = new Benchmark;
      &t3('C:\\Program');
      $t3 = new Benchmark;
      &t4('C:\\Program');
      $t4 = new Benchmark;
      &t5('C:\\Program');
      $t5 = new Benchmark;

      print "t1: ",timestr(timediff($t1, $t0)),"\n";
      print "t2: ",timestr(timediff($t2, $t1)),"\n";
      print "t3: ",timestr(timediff($t3, $t2)),"\n";
      print "t4: ",timestr(timediff($t4, $t3)),"\n";
      print "t5: ",timestr(timediff($t5, $t4)),"\n";

      # Opens a dirhandle to read files, another to read sub-dirs and
      # recursive calls itself foreach subdir it finds
      sub t1 {
          my $Dir = shift;
          opendir(DIR, $Dir) || die "Can't opendir $Dir: $!";
          my @Files = grep { /.txt/ && -f "$Dir/$_" } readdir(DIR);
          closedir DIR;
          opendir(DIR, $Dir) || die "Can't opendir $Dir: $!";
          my @Dirs = grep { /^[^.].*/ && -d "$Dir/$_" } readdir(DIR);
          closedir DIR;
          foreach $file (@Files) { print $Dir."-".$file."\n"; }
          foreach $SubDir (@Dirs) { &t1(join("\\",$Dir,$SubDir)); }
      };

      # Opens a dirhandle to read files, rewinds to read sub-dirs and
      # recursive calls itself foreach subdir it finds
      sub t2 {
          my $Dir = shift;
          opendir(DIR, $Dir) || die "Can't opendir $Dir: $!";
          my @Files = grep { /.txt/ && -f "$Dir/$_" } readdir(DIR);
          rewinddir(DIR);
          my @Dirs = grep { /^[^.].*/ && -d "$Dir/$_" } readdir(DIR);
          closedir DIR;
          foreach $file (@Files) { print $Dir."-".$file."\n"; }
          foreach $SubDir (@Dirs) { &t2(join("\\",$Dir,$SubDir)); }
      };

      # Opens a dirhandle to read all directory contents and
      # recursive calls itself foreach subdir it finds
      # Uses File::Spec, which makes it portable
      sub t3 {
          my ($Dir) = shift;
          my ($entry,@direntries,$fullpath);
          opendir( DIR, $Dir ) or die "Can't opendir $Dir: $!";
          @direntries = readdir( DIR ) or die "Error reading $Dir : $!\n";
          closedir DIR;
          foreach $entry (@direntries) {
              next if $entry =~ /^\.\.?$/;
              $fullpath = File::Spec->catfile( $Dir, $entry );
              if (-d $fullpath ) {
                  &t3($fullpath);
              } elsif ( -f $fullpath && $entry =~ /.txt/) {
                  print $Dir."-".$entry."\n";
              }
          }
      };

      # Opens a dirhandle to read all directory contents and
      # recursive calls itself foreach subdir it finds
      sub t4 {
          my ($Dir) = shift;
          my ($entry,@direntries,$fullpath);
          opendir( DIR, $Dir ) or die "Can't opendir $Dir: $!";
          @direntries = readdir( DIR ) or die "Error reading $Dir : $!\n";
          closedir DIR;
          foreach $entry (@direntries) {
              next if $entry =~ /^\.\.?$/;
              $fullpath = join("\\",$Dir,$entry);
              if (-d $fullpath ) {
                  &t4($fullpath);
              } elsif ( -f $fullpath && $entry =~ /.txt/) {
                  print $Dir."-".$entry."\n";
              }
          }
      };

      # Uses File::Find (whatever it does...)
      sub t5 {
          my ($Dir) = shift;
          find(\&found, $Dir);
      }

      sub found {
          /.txt/ && print $File::Find::dir."-".$_."\n";
      }
      This test was run on a Pentium 233 with 128 MB RAM, Windows 2000, FAT32 filesystem.
      C:\Program holds 13477 files in 1206 folders, of which 137 match *.txt

      t1: 27 wallclock secs ( 8.40 usr + 16.76 sys = 25.17 CPU)
      t2: 24 wallclock secs ( 7.69 usr + 15.57 sys = 23.26 CPU)
      t3: 47 wallclock secs (20.30 usr + 23.85 sys = 44.15 CPU)
      t4: 36 wallclock secs (11.04 usr + 23.33 sys = 34.37 CPU)
      t5: 30 wallclock secs (11.12 usr + 18.02 sys = 29.13 CPU)


      /brother t0mas

        I've just run your program (with slight modifications) under Linux on a dual SMP P2-350 machine, on my home directory, whose subdirectories contain about 20 text files and quite a lot (about 500 MB) of html files in several directories. The results amazed me, so I ran this test four times in a row; the last three results were identical but really amazing:

        t1:  7 wallclock secs ( 2.43 usr +  4.27 sys =  6.70 CPU)
        t2:  7 wallclock secs ( 2.43 usr +  4.32 sys =  6.75 CPU)
        t3: 14 wallclock secs ( 8.25 usr +  5.73 sys = 13.98 CPU)
        t4:  7 wallclock secs ( 1.62 usr +  4.77 sys =  6.39 CPU)
        t5:  1 wallclock secs ( 0.84 usr  0.21 sys +  0.00 cusr  0.01 csys =  0.00 CPU)
        

        The trend we can see is that everything is faster in general, by about a factor of 3 or 4, but what really is amazing is how little time &t5() takes: only 1 wallclock second. So I interchanged &t4() and &t5() to see if that result was order dependent:

        ...
        t4:  1 wallclock secs ( 0.95 usr  0.18 sys +  0.00 cusr  0.01 csys =  0.00 CPU)
        t5:  7 wallclock secs ( 1.75 usr +  4.65 sys =  6.40 CPU)
        

        But it wasn't. This is really strange and sheds some new light on File::Find which I always considered clumsy, and which is one of the slower routines under Win32. Wonders of Perl :).

        To see how the results would change, I then reran your test for files that match .html. (While going through the source code, I noticed some things about your regular expressions: the ".txt" RE will match any name that contains "txt" preceded by at least one character, and the directory matching will leave out directories which start with a ".", so Unix "hidden" directories will not be searched.) I ran the test three times and threw away the first test results on about 500 MB of html files.
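
        A small sketch of the two regular expression points above, for anyone who wants to see the difference. The sample names are made up for illustration, and the anchored patterns are only one possible tightening, not what the benchmarked code uses:

```perl
use strict;
use warnings;

# Hypothetical sample names, chosen only to show the difference.
my @names   = ('notes.txt', 'atxtb', 'txt', 'notes.txt.bak');
my @entries = ('.', '..', '.hidden', 'src');

# Pattern from the benchmarked code: any one character followed by "txt".
my @loose = grep { /.txt/ } @names;

# Assumed stricter alternative: a literal ".txt" at the very end of the name.
my @anchored = grep { /\.txt\z/ } @names;

# Directory filter from the benchmarked code: skips every name starting with ".".
my @no_dotnames = grep { /^[^.].*/ } @entries;

# Assumed alternative: skip only "." and "..", so hidden directories are kept.
my @no_dotdirs = grep { !/^\.\.?\z/ } @entries;

print "loose:       @loose\n";
print "anchored:    @anchored\n";
print "no dotnames: @no_dotnames\n";
print "no dotdirs:  @no_dotdirs\n";
```

        With these names the loose pattern also picks up "atxtb" and "notes.txt.bak", while the anchored one keeps only "notes.txt"; the second directory filter keeps ".hidden" where the first drops it.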

        t1:  8 wallclock secs ( 2.59 usr +  4.65 sys =  7.24 CPU)
        t2:  8 wallclock secs ( 2.47 usr +  4.66 sys =  7.13 CPU)
        t3: 17 wallclock secs ( 8.65 usr +  5.90 sys = 14.55 CPU)
        t4:  9 wallclock secs ( 1.67 usr +  5.42 sys =  7.09 CPU)
        t5:  2 wallclock secs ( 1.04 usr  0.23 sys +  0.00 cusr  0.01 csys =  0.00 CPU)
        

        And amazingly, the trend continues, with &t5() beating the rest by far. I had thought the results would all become console bound anyway, but that wasn't so.

        I wonder what my tests under NT 4 will bring us :)

        I finally got off my lazy back and ran the test on my home machine, a trusty P-100 with 80 MB RAM, and here are the results (with ActivePerl 5.005_03 build 517):

        FAT 16 drive (no HD activity during the second run)
        t1: 17 wallclock secs ( 6.66 usr +  9.89 sys = 16.55 CPU)
        t2: 16 wallclock secs ( 5.89 usr +  8.47 sys = 14.36 CPU)
        t3: 41 wallclock secs (16.67 usr + 18.16 sys = 34.83 CPU)
        t4: 27 wallclock secs ( 8.37 usr + 16.88 sys = 25.26 CPU)
        t5: 15 wallclock secs ( 7.75 usr +  7.07 sys = 14.82 CPU)
        NTFS drive (slight HD activity for the later parts of the HD)
        t1: 96 wallclock secs (30.07 usr + 59.09 sys = 89.17 CPU)
        t2: 87 wallclock secs (27.73 usr + 53.18 sys = 80.91 CPU)
        t3: 179 wallclock secs (72.02 usr + 96.92 sys = 168.94 CPU)
        t4: 142 wallclock secs (36.63 usr + 96.15 sys = 132.78 CPU)
        t5: 81 wallclock secs (35.33 usr + 43.25 sys = 78.58 CPU)
        

        So here File::Find is again on par with the solution reading each directory twice and the solution using rewinddir(), and my favourite method of doing stuff, &t4, doesn't look that good either if you are going for peak performance. The fastest solution takes only half the time, and scanning the whole NTFS HD did take some time, as you can see :). So once again rule number one of optimizing holds: benchmark, benchmark, benchmark.
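
        For anyone who wants to repeat this on their own machine, here is a rough sketch that lets the Benchmark module do the repetition and the comparison through its timethese and cmpthese functions, instead of taking timestamps by hand. It is not the code measured above: the names walk_rewind and walk_find, the default root of ".", the count of 3 runs, the anchored patterns and the symlink check are all my own assumptions, and the matches are collected into an array rather than printed, so the console does not enter into the timing.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use File::Find;

# Directory to scan; defaulting to "." is an assumption for this sketch.
my $root = shift(@ARGV) || '.';

# rewinddir approach: one handle, read once for files and once for sub-dirs.
sub walk_rewind {
    my ($dir, $out) = @_;
    opendir(my $dh, $dir) or return;    # silently skip unreadable directories
    my @files = grep { /\.txt\z/ && -f "$dir/$_" } readdir($dh);
    rewinddir($dh);
    # Skip "." and "..", and do not follow symlinks, to avoid loops.
    my @dirs = grep { !/^\.\.?\z/ && -d "$dir/$_" && !-l "$dir/$_" } readdir($dh);
    closedir($dh);
    push @$out, map { "$dir/$_" } @files;
    walk_rewind("$dir/$_", $out) for @dirs;
}

# File::Find approach: let the module do the traversal.
sub walk_find {
    my ($dir, $out) = @_;
    find(sub { push @$out, $File::Find::name if /\.txt\z/ && -f $_ }, $dir);
}

# Run each walker 3 times, then print a comparison table.
my $results = timethese(3, {
    rewinddir => sub { my @found; walk_rewind($root, \@found) },
    file_find => sub { my @found; walk_find($root, \@found) },
});
cmpthese($results);
```

        On a small tree Benchmark will warn about too few iterations; raising the count or pointing it at a bigger directory takes care of that. The first pass may also be slower than the later ones if the directory data is not cached yet, so it is worth running the whole thing more than once.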

        Hello t0mas !

        It always amazes me at which places I find users of eConsole - never would I have thought to find a user on perlmonks :) !

        Thanks for doing these tests - I didn't even know there was a Benchmark module! What amazes me is that the method of reading a directory twice (as done in t1 and t2) is faster than reading it once and checking for file/directory afterwards - you never stop learning I guess ... I will run these tests on my machine (a lowly P-100 running NT 4) and maybe on a Linux machine as well to get a more complete view of the behaviour :)

        This code is AWESOME!

      No offence taken. I think it's a good thing to discuss/show different ways to solve the same problem, and I guess we all have our own toolkits of code snippets that we throw into every program we write.
      Maybe I'll try to benchmark some of the ways when I find some time.

      /brother t0mas
