PerlMonks  

readdir inconsistent!

by LostShootingStar (Novice)
on Jun 27, 2007 at 17:41 UTC ( [id://623689]=perlquestion )

LostShootingStar has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm trying to iterate over LARGE file systems (millions and millions of objects) using opendir() and readdir(), and store the listing of files in a file. What I am finding is that my script returns a different number of files every time I run it. For example, if I have a directory with 500,000 objects and I run the script on this directory 3 times, I might get 498,976, then 497,098, then 499,543. I never get the same number of files twice. I know for a fact that the directory contents are not changing.
    my @mntDirs = glob("/mnt/*/clips");
    foreach $mntDir (@mntDirs){
        my @shallowDirs = glob("$mntDir/*");
        foreach $shallowDir (@shallowDirs) {
            my @deepDirs = glob ("$shallowDir/*");
            foreach $deepDir (@deepDirs) {
                opendir (CUR, "$deepDir") or die;
                #skip . and ..
                readdir(CUR);
                readdir(CUR);
                while($ent = readdir(CUR)){
                    print "$ent\n";
                }
                close CUR;
            }
        }
    }
This is running on a SLES 9 kernel, no NFS or anything. EDIT: To those who replied, I screwed up my original post, so I had to rewrite it.

Replies are listed 'Best First'.
Re: readdir inconsistent!
by bluto (Curate) on Jun 27, 2007 at 19:17 UTC
    First, you probably shouldn't assume that '.' and '..' are the first two directory entries returned from readdir(). AFAIK, this isn't guaranteed, so you should just check for them individually.
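The point above about '.' and '..' can be sketched as follows. This is a minimal illustration, not the original poster's script: the sub name list_entries is hypothetical, and it filters the two special entries by name instead of discarding the first two results of readdir().

```perl
use strict;
use warnings;

# Return every entry in $dir except '.' and '..'.
# The two special entries are filtered by name, so their position
# in the readdir() stream does not matter.
sub list_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    # List-context readdir returns all remaining entries at once,
    # so a file literally named "0" is not mistaken for end-of-directory.
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @entries;
}

# Usage: perl list_entries.pl /some/dir [/another/dir ...]
for my $dir (@ARGV) {
    print "$_\n" for list_entries($dir);
}
```

Reading the whole directory in list context holds every name in memory at once; for directories with millions of entries, a while loop with an explicit defined() test on each readdir() result is the lower-memory alternative.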

    Second, are you mounting the filesystem over a network? For example, if you are using NFS make sure you test it with a hard mount or, if possible, test your code directly on the NFS server. NFS (and possibly other network filesystems) has to coordinate with the client on where in a directory it is reading from. I wouldn't be surprised if it's not exact on a huge directory since I've seen similar NFS bugs in the past.

    One way to see if this is a filesystem issue, rather than in your perl code, is to use the 'find' command and count the lines in the output to see if you get inconsistent results.

      I just tried the 'find' command suggestion, and found that the exact same thing is happening: different results each time. So I guess it's not a Perl issue, but if anyone has any suggestions I'm open to them. One thing I found is that if I run the script back to back to back, really fast, I get the same results, but if I wait a few minutes, or do other things on the system in between, the results are different. Again, I'm positive the directory contents are NOT changing.
        "Again, I'm positive the directory contents are NOT changing."

        How do you back up that claim? Did you stat the '.' entry of the directories at each run? Any differences?
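One way to act on the stat suggestion above is to record each directory's own metadata on every run and compare the lines afterwards. This is only a sketch: the sub name dir_fingerprint and the choice of fields are assumptions, and stat on the directory path reports the same inode as its '.' entry.

```perl
use strict;
use warnings;

# Return a one-line fingerprint of a directory's own metadata.
# Fields come from Perl's stat() list: index 3 = nlink, 7 = size,
# 9 = mtime, 10 = ctime.
sub dir_fingerprint {
    my ($dir) = @_;
    my @st = stat($dir);
    die "Cannot stat $dir: $!" unless @st;
    my ($nlink, $size, $mtime, $ctime) = @st[3, 7, 9, 10];
    return join("\t", $dir, "nlink=$nlink", "size=$size",
                "mtime=$mtime", "ctime=$ctime");
}

# Usage: perl dirstat.pl /mnt/*/clips/*/* > run1.stat
print dir_fingerprint($_), "\n" for @ARGV;
```

If the mtime or ctime of a directory differs between two runs, something created, removed, or renamed an entry in it in the meantime. Identical fingerprints alongside different readdir() counts would point more strongly at the filesystem.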

        You might be hitting a bug of the underlying file system. What kind is it? xfs?

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: readdir inconsistent!
by Grundle (Scribe) on Jun 27, 2007 at 19:49 UTC
    If you are using ext3 there is a limit of 32k inodes per directory, however with Reiser and XFS it should not be an issue. Having 32k inodes in one directory is certainly crazy, and 500k is insanity. I would suggest organizing your data better (if my understanding of your system is correct).

    If you want to debug this script, run it over a directory with 10 files and see what your results are. I think you are running into file system weirdness.
Re: readdir inconsistent!
by moritz (Cardinal) on Jun 27, 2007 at 18:57 UTC
    As a debugging suggestion: run the script twice, and then sort and diff the two resulting files.

    Perhaps you can find patterns of what files are detected in one of the runs, and not in the other.
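The sort-and-diff suggestion above could be sketched in Perl roughly like this. The sub names read_listing and only_in_first are hypothetical, and the sketch assumes each run's output was saved to a file with one name per line. Because the original script prints bare names rather than full paths, identical names from different directories collapse into a single entry here.

```perl
use strict;
use warnings;

# Read a listing file (one name per line) into a set of names.
sub read_listing {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my %seen;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $seen{$line} = 1;
    }
    close($fh);
    return \%seen;
}

# Return the sorted names present in the first set but absent from the second.
sub only_in_first {
    my ($first, $second) = @_;
    my @only = grep { !exists $second->{$_} } keys %$first;
    return sort { $a cmp $b } @only;
}

# Usage: perl cmp_runs.pl run1.txt run2.txt
if (@ARGV == 2) {
    my ($run1, $run2) = map { read_listing($_) } @ARGV;
    print "only in $ARGV[0]: $_\n" for only_in_first($run1, $run2);
    print "only in $ARGV[1]: $_\n" for only_in_first($run2, $run1);
}
```

Printing full paths in the listing instead of bare names would make it easier to spot whether the missing entries cluster in particular directories.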

    BTW which kind of file systems are you using? Any network filesystems?

Re: readdir inconsistent!
by jettero (Monsignor) on Jun 27, 2007 at 17:48 UTC
    It feels to me like the directory is being modified while you iterate through it. Out of curiosity, what platform is it? That may make a difference...

    -Paul

      It is on a SLES 9 kernel, unix platform. I am 100% positive that the contents of the directories are NOT being modified.
Re: readdir inconsistent!
by ysth (Canon) on Jun 28, 2007 at 05:25 UTC
    Have you tried sorting and comparing your outputs from different runs to see if knowing what the actual differences are sheds any light?
Re: readdir inconsistent!
by zentara (Archbishop) on Jun 28, 2007 at 16:38 UTC
    UPDATE: Fixed lost first line, thnx BrowserUk

    Works for me with reiserfs. The first time, run it with 'c' as an argument to create the files; then run it repeatedly. I get 500002 every time. (Remember the . and .. )

    #!/usr/bin/perl
    use warnings;
    use strict;
    use File::Path;

    if( (defined $ARGV[0]) and ($ARGV[0] eq 'c')) {
        my $dir = 'dir';
        mkpath($dir) || die "no permission to $dir?";
        foreach my $filenum (1..500000){
            open FH, "> ./dir/$filenum" or warn "$!\n";
            close FH;
        }
    }

    chdir "dir" or die "Couldn't chdir $!\n";
    opendir(D, ".") or die "Couldnt open $!\n";
    my $count = 0;
    while (my $file = readdir D) {
        $count++;
    }
    print "$count\n";

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
