PerlMonks  

Recursion and Such

by Grundle (Scribe)
on Jan 31, 2005 at 17:46 UTC ( [id://426673] )

Grundle has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a recursive file-generation algorithm that inserts files into a directory up to an upper bound, then creates a new directory and continues creating files there. However, I am running into a problem with the recursion when I call a subroutine within the algorithm. The subroutine being called (findDir) seems to run correctly on the first iteration, but on successive iterations, no matter what values are passed to it, it seems to preserve the values that were passed on the first pass. Can anyone shed some insight as to why this behavior is occurring?
    use File::Find;
    use Cwd;

    #recursive directory creation http://perlmonks.thepen.com/183899.html
    my $dir = "/tstDir";
    my $level_one = 0;
    my $level_two = 0;
    my $source_num = 0;
    my $source = "TstRun";
    my $MAX_FILES_IN_DIR = 8;
    my $file_pointer;

    my $curr_dir = depthFirst($dir, 1);
    chdir($curr_dir);

    for (my $i = 0; $i < 40; $i++){
        if($file_pointer = $MAX_FILES_IN_DIR){
            my $curr_dir = depthFirst($dir, 1);
            chdir($curr_dir);
        }
        open(WRITE_FILE, ">$i");
        print WRITE_FILE "$i";
        close(WRITE_FILE);
        $file_pointer++;
    }
    print "Directory: $curr_dir\n";

    #my $depth = 4;
    #my $count = findDir("/tstDir/0/0/TstRun0", $depth, $depth);
    print "count: $count\n";

    sub depthFirst{
        my ($dir, $depth) = @_;
        my $file_count = 0;
        if($depth eq 4){
            $dir = "/tstDir/$level_one/$level_two/$source$source_num";
            print "depth: $depth dir: $dir\n";
            if(-e "$dir"){
                $file_count = findDir($dir, 4, 4);
                print "file count: $count_files\n";
                if($file_count < $MAX_FILES_IN_DIR){
                    chdir($dir);
                    $file_pointer = $file_count;
                    return $dir;
                }else{
                    $source_num++;
                    depthFirst($dir, 3);
                }
            }else{
                system "mkdir -p $dir";
                $file_pointer = 0;
                return $dir;
            }
            return;
        }elsif($depth eq 3){
            $dir = "/tstDir/$level_one/$level_two/";
            print "depth: $depth dir: $dir\n";
            if(-e "$dir"){
                $files_count = findDir($dir, 3, 3);
                if($file_count < $MAX_FILES_IN_DIR){
                    depthFirst($dir, ++$depth);
                }else{
                    $level_two++;
                    depthFirst($dir, 2);
                }
            }else{
                system "mkdir -p $dir";
                depthFirst($dir, $depth);
            }
        }elsif($depth eq 2){
            $dir = "/tstDir/$level_one/";
            if(-e "$dir"){
                print "depth: $depth dir: $dir\n";
                $file_count = findDir($dir, 2, 2);
                if($file_count < $MAX_FILES_IN_DIR){
                    depthFirst($dir, ++$depth);
                }else{
                    $level_one++;
                    depthFirst($dir, $depth);
                }
            }else{
                system "mkdir -p $dir";
                depthFirst($dir, $depth);
            }
        }else{
            print "depth: $depth dir: $dir\n";
            depthFirst("$dir", ++$depth);
        }
    }

    sub findDir{
        my ($dir, $min_depth, $max_depth) = @_;
        print "findDir: $dir min: $min_depth max: $max_depth\n";
        my $count_files = 0;
        find( { preprocess => \&preprocess,
                wanted     => \&wanted,
              }, $dir);
        #print "count: $count_files\n";

        sub preprocess {
            my $depth = $File::Find::dir =~ tr[/][];
            #print "depth: $depth\n";
            return @_ if $depth < $max_depth;
            print "depth: $depth max: $max_depth\n";
            if ($depth == $max_depth){
                print "greping\n";
                return grep { -df } @_;
            }
            return;
        }

        sub wanted {
            my $depth = $File::Find::dir =~ tr[/][];
            return if $depth < $min_depth;
            print "in wanted\n";
            print "file: $_\n";
            if(!($_ =~ m/^\./)){
                $count_files++;
            }
        }
        return $count_files;
    }

Replies are listed 'Best First'.
Re: Recursion and Such
by Fletch (Bishop) on Jan 31, 2005 at 19:08 UTC

    Just a suggestion, but if your intent is to spread out files across a directory tree to improve access time (e.g. you've got an older Linux using ext2 that drags on directories with over ~1000 entries) or have an NFS server that doesn't allow over n bytes worth of filenames in an inode, you might make a tree of [[:hexdigit:]]+/[[:hexdigit:]]+/realfile, where the [[:hexdigit:]]+ chunks are derived using Digest::MD5 or the like from the realfile's name.

    (Just to toss that idea out as I've had to deal with both the problems I mentioned in the past . . .)

    Update: The OP contacted me out-of-band with questions about how exactly to apply this to their situation. I'll answer them here just in case anyone else was interested . . .

    The way I've always used this in the past is to basically use the "filename" as a key that gets run through MD5 to create the real on-disk filename. If you can regenerate the key easily (say it's a log file of the form "username-month-year") you don't really need to keep the original filenames around; otherwise you'll want to keep a list of just keys (possibly using DB_File, a real RDBMS, or a flat file) to use as a table of contents. Whichever way you go, what you want to write is a key2path( ) routine which you pass in the key and get back the real on disk path. This example does two levels deep (so you need code to make directories 00 .. ff and then 00/00..ff, 01/00..ff, ...) and uses the hashed key as the pathname (although there's no reason you couldn't use $key instead of $digest; the reason I didn't in the most recent case I used this was that it was the length of the original filename causing problems to begin with and the key was readily available externally):

    sub key2path {
        my $key    = shift;
        my $digest = Digest::MD5::md5_hex( $key );
        return substr( $digest, 0, 2 ) . "/"
             . substr( $digest, 2, 2 ) . "/"
             . $digest;
    }
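    To make that concrete, here is a hedged sketch of key2path in use. The key format, the temp-directory data root, and the on-demand bucket creation below are illustrative assumptions of mine, not Fletch's exact setup (he pre-creates 00..ff/00..ff; creating the one needed bucket lazily is an alternative):

    ```perl
    use strict;
    use warnings;
    use Digest::MD5 ();
    use File::Path  ();              # core module; provides mkpath
    use File::Temp qw(tempdir);

    sub key2path {
        my $key    = shift;
        my $digest = Digest::MD5::md5_hex($key);
        return substr($digest, 0, 2) . "/"
             . substr($digest, 2, 2) . "/"
             . $digest;
    }

    # Hypothetical usage: instead of pre-making all 256*256 buckets,
    # create just the bucket this key hashes into, on demand.
    my $store = tempdir( CLEANUP => 1 );        # stand-in for the real data root
    my $path  = key2path("username-jan-2005");  # key format is illustrative
    my ($bucket) = $path =~ m{^(.+)/[^/]+$};    # the "xx/yy" prefix
    File::Path::mkpath("$store/$bucket");       # no-op if it already exists
    ```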
      I never even thought of that. Great suggestion! BTW - you hit the nail on the head about the problem. NFS isn't the best when you start dealing with large file sets under one directory, so of course the solution is to spread the files out and preserve speed.
      In reply to the Update

      This seems to be a much simpler approach. It definitely saves on a lot of code, and takes out the confusing recursion. Thanks for offering a viable alternative that was in effect much easier to implement.
Re: Recursion and Such
by Roy Johnson (Monsignor) on Jan 31, 2005 at 18:52 UTC
    Quick tips
    • Use == for numeric equality, not eq.
    • Use the builtin mkdir, rather than system "mkdir..."
    • Consider whether your recursion ever bottoms out (it doesn't seem like it does)
    • use strict; use warnings;
      yield several alarming messages about variables that "will not stay shared"
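      The "will not stay shared" trap can be demonstrated in a few lines. This is a minimal stand-alone illustration of the problem (not the OP's code): a named sub nested in another sub is compiled once and keeps the lexical from the first call, while an anonymous sub is a true closure over the current value.

      ```perl
      use strict;
      use warnings;

      sub outer {
          my ($max) = @_;
          # Named subs are compiled once; after the first call to outer(),
          # inner() no longer shares $max with later calls.
          sub inner { return $max }      # warns: "Variable ... will not stay shared"
          return inner();
      }

      print outer(1), "\n";   # 1
      print outer(2), "\n";   # still 1, not 2

      # The fix: an anonymous sub closes over the current $max each call.
      sub outer_fixed {
          my ($max) = @_;
          my $inner = sub { return $max };
          return $inner->();
      }
      print outer_fixed(2), "\n";  # 2
      ```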


    Caution: Contents may have been coded under pressure.

      Yup, $depth is probably getting set on the first invocation and never getting updated on subsequent invocations. See this article


      "The dead do not recognize context" -- Kai, Lexx
        Yes, this is exactly the problem! The article you sent me to got me on the right path, but I thought it was too brief, or at least it didn't offer enough examples for me to figure out what was happening. You are right, however; the root of my problem was exactly that.

        The following I thought was a very good article on the same thing, although their code is a little cryptic in some places. http://www.serverwatch.com/tutorials/article.php/10825_1128811_1

        Finally, the following is the re-written findDir subroutine that works just fine (although it looks a little ugly to me).

        sub findDir{
            my ($dir, $min, $max) = @_;
            $min_depth = \$min;
            $max_depth = \$max;
            #print "findDir: $dir min: $min_depth max: $max_depth\n";
            $count_files = *count{0};
            find( { preprocess => \&preprocess,
                    wanted     => \&wanted,
                  }, $dir);
            #print "count: $count_files\n";

            sub preprocess {
                my $depth = $File::Find::dir =~ tr[/][];
                #print "depth: $depth\n";
                return @_ if $depth < $$max_depth;
                print "depth: $depth max: $$max_depth\n";
                if ($depth == $$max_depth){
                    print "greping\n";
                    return grep { -df } @_;
                }
                return;
            }

            sub wanted {
                my $depth = $File::Find::dir =~ tr[/][];
                return if $depth < $$min_depth;
                if(!($_ =~ m/^\./)){
                    $$count_files++;
                }
            }
            return $$count_files;
        }
      I thought about using Perl's built-in mkdir(), but my problem is that if a nested directory structure doesn't exist

      i.e. /foo/bar/foobar/

      then the code

      mkdir("/foo/bar/foobar/");

      is going to fail. Therefore I used a system call with the -p option so that any depth of directory will be created without failures.
      As for the recursion bottoming out, just run the code. If it runs infinitely, then it doesn't bottom out; if it exits, then it has a base case.

      Thanks for the pointers.

        Check out File::Path - it has a function mkpath which solves that problem for you. Anything to avoid an unnecessary system call ;-) (Also note that this is part of the standard perl distribution so you can just use it anywhere without having to require extra modules.)
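        For the record, a minimal sketch of the mkpath approach (the temp-directory path below is just for illustration):

        ```perl
        use strict;
        use warnings;
        use File::Path;                      # core module; exports mkpath by default
        use File::Temp qw(tempdir);

        my $base = tempdir( CLEANUP => 1 );  # stand-in for a real root like /foo
        my $deep = "$base/foo/bar/foobar";

        mkpath($deep);                       # creates all missing parents, like mkdir -p
        print -d $deep ? "created\n" : "missing\n";
        ```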

Re: Recursion and Such
by Roy Johnson (Monsignor) on Jan 31, 2005 at 18:34 UTC
    It sounds like you've been bit by a closure. I notice you're defining subroutines within findDir and calling find with references to those subroutines.

    Because those subroutines are named, they're defined once, and what they use for $max_depth et al is a matter of some consternation for me. I haven't played with that sort of deep magic, myself, so I can't give you a definite diagnosis, but I think your troubles are in that area.(Even though as I think about it, it seems like it should be ok.)

    Try defining the subs like

    my $wanted = sub { ...
    and passing those variables (which are subroutine references) to find. Note that you'll need to define the subroutines before the call to find.
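    A hedged sketch of that suggestion applied to the OP's findDir (variable names follow the OP's code; the OP's `-df` filetest is replaced here with an explicit -d/-f check, and the debugging prints are dropped):

    ```perl
    use strict;
    use warnings;
    use File::Find;

    sub findDir {
        my ($dir, $min_depth, $max_depth) = @_;
        my $count_files = 0;

        # Anonymous subs built on every call, so each closes over the
        # CURRENT $min_depth/$max_depth/$count_files.
        my $preprocess = sub {
            my $depth = $File::Find::dir =~ tr[/][];
            return @_ if $depth < $max_depth;               # keep descending
            return grep { -d $_ or -f $_ } @_ if $depth == $max_depth;
            return;                                         # prune anything deeper
        };
        my $wanted = sub {
            my $depth = $File::Find::dir =~ tr[/][];
            return if $depth < $min_depth;
            $count_files++ unless /^\./;                    # skip dotfiles
        };

        find( { preprocess => $preprocess, wanted => $wanted }, $dir );
        return $count_files;
    }
    ```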

    Caution: Contents may have been coded under pressure.
Re: Recursion and Such
by dragonchild (Archbishop) on Jan 31, 2005 at 17:56 UTC
    I'm going to go out on a limb and suspect that it's because File::Find uses globals that are not being reset correctly between iterations.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      That doesn't make sense since the line of code that is being affected is the following
      sub recursion{
          findDir($dir, $min, $max);
      }

      sub findDir{
          my ($dir, $min_depth, $max_depth) = @_;
      }
      granted, that is a scaled-down version, but for brevity's sake you get the idea. No matter what is being passed, after $min and $max == 2 they are not being updated in the findDir subroutine. It seems like the array @_ is holding on to old values, but even when I use
      $dir = shift;
      $max = shift;
      $min = shift;
      etc ...
      I still get the wrong data. This has to be some property of recursion that I am missing...

Node Type: perlquestion [id://426673]
Approved by gellyfish