http://qs321.pair.com?node_id=828135

bichonfrise74 has asked for the wisdom of the Perl Monks concerning the following question:

Given the output from the code below, how would I get only the final (deepest) subdirectories of a given directory?
#!/usr/bin/perl
use strict;
use File::Find;

find( sub { print "$File::Find::name \n" if -d }, '/tmp/a' );
Below is the output from the code above.
/tmp/a
/tmp/a/b
/tmp/a/b/c
/tmp/a/b/c/e
/tmp/a/b/d
/tmp/a/b/d/g
/tmp/a/b/d/g/h
/tmp/a/b/d/g/i
/tmp/a/b/k
But the output that I want is:
/tmp/a/b/c/e
/tmp/a/b/d/g/h
/tmp/a/b/d/g/i
/tmp/a/b/k
So, for example, I do not need '/tmp/a' because it still has subdirectories underneath it.

Thanks in advance.

Re: How to Get the Last Subdirectories
by rubasov (Friar) on Mar 11, 2010 at 22:06 UTC
    This code is probably a little too tricky, but if I'm right, it does exactly what it needs to.
    #! /usr/bin/perl
    use 5.010;
    use strict;
    use warnings;
    use File::Find;

    my $dir = $ARGV[0] // q{.};

    sub deepest {
        return if not -d;
        state $prev_dir = '';
        say if index $prev_dir, $_;
        $prev_dir = $_;
    }

    find( { wanted => \&deepest, no_chdir => 1, bydepth => 1, }, $dir );
    The main idea behind this code is the following: if you traverse your directory structure in depth-first post-order (each directory reported after its contents, which is what bydepth => 1 gives you), then you only need to check whether the current directory path is a prefix (a substring starting at position 0) of the previous directory path. If it is not a prefix of the previous value, the current directory has no subdirectories, so print it.

    To make this easier to follow, here is your sample directory structure in that order:

    a/b/d/g/h
    a/b/d/g/i
    a/b/d/g    # this is a prefix of the previous, so we don't want it
    a/b/d
    a/b/k      # this is not a prefix of the previous, so we want it
    a/b/c/e
    a/b/c
    a/b
    a
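    Here is a data-only sketch of that prefix test, so it can be tried on the listing above without a real directory tree. The helper name leaves_from_postorder is hypothetical. It also appends a trailing "/" to each path before comparing, which is an addition to rubasov's code: without it, a leaf such as /tmp/a/b reported right after a sibling leaf /tmp/a/bc would be mistaken for that sibling's parent, because index('/tmp/a/bc', '/tmp/a/b') is 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given directory paths in depth-first post-order
# (children before parents, as File::Find reports them with bydepth => 1),
# return the leaf directories: those that are not a prefix of the
# previously reported path. A trailing "/" is appended before comparing so
# that /tmp/a/b is not treated as a prefix of a sibling /tmp/a/bc.
sub leaves_from_postorder {
    my @leaves;
    my $prev = '';
    for my $dir (@_) {
        my $cur = "$dir/";
        # index() returns 0 only when $cur is a prefix of $prev
        push @leaves, $dir if index($prev, $cur) != 0;
        $prev = $cur;
    }
    return @leaves;
}

my @postorder = qw(
    /tmp/a/b/d/g/h /tmp/a/b/d/g/i /tmp/a/b/d/g /tmp/a/b/d
    /tmp/a/b/k /tmp/a/b/c/e /tmp/a/b/c /tmp/a/b /tmp/a
);

# prints /tmp/a/b/d/g/h, /tmp/a/b/d/g/i, /tmp/a/b/k and /tmp/a/b/c/e, one per line
print "$_\n" for leaves_from_postorder(@postorder);
```

    To use it with File::Find, you could push $File::Find::name onto an array from a wanted sub run with bydepth => 1 and no_chdir => 1, then pass that array to the helper.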

      ++, nice idea.  Here's a variation of your approach, making use of the postprocess option:

      #!/usr/bin/perl -l
      use File::Find;

      my @dirs;
      find(
          {
              wanted      => sub {},
              postprocess => sub {
                  push @dirs, $File::Find::dir
                      if index $dirs[-1]||"", $File::Find::dir;
              },
          },
          '/tmp/a'
      );
      print for @dirs;
Re: How to Get the Last Subdirectories
by toolic (Bishop) on Mar 11, 2010 at 21:36 UTC
    One brute-force method is:
    • Store all your paths in an array.
    • Loop through the array with two nested loops.
    • Use index to count, in a hash, how many paths each path is a prefix of.
    • Loop through your hash and print just the leaf directories (those that are a prefix only of themselves, i.e. counted once).
    use strict;
    use warnings;

    my @dirs;
    while (<DATA>) {
        chomp;
        push @dirs, $_;
    }

    my %parents;
    for my $dir1 (@dirs) {
        for my $dir (@dirs) {
            if (index($dir1, $dir) == 0) {
                # $dir is a prefix of $dir1 (a substring starting at pos 0)
                $parents{$dir}++;
            }
        }
    }
    for (keys %parents) { print "$_\n" if $parents{$_} == 1 }

    __DATA__
    /tmp/a
    /tmp/a/b
    /tmp/a/b/c
    /tmp/a/b/c/e
    /tmp/a/b/d
    /tmp/a/b/d/g
    /tmp/a/b/d/g/h
    /tmp/a/b/d/g/i
    /tmp/a/b/k
    Prints:
    /tmp/a/b/d/g/h
    /tmp/a/b/d/g/i
    /tmp/a/b/k
    /tmp/a/b/c/e
    I used this technique to solve a similar problem. Hopefully, our fellow monks will provide a more elegant solution.
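    One possible single-pass alternative to the nested loops (not from this thread; the helper name leaf_dirs_sorted is hypothetical) is to sort the paths as plain strings and keep a path only if the path right after it does not begin with "path/". The sketch assumes paths have no trailing slash. It sorts on "path/" rather than "path", so that every descendant of a directory lands immediately after it even when a sibling such as /tmp/a/b-x exists ('-' sorts before '/').

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort the paths (with "/" appended), then a path is a
# leaf unless the next path in sorted order starts with it. All paths that
# start with "P/" form one contiguous sorted run beginning at "P/" itself.
sub leaf_dirs_sorted {
    my @sorted = sort { $a cmp $b } map { $_ . '/' } @_;
    my @leaves;
    for my $i (0 .. $#sorted) {
        my $next = $i < $#sorted ? $sorted[$i + 1] : '';
        next if index($next, $sorted[$i]) == 0;    # next path is a descendant
        (my $leaf = $sorted[$i]) =~ s{/\z}{};      # drop the "/" we appended
        push @leaves, $leaf;
    }
    return @leaves;
}

my @dirs = qw(
    /tmp/a /tmp/a/b /tmp/a/b/c /tmp/a/b/c/e /tmp/a/b/d
    /tmp/a/b/d/g /tmp/a/b/d/g/h /tmp/a/b/d/g/i /tmp/a/b/k
);

# prints /tmp/a/b/c/e, /tmp/a/b/d/g/h, /tmp/a/b/d/g/i and /tmp/a/b/k, one per line
print "$_\n" for leaf_dirs_sorted(@dirs);
```

    This trades the O(n²) comparisons for one sort, at the cost of relying on the "/"-appended ordering trick described above.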
Re: How to Get the Last Subdirectories
by liverpole (Monsignor) on Mar 11, 2010 at 21:47 UTC
    Hi bichonfrise74,

    I would just use opendir and readdir recursively to scan the subdirectories, saving a subdirectory only when it doesn't contain any subdirectories of its own.

    For example:

    use strict;
    use warnings;
    use FileHandle;

    my $h_dirs = terminal_subdirs("/tmp/a");
    my @dirs   = sort keys %$h_dirs;
    print "Terminal Directories:\n", join("\n", @dirs), "\n";

    sub terminal_subdirs {
        my ($top, $h_results) = @_;
        $h_results ||= { };
        my $fh = FileHandle->new;
        opendir($fh, $top) or die "Arrggghhhh -- can't open '$top' ($!)\n";
        my @files = readdir($fh);
        closedir $fh;
        my $nsubdirs = 0;
        foreach my $fn (@files) {
            next if ($fn eq '.' or $fn eq '..');
            my $full = "$top/$fn";
            if (!-l $full and -d $full) {    # real directory, not a symlink
                ++$nsubdirs;
                terminal_subdirs($full, $h_results);
            }
        }
        $nsubdirs or $h_results->{$top} = 1;    # no subdirectories: a leaf
        return $h_results;
    }

Re: How to Get the Last Subdirectories
by FunkyMonk (Chancellor) on Mar 11, 2010 at 22:27 UTC
    use Data::Dump 'pp';

    my @dirs = qw(
        tmp/a tmp/a/b tmp/a/b/c tmp/a/b/c/e tmp/a/b/d
        tmp/a/b/d/g tmp/a/b/d/g/h tmp/a/b/d/g/i tmp/a/b/k
    );

    my %empty;
    for (@dirs) {            # $_ aliases each element, so s/// edits @dirs
        $empty{$_} = 1;      # assume empty
        s!/[^/]+$!!;         # find parent
        delete $empty{$_};   # and remove it
    }
    pp keys %empty;

    __END__
    ("tmp/a/b/k", "tmp/a/b/d/g/h", "tmp/a/b/c/e", "tmp/a/b/d/g/i")
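    The loop above deletes a parent at the moment one of its children is processed, so it relies on each parent appearing before its children in @dirs (File::Find's default order provides this); a parent listed after its children would be added back and survive. Here is an order-independent sketch of the same idea, assuming File::Basename's dirname to find the parent; the helper name leaf_dirs is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: return the paths that are not the parent (dirname)
# of any other path in the list. The parent set is built first, so the
# input may be in any order, and the input list is not modified.
sub leaf_dirs {
    my @paths = @_;
    my %is_parent;
    $is_parent{ dirname($_) } = 1 for @paths;
    return grep { !$is_parent{$_} } @paths;
}

my @dirs = qw(
    tmp/a tmp/a/b tmp/a/b/c tmp/a/b/c/e tmp/a/b/d
    tmp/a/b/d/g tmp/a/b/d/g/h tmp/a/b/d/g/i tmp/a/b/k
);

# Reversing the list (children before parents) does not change which
# directories are reported.
print "$_\n" for leaf_dirs(reverse @dirs);
```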


    Unless I state otherwise, all my code runs with strict and warnings
Re: How to Get the Last Subdirectories
by sigma8 (Initiate) on Mar 11, 2010 at 22:55 UTC

    I don't think there is a way to do it inline in the File::Find subroutine without recursion, but if you can wait until the end, I think this should do it:

    #!/usr/bin/perl
    use strict;
    use File::Find;

    my %seen;
    find( sub {
        if (-d) {
            $seen{$File::Find::name}++;
            delete $seen{$File::Find::dir};
        }
    }, '/tmp/a' );
    print join "\n", (keys %seen, undef);
    This creates a hash key for every directory and deletes the key for that directory's parent. Directories that have no subdirectories are therefore never deleted.
Re: How to Get the Last Subdirectories
by admiral_grinder (Pilgrim) on Mar 12, 2010 at 20:01 UTC
    What the hell, here is my stab at this. It doesn't work when it comes across an unreadable object such as 'C:\System Volume Information', but that might be an issue in Path::Class::Dir rather than in my code.
    #!perl
    use strict;
    use warnings;
    use Path::Class;    # file(), dir()
    use Cwd;            # getcwd()

    my $start_dir = dir( $ARGV[0] || getcwd() );
    #print "DEBUG: Scanning $start_dir\n";
    $start_dir->recurse( callback => \&report_leaf_dirs );

    sub report_leaf_dirs {
        my $object = shift;
        #print "DEBUG: processing $object\n";
        return unless $object->is_dir();

        # Test to see if we can read it
        unless ( $object->open() ) {
            warn "Unable to open $object\n";
            return;
        }

        # Test for sub directories
        foreach my $child ( $object->children() ) {
            return if $child->is_dir();
        }
        print "$object\n";
    }
Re: How to Get the Last Subdirectories
by pemungkah (Priest) on Mar 12, 2010 at 00:50 UTC
    I started by setting up a sample set of test directories:
    [mcmahon@joe-desk ~]$ ls -R ./example/
    ./example/:
    file  nonempty_files_only  nonempty_has_dirs

    ./example/nonempty_files_only:
    file1  file2

    ./example/nonempty_has_dirs:
    file1  one  two

    ./example/nonempty_has_dirs/one:

    ./example/nonempty_has_dirs/two:
    That's a directory containing files and other (nonempty) directories, one containing only files, and one containing a file and two empty directories.
    sub dive {
        my ($d) = shift;
        return if ! -d $d;                  # plain files contribute nothing
        my @contents = glob("$d/*");
        return $d unless @contents;         # an empty directory qualifies
        my @below = map { dive($_) } @contents;
        return @below ? @below   # Stuff below qualifies, this doesn't
                      : $d;      # Nothing below qualifies, this does
    }

    my $d = './example';
    print join(", ", dive($d)), "\n";
    This prints
    ./example/nonempty_files_only, ./example/nonempty_has_dirs/one, ./example/nonempty_has_dirs/two
    The tricky bit is postponing the decision about whether the current directory is good until you've seen if any subdirectories of it qualify.

    Edit: Removed the majority of the comments as they were actually obscuring how short this is; renamed @queue as it was a leftover from a previous, longer, iterative version.

Re: How to Get the Last Subdirectories
by Anonymous Monk on Mar 11, 2010 at 21:05 UTC
    What have you tried?
      Unfortunately, I have not tried anything yet; I am still thinking about how I should do it. Below is the idea I had in mind.

    • Create a recursive hash and store each subdirectory as a key.
    • Loop through the recursive hash and place the 'directory' in a new hash. So, if the key is already in the new hash, then continue to the next one.

      But the above idea might be making things complicated. I was wondering if there is a simpler approach, or whether I am just overcomplicating the problem.
        You are basically interested in those directories in your tree which have no subdirectories; so the following algorithm should work:
        1. Initially create an empty hash %leaf_directories
        2. Whenever File::Find drops you into a directory $d, do the following:
          1. Remove the parent directory of $d from the hash, i.e. if $d contains the full path, do a delete $leaf_directories{dirname($d)} (dirname comes from File::Basename). Sometimes there will be no corresponding entry, but deleting a nonexistent hash key is a harmless no-op, so this can be ignored.
          2. Add $d to your hash, i.e. $leaf_directories{$d}=1
        In the end, keys %leaf_directories should be the list of the directories without subdirectories.
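        Here is a minimal sketch of this algorithm with File::Find, assuming File::Basename's dirname for the parent and no_chdir => 1 so that $_ holds the full path. The helper name leaf_directories and the temporary sample tree (built with File::Temp and File::Path) are illustrative additions. File::Find's default order reports a directory before its contents, so a parent is always added before any child deletes it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper implementing the steps above: add every directory to
# %leaf_directories and remove its parent. Deleting a missing key is a no-op.
sub leaf_directories {
    my ($top) = @_;
    my %leaf_directories;
    find(
        {
            no_chdir => 1,    # $_ is the full path, same as $File::Find::name
            wanted   => sub {
                return unless -d $_;
                delete $leaf_directories{ dirname($_) };
                $leaf_directories{$_} = 1;
            },
        },
        $top
    );
    return sort keys %leaf_directories;
}

# Build the sample tree from the question under a temporary directory,
# which is removed automatically when the script exits.
my $root = tempdir(CLEANUP => 1);
make_path(map { "$root/a/$_" } qw(b/c/e b/d/g/h b/d/g/i b/k));

print "$_\n" for leaf_directories("$root/a");
```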

        -- 
        Ronald Fischer <ynnor@mm.st>