PerlMonks
Directory Recursion

by count0 (Friar)
on Jan 05, 2002 at 09:19 UTC ( [id://136482] )

Overview

Oftentimes you find yourself in a situation where you need to delve deep into directories, their subdirectories, the subdirectories' subdirectories, and so on and so forth.

This is a great example of where recursion comes into play. It allows you to write a single routine which can call itself again and again, as many times as needed.

There is a very notable module that helps with this task, and that is File::Find. This is an incredibly useful module and should always be used for any production code. It takes care of many of the nuances in file processing, such as the handling of symbolic links, hard-link counts, and so on.
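For comparison, here is a minimal sketch of how File::Find handles the same task (the sandbox directory layout is invented purely for the example): find() calls the "wanted" callback once for every file and directory it visits, so no explicit recursion appears in your own code.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a tiny throwaway directory tree to walk.
my $base = tempdir(CLEANUP => 1);
make_path("$base/sub");
open my $fh, '>', "$base/sub/a.txt" or die $!;
close $fh;

# find() does the traversal; the callback just collects plain files.
my @found;
find(sub { push @found, $File::Find::name if -f }, $base);
print scalar(@found), "\n";   # the one plain file in the sandbox
```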

This tutorial, on the other hand, is designed to give you a basic understanding of recursion in Perl, and should hopefully be beneficial to you for more than just file and directory processing (though that will be the focus). After reading this, I hope that you will be able to look at a recursive file processing routine (be it with File::Find or otherwise), and have a very clear understanding of what it does, and how it does it.

Some Conventions

For the sake of clarity, here are a few conventions used in this tutorial:
  • 'path' - Refers to the filesystem path of a file. For example, the path to '/home/count0/filename' would be '/home/count0'.
  • 'file' - Any type of file, including directories.

The (pseudo-code) Algorithm

For recursive processing, where a returned list of files is not needed:
(One example may be to rename all or certain files)
    process_files() with the base path as 'path'

    process_files():
        get a list of all files in 'path'
        for each of the files
            if it is not a directory and it needs processing
                process it
            if it is a directory
                process_files() with this dir as 'path'

If a returned list of files is needed:
(Note that this can be made to do processing as well)
    list_of_all_files = process_files() with the base path as 'path'

    process_files():
        get a list of all files in 'path'
        for each of the files
            if it is not a directory
                process it if necessary
                add it to our list of files
            if it is a directory
                process_files() with this dir as 'path'
                add the files returned from process_files() to our list of files
        return our list of files

The Code

First, we'll make a very basic example. In it, we will not be returning any lists of files, but simply doing processing on each.

    process_files ($base_path);

    # Accepts one argument: the full path to a directory.
    # Returns: nothing.
    sub process_files {
        my $path = shift;

        # Open the directory.
        opendir (DIR, $path)
            or die "Unable to open $path: $!";

        # Read in the files.
        # You will not generally want to process the '.' and '..' files,
        # so we will use grep() to take them out.
        # See any basic Unix filesystem tutorial for an explanation of them.
        my @files = grep { !/^\.{1,2}$/ } readdir (DIR);

        # Close the directory.
        closedir (DIR);

        # At this point you will have a list of filenames
        # without full paths ('filename' rather than
        # '/home/count0/filename', for example).
        # You will probably have a much easier time if you make
        # sure all of these files include the full path,
        # so here we will use map() to tack it on.
        # (Note that this could also be chained with the grep
        # mentioned above, during the readdir().)
        @files = map { $path . '/' . $_ } @files;

        for (@files) {
            # If the file is a directory
            if (-d $_) {
                # Here is where we recurse.
                # This makes a new call to process_files()
                # using a new directory we just found.
                process_files ($_);

            # If it isn't a directory, lets just do some
            # processing on it.
            } else {
                # Do whatever you want here =)
                # A common example might be to rename the file.
            }
        }
    }
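One caveat worth demonstrating (a reply below raises it in more detail): if a symlink points back to a parent directory, this template will recurse forever. A minimal guard, sketched here with an invented sandbox layout and a file counter added so the walk has something observable to return, is to recurse only into real directories:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Same shape as the template above, but the -l test skips symlinks,
# so a link back up the tree cannot cause infinite recursion.
sub process_files {
    my ($path) = @_;
    my $count = 0;
    opendir my $dh, $path or die "Unable to open $path: $!";
    my @files = map { "$path/$_" } grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;
    for my $file (@files) {
        if (-d $file && ! -l $file) {      # real directory, not a symlink
            $count += process_files($file);
        } elsif (! -d $file) {
            $count++;                      # a plain file: "process" it
        }
    }
    return $count;
}

my $base = tempdir(CLEANUP => 1);
make_path("$base/a/b");
open my $fh, '>', "$base/a/b/f.txt" or die $!;
close $fh;
# Without the -l check above, this loop symlink would never terminate.
symlink($base, "$base/a/loop") or warn "symlink not supported: $!";
print process_files($base), "\n";   # the one plain file
```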

That was a bare-bones template for how you will process files recursively.
But what if you want to return a list of all files in a directory and all of its subdirectories?

Building on the previous one, this example will go through a directory and each of its subdirectories and compile a list of all the files in them.
    process_files ($base_path);

    # Accepts one argument: the full path to a directory.
    # Returns: A list of files that reside in that path.
    sub process_files {
        my $path = shift;

        opendir (DIR, $path)
            or die "Unable to open $path: $!";

        # We are just chaining the grep and map from
        # the previous example.
        # You'll see this often, so pay attention ;)
        # This is the same as:
        # LIST = map(EXP, grep(EXP, readdir()))
        my @files =
            # Third: Prepend the full path
            map { $path . '/' . $_ }
            # Second: take out '.' and '..'
            grep { !/^\.{1,2}$/ }
            # First: get all files
            readdir (DIR);

        closedir (DIR);

        for (@files) {
            if (-d $_) {
                # Add all of the new files from this directory
                # (and its subdirectories, and so on... if any)
                push @files, process_files ($_);
            } else {
                # Do whatever you want here =) .. if anything.
            }
        }

        # NOTE: we're returning the list of files
        return @files;
    }

Real Example

Just for the sake of completeness, and to help you get started writing recursive routines to suit your needs, here is an example of a recursive function that actually does something. As you'll see, I have used a few common shortcuts and idioms which make it look different from the above examples... but hopefully you will now be able to read this with confidence.
    # Accepts one argument: the full path to a directory.
    # Returns: A list of files that end in '.html' and have been
    #          modified in less than one day.
    sub get_new_htmls {
        my $path = shift;
        my $ONE_DAY = 86400;   # seconds

        opendir (DIR, $path)
            or die "Unable to open $path: $!";

        my @files = map { $path . '/' . $_ }
                    grep { !/^\.{1,2}$/ }
                    readdir (DIR);

        closedir (DIR);

        # Rather than using a for() loop, we can just
        # return a directly filtered list.
        return grep { (/\.html$/) &&
                      (time - (stat $_)[9] < $ONE_DAY) &&
                      (! -l $_) }
               map  { -d $_ ? get_new_htmls ($_) : $_ }
               @files;
    }

UPDATE Per merlyn's comments, made it even more clear that this is not intended to be used as production code.
Added symlink ignoring to real example.

Replies are listed 'Best First'.
Re: Directory Recursion
by rob_au (Abbot) on Jan 06, 2002 at 04:32 UTC
    An interesting comment which I thought I might make is that recent versions of File::Find (well, at least the one shipped with Perl 5.6.1) no longer use recursion, in the truest sense of the term, to generate a list of files. By recursion in the truest sense of the term, I refer to a single block of re-entrant code that is called for each new iteration through a directory. Instead, newer versions of File::Find use a stack method (search newer File::Find code for @Stack within the _find_dir subroutine), which allows directories to be processed either depth first or breadth first, depending on where you expect files to reside.
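    A sketch of the stack technique described here, with an invented sandbox layout: one loop and an explicit to-do list replace the re-entrant calls. Using pop gives a depth-first walk; using shift (treating the stack as a queue) would give breadth-first.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Iterative traversal: directories found along the way are pushed
# onto @stack instead of triggering a recursive call.
sub walk {
    my ($base) = @_;
    my @stack = ($base);
    my @found;
    while (my $path = pop @stack) {
        opendir my $dh, $path or next;
        my @entries = grep { !/^\.\.?$/ } readdir $dh;
        closedir $dh;
        for my $entry (@entries) {
            my $full = "$path/$entry";
            if (-d $full && ! -l $full) { push @stack, $full }
            else                        { push @found, $full }
        }
    }
    return @found;
}

my $base = tempdir(CLEANUP => 1);
make_path("$base/x/y");
open my $fh, '>', "$base/x/y/f.txt" or die $!;
close $fh;

my @files = walk($base);
print scalar(@files), "\n";   # the one file we created
```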

    A comparison of the File::Find code between versions 5.005.03 and 5.6.1 highlights a number of differences between recursion and stack processing which may be interesting to some.

     

    perl -e 's&&rob@cowsnet.com.au&&&split/[@.]/&&s&.com.&_&&&print'

        I'm a few years late to the party, but you are iterating over an array (@files) that you are extending. Not sure what the party line is on this, but it caused me grief. Create a temporary array, then append the two together at the end. Then throw it all away and use File::Find.
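        The safe variant this reply suggests, as a sketch: perlsyn warns against adding elements to an array while a foreach is iterating over it, so results go into a separate accumulator instead of being pushed onto @files itself.

```perl
use strict;
use warnings;

# Same job as the tutorial's list-returning example, but recursion
# results are appended to @collected, never to the list being looped.
sub process_files {
    my ($path) = @_;
    opendir my $dh, $path or die "Unable to open $path: $!";
    my @entries = map { "$path/$_" } grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;

    my @collected;                      # separate accumulator
    for my $entry (@entries) {
        if (-d $entry && ! -l $entry) {
            push @collected, process_files($entry);
        } else {
            push @collected, $entry;
        }
    }
    return @collected;
}
```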
Stop reinventing the wheel! (was Re: Directory Recursion)
by merlyn (Sage) on Jan 05, 2002 at 20:18 UTC
    I downvoted this. It creates non-portable code by reinventing a fine portable wheel; it doesn't pay attention to symlinks, and will therefore recurse infinitely if a symlink happens to point to an upper directory; and it has the common bug of ignoring the files dot-newline and dot-dot-newline.
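    The dot-newline bug named here can be shown in a few lines: in the tutorial's regex, $ also matches just before a trailing newline, so a file literally named "." followed by a newline is wrongly filtered out along with the real "." and ".." entries. Anchoring with \z fixes it.

```perl
use strict;
use warnings;

# Five candidate directory entries, including the pathological ones.
my @names = (".", "..", ".\n", "..\n", "normal");

my @buggy = grep { !/^\.{1,2}$/ } @names;   # also drops ".\n" and "..\n"
my @fixed = grep { !/\A\.\.?\z/ } @names;   # \z anchors at the true end

print scalar(@buggy), " vs ", scalar(@fixed), "\n";   # 1 vs 3
```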

    Please, please, stop showing how to "cargo cult" recursive directory processing until you have studied in detail why every part of File::Find is in there. Besides the three bugs I just mentioned, there are probably even bugs I can't see at the moment!

    -- Randal L. Schwartz, Perl hacker

      I think the author made it clear that this wasn't a File::Find replacement here:

      There is a very notable module that helps with this task, and that is File::Find. While this is an incredibly useful module, this tutorial is designed to give you a basic understanding of recursion in Perl, and should hopefully be beneficial to you for more than just file and directory processing (though that will be the focus).

      He seems to be pitching this as a learning exercise. Your node pointing out some bugs adds to the learning experience.

      Perhaps it should be made more clear that it isn't production usable code, but...

      Lighten up. ( /me braces for the -- onslaught :)

      Update: I agree with merlyn in general, but I think a gentle suggestion that the author update the node with mention of the bugs and fixes would work just fine. Perhaps it would be even more helpful for a beginner to learn from someone else's common mistakes ? After all, this is a forum.

        The problem is that I consider it unethical to teach something without knowing it well enough not to introduce well-known bugs. It's just bad for the community at large to propagate bad memes under the banner of "helping", because it doesn't help, it hurts.

        It's the equivalent of spreading rumors that could harm someone else's reputation without first verifying them from an independent source, which I also consider to be unethical. Why do you think newspapers have such strict rules on "fact checking"?

        -- Randal L. Schwartz, Perl hacker


        update: in response to
        Update: I agree with merlyn in general, but I think a gentle suggestion that the author update the node with mention of the bugs and fixes would work just fine. Perhaps it would be even more helpful for a beginner to learn from someone else's common mistakes ? After all, this is a forum.
        I would have no problem if the original node had been posted "in quotes", as in "I'm thinking of writing a tutorial, can someone review the following draft before I publish it...". But that wasn't done. It was posted as a done deal, and thus pushed my hotbutton.
      It creates non-portable code by reinventing a fine portable wheel

      Firstly, I appreciate you pointing this out... It shows me that I was not adequately clear in my intent with this tutorial.

      However, even from skimming the (original) first 3 paragraphs, it is clearly stated that I'm not reinventing any wheels!
      This node is more of a "how-it-works" as opposed to a "how-to".

      ...until you have studied in detail why every part of File::Find is in there. Besides the three bugs I just mentioned, there's probably even bugs I can't see at the moment!

      Please, before you hasten to nitpick the details of implementation, step back from your Perl-guru perspective and look at this from the standpoint of a newer programmer looking to understand something like File::Find.
      To detail every part of that module, and why it's in there, would defeat the entire purpose of this tutorial. If my goal were to teach people to rewrite from scratch a new File::Find, I would simply cut and paste its source here (and in response to "until you have studied..", I have, and am quite dismayed at such an assumption).
      That would be entirely counter-productive!! To a newer programmer trying to understand directory (or any) recursion, it is a difficult enough task simply trying to comprehend the what, why, and how of it. To be burdened by the many exceptions, catches, "look out for"'s, and so on... would be discouraging, to say the least.
        If you had merely mentioned "but this has a bug with respect to symlinks" and "but this regex is not correct for dot-newline and dot-dot-newline", your argument would be much more sound. But you didn't. So you picked an example that, while at first it seems like a good one to demonstrate recursion, is actually known dangerous territory. This is my gripe. I guess the title of my rant isn't "stop reinventing the wheel" but "when you reinvent the wheel to teach, be sure you teach proper things!"

        As for ...

        To a newer programmer trying to understand directory (or any) recursion, it is a difficult enough task simply trying to comprehend the what, why, and how of it. To be burdened by the many exceptions, catches, "look out for"'s, and so on... would be discouraging, to say the least.
        ... that's exactly what programming is all about! Especially with recursion, you must consider things like "will this ever terminate" and "am I looking at the right data" and "what can go wrong". Those cannot be overlooked or simply swept under the rug.

        The basics of recursion can be illustrated with something like factorial, which has easily defined end-points and recognizable computations. The flaws of recursion can be illustrated with something like the Fibonacci calculator. But even these examples require at least a footnote to say "note that we aren't verifying that the number is an integer here", in my book anyway. Why do you think we have so many footnotes in the llama? I say one sweeping statement in the text, but in the footnote, I'm able to more carefully specify it so that I don't outright lie.
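        The two classic illustrations named here, as a sketch (with no integer check, exactly the footnote caveat above): factorial has one clean base case, while the naive Fibonacci is correct but recomputes the same subproblems exponentially, which is the flaw being alluded to.

```perl
use strict;
use warnings;

# Factorial: a single base case ($n <= 1) ends the recursion.
sub fact { my $n = shift; $n <= 1 ? 1 : $n * fact($n - 1) }

# Naive Fibonacci: two recursive calls per step, so runtime
# grows exponentially with $n.
sub fib  { my $n = shift; $n < 2  ? $n : fib($n - 1) + fib($n - 2) }

print fact(5), " ", fib(10), "\n";   # 120 55
```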

        -- Randal L. Schwartz, Perl hacker

      You are perfectly right: it doesn't take into account symbolic links, or the permissions of a file if, let's say, we want to copy it. The other thing I want to point out is that this is not reusable at all.
Re: Directory Recursion
by Beatnik (Parson) on Jan 05, 2002 at 16:11 UTC
