Beginner's guide to File::Find

Traversing the directory tree

In Perl you often need to traverse all the files in a directory tree recursively, much as the Unix "find" command does. You can do this the "hard way", using opendir, readdir and their friends. But in Perl, naturally, TMTOWTDI (there's more than one way to do it). I want to present not just another way to do it but, IMHO, a better way, especially for beginners who only need to perform simple tasks.
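For comparison, here is a rough sketch of the "hard way" with opendir and readdir. The helper name walk_tree is hypothetical, made up only for this illustration. Notice how much bookkeeping you have to do yourself: skipping the special . and .. entries, building each path, and deciding when it is safe to recurse.

```perl
use strict;
use warnings;

# The "hard way": a hand-written recursive walk using opendir/readdir.
# walk_tree is a hypothetical helper, not part of any module.
sub walk_tree {
    my ($dir, $callback) = @_;

    opendir(my $dh, $dir) or do { warn "Cannot open $dir: $!\n"; return };
    # readdir also returns the special entries . and .., which must be skipped
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    for my $entry (@entries) {
        my $path = "$dir/$entry";    # readdir gives bare names; build the path
        $callback->($path);
        # Recurse into real directories only; following symlinks could loop
        walk_tree($path, $callback) if -d $path && !-l $path;
    }
}

# Print every path under the current directory
walk_tree('.', sub { print "$_[0]\n" });
```

File::Find takes care of all of this for you, which is the point of the rest of this tutorial.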

File::Find basics

Just remember - if you have to traverse files recursively and do some processing on them, this is your friend:

use File::Find;

This module makes recursive file traversal about as easy as you could imagine. The following is a bare-bones template for working with it:

use File::Find;

my $dir = '.';    # whatever you want the starting directory to be

find(\&do_something_with_file, $dir);

sub do_something_with_file
{
    # process the current file here; its name is in $_
}

First, a starting directory is initialized in $dir. If you imagine the directory structure as a tree, this is the root, from which the search starts.
Then find (a function exported by the File::Find module) is called. It is given a reference to a subroutine and the starting directory. find traverses the directory tree and calls the supplied subroutine on each file it encounters (be it a plain file, a directory, a link, etc.), including the starting directory itself.
Then we see the definition of the processing function. It receives no arguments; instead, find sets $_ to the name of the file currently being visited. Consider the following simple example, which prints the names of all directories, starting from "." (the current directory):

use File::Find;

$\ = "\n";    # output record separator: every print ends with a newline

find(\&print_name_if_dir, ".");

sub print_name_if_dir
{
    print if -d;
}

Here, the subroutine print_name_if_dir is given as an argument to find. It simply prints the name of the file if it's a directory. Note the peculiar notation: many Perl built-ins, including print and file tests such as -d, work on $_ when given no argument, so it's customary not to mention $_ at all. So:

print if -d;

Is equivalent to:

print $_ if -d $_;

Both are quite cryptic (but hey, it's Perl), and for clarity the routine could be rewritten as:

sub print_name_if_dir
{
    my $file = $_;

    print $file if -d $file;
}

Subroutines in Perl can also be anonymous, which suits such simple tasks better, so the whole program may be rewritten as:

use File::Find;

my $dir = '.';    # whatever you want the starting directory to be

find(sub {print "$_\n" if -d}, $dir);

Just 3 lines of code, and we're already doing something useful!
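Because an anonymous sub is a closure, it can also update variables declared outside it, which is the usual way to collect results from find. Here is a small sketch, assuming you want to count directories; count_dirs is a hypothetical name used only here. Since find also calls the sub for the starting directory itself, that directory is included in the count.

```perl
use strict;
use warnings;
use File::Find;

# count_dirs is a hypothetical helper: count the directories under $start
sub count_dirs {
    my ($start) = @_;
    my $count = 0;
    # The anonymous sub is a closure over $count, so it can update it
    find(sub { $count++ if -d }, $start);
    return $count;    # includes $start itself
}

print count_dirs('.'), "\n";
```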

For the more advanced

Inside the processing routine, the package variable $File::Find::name holds the complete path of the current file (the starting directory followed by the path below it). Consider the following improved version of our little script:

use File::Find;

find(sub {print "$File::Find::name\n" if -d}, ".");

Try running it and compare the results with the previous version. You will notice that it now prints the path to each directory. What happens is this: by default File::Find chdirs into each directory it visits, so $_ holds only the short name of the file (without the path), while $File::Find::name holds the complete path. If, for some reason, you don't want it to chdir, you can specify the no_chdir option. Options are passed to find as a hash reference, in place of the subroutine reference:

use File::Find;

find({wanted   => sub {print "$File::Find::name\n" if -d},
      no_chdir => 1}, ".");

Note that "wanted" is the key for the file processing routine in this hash.
The output won't differ from the previous version. Here, however, $_ also holds the full path to the file (the same value as $File::Find::name), because find no longer chdirs into each directory; it still descends into all of them.
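To see the difference for yourself, here is a small sketch that records $_, $File::Find::name and $File::Find::dir (the directory containing the current file, another variable File::Find sets) for every file, once in the default mode and once with no_chdir. collect_names is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Find;

# collect_names is a hypothetical helper: record [$_, name, dir] for each file
sub collect_names {
    my ($start, $no_chdir) = @_;
    my @seen;
    find({
        wanted   => sub { push @seen, [ $_, $File::Find::name, $File::Find::dir ] },
        no_chdir => $no_chdir,    # false: chdir into each directory (default)
    }, $start);
    return @seen;
}

for my $no_chdir (0, 1) {
    print $no_chdir ? "with no_chdir:\n" : "default:\n";
    for my $entry (collect_names('.', $no_chdir)) {
        my ($underscore, $name, $dir) = @$entry;
        print "  \$_ = $underscore, name = $name, dir = $dir\n";
    }
}
```

For a file ./lib/Foo.pm, the default mode gives $_ = Foo.pm, while with no_chdir $_ is ./lib/Foo.pm, identical to $File::Find::name.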
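Other options may be specified, such as 'bydepth', which reports each directory only after all of its contents have been reported (despite the name, it doesn't switch to a different search strategy; it only changes when the directory itself is reported). These are more advanced topics; if you're curious, you can look them up in the module's documentation.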
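As an illustration of one such option, bydepth is handy when, for example, you want to remove or rename directories, since everything inside a directory is handled before the directory itself. The sketch below simply records the order in which paths are reported with and without it; visit_order is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Find;

# visit_order is a hypothetical helper: list paths in the order find reports them
sub visit_order {
    my ($start, $bydepth) = @_;
    my @order;
    find({
        wanted  => sub { push @order, $File::Find::name },
        bydepth => $bydepth,    # true: report a directory after its contents
    }, $start);
    return @order;
}

print "default: $_\n" for visit_order('.', 0);
print "bydepth: $_\n" for visit_order('.', 1);
```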

Bonus - a useful utility based on File::Find

Have you ever felt that your disk quota was suffocating you, but couldn't find the unnecessarily large files to remove? Do you find "du" too tedious to use in these cases? File::Find comes to the rescue. Consider the following script. It takes a starting directory and prints the 20 largest files found in the tree under it, together with their paths, so you can just cut-n-paste them into "rm":

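#!/usr/local/bin/perl

use strict;
use warnings;
use File::Find;

@ARGV == 1 or die "Usage: $0 directory\n";

my %size;    # path => size in bytes

# Record the size of every plain file under the starting directory
find(sub {$size{$File::Find::name} = -s if -f;}, @ARGV);

# Sort the paths by size, largest first, and keep at most 20
my @sorted = sort {$size{$b} <=> $size{$a}} keys %size;
splice @sorted, 20 if @sorted > 20;

foreach (@sorted)
{
    printf "%10d %s\n", $size{$_}, $_;
}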

What goes on here? find traverses the given directory recursively, recording each plain file's size in the %size hash (-s if -f means "get the size if this is a plain file"). Then the keys of %size are sorted by size, largest first, and the 20 largest files are printed. That's it. I use this utility quite a lot to free up space, and I hope you find it useful too (and also understand exactly how it works!)
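One refinement you may want: file tests such as -f and -s follow symbolic links, so a symlink pointing at a large file shows up as a second large file. Below is a sketch of the same idea that skips symlinks, wrapped in a hypothetical largest_files helper so it can be reused. It relies on the special _ filehandle, which makes a file test reuse the stat information from the previous test instead of looking the file up again.

```perl
use strict;
use warnings;
use File::Find;

# largest_files is a hypothetical helper: return the $n largest plain files
# under @dirs as [size, path] pairs, largest first, ignoring symlinks
sub largest_files {
    my ($n, @dirs) = @_;
    my %size;
    find(sub {
        return if -l;                               # -l uses lstat; skip symlinks
        $size{$File::Find::name} = -s _ if -f _;    # reuse that lstat result via _
    }, @dirs);
    my @sorted = sort { $size{$b} <=> $size{$a} } keys %size;
    splice @sorted, $n if @sorted > $n;
    return map { [ $size{$_}, $_ ] } @sorted;
}

printf "%10d %s\n", @$_ for largest_files(20, @ARGV ? @ARGV : '.');
```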

Update:

Thanks to rinceWind for this:
File::Find is cross-platform. It's one of the really handy ways of iterating over directory trees on Windows, something Microsoft doesn't encourage you to do with their 'hidden files' (File::Find X-rays through the Windows hidden files mechanism nicely :-).

With this in mind, though, you must be careful when working with Windows paths, because Windows uses backslashes rather than forward slashes as the directory separator. There is a nice tutorial, Paths in Perl, that explains this.

Update 2:

There are some nice follow-up replies to this tutorial; special thanks to Aristotle, who supplied some information on really advanced uses of File::Find.

Conclusion

File::Find can turn tasks involving recursive file traversal from torture into pleasure, if you know how to use it. Modules like this make Perl the wonderful language it is: you can perform useful tasks without pain. Enjoy!

Edit by tye to add READMORE
