Win has asked for the wisdom of the Perl Monks concerning the following question:

Can anyone suggest a Perl script that will check the number of files in a folder and delete the oldest two files, if there are more than 20 files present in the folder? Somebody has kindly given me the following bit of code:
opendir my $d, $dir;
my @f = sort { -M $b <=> -M $a } readdir $d;
unlink @f[-0,1] if @f > 20;
Can anyone please expand this? Perhaps a detailed explanation. Or suggestions for improvement.

Replies are listed 'Best First'.
Re: No. files in folder
by saskaqueer (Friar) on Jun 23, 2005 at 10:27 UTC
    opendir my $d, $dir;

    This line opens a handle (which is saved as $d) to the directory found at path $dir (so $dir could be defined as '/home/myuser/thedir').

    my @f = sort { -M $b <=> -M $a } readdir $d;

    This is the sort of line that shows what Perl can do. There are a couple of things going on here, so we can break it down to see what's happening. The statement basically works from right to left. First, readdir($d) generates the list of files and directories read from the $d directory handle (including the special entries '.' and '..'). If the directory at $dir contains foo1.dat, foo2.dat, subDir1, subDir2 and foo3.dat, you will get those 5 items (plus the two dot entries) from readdir(). That list is then passed to sort, whose comparison routine is the part enclosed in curly braces. -M returns a file's age in days since it was last modified, so -M $b <=> -M $a sorts the list in descending order of age; the sorted list is what we place in the @f array. Because of the way we've sorted, the oldest files end up at the front of the array, with the newest files at the end.
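    A small self-contained sketch of that sort line, with invented file names and mtimes back-dated via utime so the ordering is predictable (the paths are qualified with $dir so -M and -f work regardless of the current directory):

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    my $dir = tempdir(CLEANUP => 1);
    my $now = time;
    for my $spec (['old.dat', 3], ['mid.dat', 2], ['new.dat', 1]) {
        my ($name, $days) = @$spec;
        open my $fh, '>', "$dir/$name" or die $!;
        close $fh;
        my $t = $now - $days * 86_400;          # back-date the mtime by N days
        utime $t, $t, "$dir/$name" or die $!;
    }

    opendir my $d, $dir or die $!;
    # -M gives the age in days, so descending order means oldest first
    my @f = sort { -M "$dir/$b" <=> -M "$dir/$a" }
            grep { -f "$dir/$_" } readdir $d;
    closedir $d;
    print "@f\n";    # old.dat mid.dat new.dat
    ```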

    unlink @f[0,1] if @f > 20;

    As I said earlier, the @f array now contains the list of files, with the oldest files listed at the front of the array (so elements 0 and 1 would be your oldest 2 files). So the conditional statement on the size of @f checks to see if we have more than 20 files, and if so, executes unlink() on the first 2 entries to have them deleted.
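    Tying the three lines together, here is a self-contained sketch of the whole recipe; the temporary directory and file names are invented for the demo, and the mtimes are back-dated so file00.dat and file01.dat are unambiguously the two oldest:

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    my $dir = tempdir(CLEANUP => 1);
    my $now = time;
    for my $i (0 .. 20) {                        # 21 files, file00 the oldest
        my $name = sprintf 'file%02d.dat', $i;
        open my $fh, '>', "$dir/$name" or die $!;
        close $fh;
        my $t = $now - (21 - $i) * 86_400;       # older files get earlier mtimes
        utime $t, $t, "$dir/$name" or die $!;
    }

    opendir my $d, $dir or die $!;
    my @f = sort { -M "$dir/$b" <=> -M "$dir/$a" }
            grep { -f "$dir/$_" } readdir $d;
    closedir $d;

    # 21 files > 20, so the two oldest get unlinked
    my $deleted = 0;
    $deleted = unlink map { "$dir/$_" } @f[0, 1] if @f > 20;
    ```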

    I'd probably rewrite the code as follows to also ensure we are only dealing with files, just in case you have any directories in that main directory we do not want to include. Warning: my perl is quite rusty, there is a good chance the following code will not run, and if it does, it may not work as expected.

    my $dir = '/path/to/the/files';
    opendir(my $d, $dir) or die("opendir() failed: $!");
    # qualify each name with $dir so -f and -M test the right path
    my @sort_files = sort { -M "$dir/$b" <=> -M "$dir/$a" }
                     grep { -f "$dir/$_" } readdir($d);
    closedir($d);
    unlink(map { "$dir/$_" } @sort_files[0, 1]) if (@sort_files > 20);
Re: No. files in folder
by broquaint (Abbot) on Jun 23, 2005 at 10:44 UTC
    Since it's my code I guess I ought to comment on it:
    opendir my $d, $dir;
    This should really have an or die for sanity and a more descriptive variable name than $d such as $dir_handle.
    my @f = sort { -M $b <=> -M $a } readdir $d;
    This sorts all the files in the directory by modification time, in reverse chronological order; again, a more descriptive name like @file_list would be better suited. It should also grab only files, so a grep should be added, e.g.
    my @file_list = sort { -M $b <=> -M $a } grep -f, readdir $dir_handle;
    And finally:
    unlink @f[-0,1] if @f > 20;
    This removes the two oldest files if there are more than 20 elements in @f. This code is somewhat broken as the pathnames are not fully qualified, which could be done like so:
    use File::Spec::Functions 'catfile';
    unlink map catfile($dir, $_), @file_list[0,1] if @file_list > 20;
    Also the negated zero in the original code was just a blip from knocking out a quick bit of code in the CB. So, in summary, this more thought out code would be better:
    use File::Spec::Functions 'catfile';
    opendir my $dir_handle, $dir
        or die "Couldn't open '$dir': $!";
    my @file_list = sort { -M $b <=> -M $a } grep -f, readdir $dir_handle;
    unlink map catfile($dir, $_), @file_list[0,1] if @file_list > 20;


Re: No. files in folder
by blazar (Canon) on Jun 23, 2005 at 10:16 UTC
    That's fine. But it will only work if your pwd is $dir. Otherwise you will either have to chdir there or prepend it to the filenames. Personally I think that while opendir is great and necessary in some circumstances, it is quite often abused where a simple glob (which does the opendir, the readdir and all the dirty work for you behind the scenes) would suffice.

    Update: incidentally -0 == 0. Probably the code originally had the sort the other way round and [-1,-2] as subscript.
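    The glob variant could look like this; a sketch with invented file names, showing that glob returns fully qualified paths when the pattern carries the directory, and that '*' never matches '.' or '..' (nor dotfiles in general):

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    my $dir = tempdir(CLEANUP => 1);
    for my $name (qw(a.dat b.dat)) {
        open my $fh, '>', "$dir/$name" or die $!;
        close $fh;
    }
    mkdir "$dir/subdir" or die $!;               # should be filtered out by -f

    # the paths from glob already include $dir, so -M and -f just work,
    # and the dot entries never show up
    my @f = sort { -M $b <=> -M $a } grep { -f } glob "$dir/*";
    ```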

      Well it's not quite fine; on some OSes there is a chance that one could find oneself attempting to delete the current directory and its parent ('.' and '..'). I would suggest something like:

      opendir my $d, $dir;
      my @f = sort { -M $b <=> -M $a }
              map  { "$dir/$_" }
              grep !/^\.{1,2}$/, readdir $d;
      unlink @f[0,1] if @f > 20;
      The only problem using the conditional like that on the unlink line is that it makes it more difficult to test whether the unlink succeeded - I would expand it to a full if block.
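      Expanded into a full if block (a sketch, with invented directory and file names), the count that unlink returns can be checked explicitly:

      ```perl
      use strict;
      use warnings;
      use File::Temp qw(tempdir);

      my $dir = tempdir(CLEANUP => 1);
      for my $i (0 .. 21) {                      # 22 files, so the guard fires
          open my $fh, '>', sprintf('%s/f%02d.dat', $dir, $i) or die $!;
          close $fh;
      }

      my @f = sort { -M $b <=> -M $a } grep { -f } glob "$dir/*";

      my $deleted = 0;
      if (@f > 20) {
          $deleted = unlink @f[0, 1];            # unlink returns how many it removed
          warn "only removed $deleted of 2 files: $!" if $deleted != 2;
      }
      ```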


        Indeed, this is IMHO another good reason to use glob instead. I thought it was never supposed to return qw/. ../, or are there OSen where it does?!? I mean glob '*', of course. One can trim it to {his,her} own needs anyway. And if really need be, then the cure would be (fundamentally) the same.
Re: No. files in folder
by borisz (Canon) on Jun 23, 2005 at 10:36 UTC
    Your script also picks up directories. Maybe you want to add a grep.
    opendir my $d, $dir;
    # read dir, keep only files, sort the files by modification time
    my @f = sort { -M $b <=> -M $a } grep { -f } readdir $d;
    # if we have more than 20 files, remove the oldest two
    unlink @f[0,1] if @f > 20;
Re: No. files in folder
by tlm (Prior) on Jun 23, 2005 at 10:47 UTC

    The first line opens the folder (aka directory) for reading (though the syntax confuses perl v5.8.4; I think it's a bug in perl). The second line reads in all the files in the folder (including subfolders and, at least in Unix, the '.' and '..' "subfolders"!) and attempts to sort them decreasingly according to how long since each was last modified (as given by the -M operator). (Note that as written this works only if $dir is the current directory, because readdir gives only the filenames, not full paths). The last line attempts to delete (aka unlink) the oldest two files (or subfolders, etc.) if the total number of files and subfolders in the original folder exceeds 20. (I don't understand the purpose of the unary - operator before the 0.)

    I guess this is how I would improve on that code:

    opendir my $d, $dir or die "Can't readdir $dir: $!";
    my @files = grep -f $_,        # collect only the files in folder
                map "$dir/$_",     # make full pathnames to the files
                readdir $d;
    closedir $d;                   # close directory handle

    # sort using a "Schwartzian Transform"
    my @sorted = map  $_->[ 0 ],                  # untag
                 sort { $b->[ 1 ] <=> $a->[ 1 ] } # sort according to age
                 map  [ $_, -M $_ ],              # tag files with their "ages"
                 @files;

    unlink @sorted[ 0, 1 ] if @sorted > 20;
    Though, admittedly, the "Schwartzian Transform" bit is probably overkill. The original approach you had:
    my @sorted = sort { -M $b <=> -M $a } @files;
    is probably adequate for most situations.

    Also, note that because of ties, in some cases there may not be a unique answer to the question "which are the two oldest files". The procedure above only deletes the first two files in an array of files (or rather filenames) sorted according to age since last modification.
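    One way to make the choice deterministic is a secondary sort key; a sketch (file names invented, identical mtimes forced with utime) that falls back to a name comparison when the ages tie:

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempdir);

    my $dir = tempdir(CLEANUP => 1);
    my $t = time - 86_400;
    for my $name (qw(b.dat a.dat c.dat)) {
        open my $fh, '>', "$dir/$name" or die $!;
        close $fh;
        utime $t, $t, "$dir/$name" or die $!;    # identical mtimes => a three-way tie
    }
    my @files = glob "$dir/*";

    # every -M value is equal here, so the name comparison decides the order
    my @sorted = sort { -M $b <=> -M $a or $a cmp $b } @files;
    ```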

    Update: Fixed typo: s/Sh/Sch/. Thanks, frodo72.

    the lowliest monk

      The Schwartzian Transform is obviously a requirement for this: what if a file is deleted and re-created in the middle of the sort? That would translate into inconsistent results. You also make a lot fewer OS calls, since -M runs once per file instead of once per comparison.

      Of course, this doesn't address the problem of having the bunch of files changed after you've -M'ed them.

      Since this seems to be a live directory, you should be especially careful of what you are dropping there or you might shoot yourself in the foot.

      -- ank