Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

find "x" quantity newest files in a dir

by braintoast (Novice)
on Jun 27, 2001 at 12:37 UTC ( #91857=perlquestion: print w/replies, xml ) Need Help??

braintoast has asked for the wisdom of the Perl Monks concerning the following question:

I have a feeling this is either going to be easy or impossible.

I need to:
-read a dir
-find the ten newest files in it
-delete any others

I'm relatively sure that what I have below will remove anything older than 10 days, but I need to keep *10 days worth* of files. (Since this'll be run by cron daily, I have a problem on weekends or any other day when nothing is added to the directory..I fall below my mark of retaining the 10 *most recent* files.)

#!/usr/local/bin/perl -w use strict; use vars qw($archivedir @subdirs $subdirs); $archivedir = "/web/1/someplace" ; # READ CONTENTS OF DIRECTORY AND DELETE FILES OLDER THAN 10 DAYS opendir (ARCHDIR, "$archivedir") || die "Couldn't access the directory +!"; @subdirs = readdir(ARCHDIR); foreach $subdirs (@subdirs) { if (-M "$subdirs" > 10) { rmdir("$archivedir/$subdirs"); } } closedir ARCHDIR;

Ideas? Feel free to berate me if this turns out to be easy or my code looks silly. I'm still have tons to learn and welcome *any* advice.


Replies are listed 'Best First'.
Re: find "x" quantity newest files in a dir
by grinder (Bishop) on Jun 27, 2001 at 13:04 UTC

    Use the file names as hash keys, and assign the age of the file to the hash value. Then sort the hash by value, and skip over the first ten. The rest you can blow away or otherwise do as you please. Something like this should get you started:

    #! /usr/bin/perl -w use strict; my $directory = shift || '.'; opendir D, $directory or die "Cannot open directory $directory: $!\n"; my %age; while( defined( my $file = readdir D )) { next if $file eq '.' or $file eq '..'; my $canonical = "$directory/$file"; $age{$canonical} = (stat $canonical)[9]; } closedir D; my $count = 0; foreach my $file ( sort {$age{$b} <=> $age{$a}} keys %age ) { next if ++$count < 10; print "$file (@{[scalar localtime $age{$file}]})\n"; # or unlink $file if you are feeling brave }

    update: Oops! I had the equality operator around the wrong way. As things stood, this code would have unlinked the ten most recent files and kept the rest! It's now around the right way. (Tip o' the hat to tomhukins). Also changed the code to reflect the sane comments of tachyon below.

    g r i n d e r

      Nice post grinder ++ A couple of problems with your code though. First you need to specify the full path to the files for stat - outside of the ./ dir your code returns the dreaded 1970 date for everything. Second you use $count = 0; next if $count++ > 10 which saves the first 11 files as $count is 0 to start (File 1), 10 at file 11 (still not > 10) and 11 at file 12. You need either >=10 or ++$count. No doubt this relates to the reversal of the inequality. I have supplied some reassuring test code below which lists the files with date and shows what will get unlinked. You also need the full path for the unlink.



      use strict; my $directory = 'c:/windows'; opendir D, $directory or die "Cannot open directory $directory: $!\n"; my %age; while( my $file = readdir D ) { next if $file eq '.' or $file eq '..'; $age{"$directory/$file"} = (stat "$directory/$file")[9]; } closedir D; my $count = 0; foreach my $file ( sort {$age{$b} <=> $age{$a}} keys %age ) { if ($count++ < 10) { print "Newest: $file (@{[scalar localtime $age{$file}]})\n"; } else { print "For Delete: $file (@{[scalar localtime $age{$file}]})\n +"; # unlink $file; # uncomment at your own risk! } }
Re: find "x" quantity newest files in a dir
by Aighearach on Jun 27, 2001 at 13:39 UTC
    perl -e '@sorted = sort { -M $a <=> -M $b } </path/*>; unlink foreach +@sorted[10..$#sorted]'

    Snazzy tagline here
      You might find this rather expensive if you have 100 files or so, but if the directory is constantly being trimmed to 10 or 15 files, that's probably OK. Just in case someone copies your code for another task, it's nice to know how it scales.

      -- Randal L. Schwartz, Perl hacker

      unlink accepts a list, so you might say:
      unlink @sorted[10..$#sorted] if @sorted > 10;
Re: find "x" quantity newest files in a dir
by clemburg (Curate) on Jun 27, 2001 at 14:24 UTC

    I found this to be an interesting exercise in Date::Pcalc (this is the Perl only version of Date::Calc).

    This demo script uses File::Find to go through a given directory and prints out all files older than a given date limit, in business days, that is, without counting Saturdays or Sundays (but still without legal holidays - note that can be done, too, with the Date::Pcalc holiday calendar feature).

    Run like this for demo: perl -v 10

    #!/usr/bin/perl -w use strict; use File::Find; use Date::Pcalc qw(:all); use Getopt::Std; my %opts = (); getopts('d:l:v' , \%opts); die "\nUsage: $0 [-d startdirectory] [-v verbose] age_limit_in_days \n +\n" unless @ARGV; my $start_dir = $opts{d} || "."; my $verbose = $opts{v} || 0; my $day_limit = $ARGV[0] || "10"; find(\&wanted, $start_dir); sub wanted { my $filename = $_; my $age = (stat $filename)[8]; print "Processing file : " . $filename, "\n" if $verbose; return unless is_older_than($age, $day_limit); # put your delete code here - to dangerous to actually do it in de +mo script print "Would delete $filename, it's older than $day_limit business + days ...\n\n"; } sub is_older_than { my ($age, $days) = @_; my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localti +me($age); # months start at 1 in Date::Pcalc and at 0 in localtime $mon++; # year is base 1900 in localtime, base 0 in Date::Pcalc $year += 1900; if ($verbose) { print "Today is : " . join ' ', Today(), "\n"; print "File last accessed : " . join ' ', $year, $mon, $mday, "\n"; print "File age in days : " . Delta_Days($year, $mon, $mday, Today()), "\n"; print "File age in business days: " . Delta_Business_Days($year, $mon, $mday, Today()), "\n"; } return 1 if Delta_Business_Days($year, $mon, $mday, Today()) > $da +ys; return 0; } # From the Date::Pcalc manual # 15) How can I calculate the difference in days between dates, but # without counting Saturdays and Sundays? sub Delta_Business_Days { my(@date1) = (@_)[0,1,2]; my(@date2) = (@_)[3,4,5]; my($minus,$result,$dow1,$dow2,$diff,$temp); $minus = 0; $result = Delta_Days(@date1,@date2); if ($result != 0) { if ($result < 0) { $minus = 1; $result = -$result; $dow1 = Day_of_Week(@date2); $dow2 = Day_of_Week(@date1); } else { $dow1 = Day_of_Week(@date1); $dow2 = Day_of_Week(@date2); } $diff = $dow2 - $dow1; $temp = $result; if ($diff != 0) { if ($diff < 0) { $diff += 7; } $temp -= $diff; $dow1 += $diff; if ($dow1 > 6) { $result--; if ($dow1 > 7) { $result--; } } } if ($temp != 0) { $temp /= 7; $result -= ($temp << 1); } } if ($minus) { return -$result; } else { return $result; } } # NOTE THIS: # This solution is probably of little practical value, # however, because it doesn't take legal holidays into # account.

    Christian Lemburg
    Brainbench MVP for Perl

      Actually holidays are not so easy to take into account.

      First of all holidays vary by nation. And by state within nations.

      Secondly holidays change periodically for various reasons. Therefore definitive lists tend not to stay very definitive.

      Thirdly what happens with holidays that land on weekends? Typically different companies will handle these differently. In financial markets there is no hard and fast rule, and the decision about what to do often comes within 12 months of the actual holiday.

      Fourth there are a variety of marginal holidays that are handled differently. A given day may be a holiday in the bond markets but not in the stock market. Or vice versa. Thanksgiving Friday in the US is officially not a holiday, but at many places is. And so on.

      Fifth there is the question of religious holidays. For instance NYU medical school (which has a huge Jewish influence) routinely takes various Jewish holidays off but did not take off Easter. However most places in the US take off Easter but not the Jewish holidays.

      And so on. The closer you look at it, the messier the idea of a "business day" becomes...

        That's why Solaris has /etc/acct/holidays, so that the system administrator can define which days are working days and which aren't for the coming year, for their business.

Re: find "x" quantity newest files in a dir
by azatoth (Curate) on Jun 27, 2001 at 13:46 UTC
    This does what you're looking for : it's from File Attributes in Win32. All you need to do is change the -A to -M or whatever. Feel free to hack the code as much as you like.

    #!perl -w use strict; print "Enter Directory Path : \n"; chomp(my $dirPath = <STDIN>); opendir (DIR, "$dirPath") || die "Fscked Filesystem: $!\n"; my @fileList = grep(!/^\.\.?$/, readdir (DIR)); closedir (DIR); foreach my $file (@fileList) { check_files($file,$dirPath); } ## SUBS ## sub check_files { my $file = shift; my $dirPath = shift; if (-A $file > 30) { print "$file has not been used for over 30 days. De +lete? [y/n]\n"; chomp(my $ans = <STDIN>); if ($ans =~ /^y/) { unlink ("$dirPath\\$file") || die "Could Not + Remove $file : $!\n"; print "Deleting $dirPath\\$file...\n"; } else { print "Skipping File...\n"; } } }
    Update: Actually this doesn't do what you're looking for, but the code will help you on your way...

    Azatoth a.k.a Captain Whiplash

    Make Your Die Messages Full of Wisdom!
    Get YOUR PerlMonks Stagename here!
    Want to speak like a Londoner?
Re: find "x" quantity newest files in a dir
by Beatnik (Parson) on Jun 27, 2001 at 12:45 UTC
    try unlink to delete files... rmdir wont delete recursivly, and definitly not files.

    ... Quidquid perl dictum sit, altum viditur.
Re: find "x" quantity newest files in a dir
by mr.nick (Chaplain) on Jun 27, 2001 at 19:55 UTC
    I'll add my belated $0.02 here ... if you want to grab the 10 most recent files in a directory, try something like:
    my $dir="."; my @files=(map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [ -M $_, $_ ] } glob("$dir/*"))[0..9];
    Then you can simply do whatever operation you want on @files. If you want all the files EXCEPT the newest ten, you can invert the sort logic and play with splice ala:
    my $dir="."; my @files=(map { $_->[1] } sort { $b->[0] <=> $a->[0] } map { [ -M $_, $_ ] } glob("$dir/*")); splice (@files,-10);

    mr.nick ...

(tye)Re: find "x" quantity newest files in a dir
by tye (Sage) on Jun 27, 2001 at 21:25 UTC

    If you can have a truely huge number of files, then the best way I know to do this is to use a heap. A heap is a partially sorted binary tree that can be stored in a fixed-length array where inserts are O(log(heap size)). You read through the list of files inserting them into a heap of size 10 which will always hold the 10 newest files so far. If you go to insert the next file and find that it isn't in the top 10, then it doesn't get inserted into the heap so you delete it.

    When you are done you have only the 10 newest files remaining in the directory. You can then pull the files out of the heap in sorted order (and you've done a "heap sort"), though your problem doesn't appear to have much use for this last step.

    There are a couple of modules on that implement heaps in Perl. Unfortunately, I haven't played with them enough to know if they are versatile enough for this type of work, but I'd be happy to hear back if someone else knows.

    I mention this not so much because I think it will be a good choice in this specific case. It is just a great way of solving this type of problem if you really need it to scale well (and is one of my personal favorites).

            - tye (but my friends call me "Tye")
Re: find "x" quantity newest files in a dir
by mattr (Curate) on Jun 27, 2001 at 17:03 UTC
    Like, I *love* Date::Manip, that's just me I lllllike it. But if you don't have holidays inputted correctly (especially if you aren't in the U.S.) or hey they added another holiday, well you only have 9 files.

    So why not just keep 13 or 14 files? Is each one so huge that you are going to wreck your partition? Laziness is a virtue in Perl and saves you time too. I'm going to stick to that and not post any code! :)

    Also, I would say don't worry so much about dates. You have a process that is generating a file a day, *if* it's running. Count how many files you have in the directory once a day and unlink the last one. You might even be able to parse the file names to be sure, because what if someone touches one, or you do a recovery from backup. If these are important files look at what is the easiest way to ensure your integrity.

    The easiest way I can think of doing that if you are in control of file naming is to name files in sequential order, sort the file names, and determine the next file number. Perhaps you could even use time(). Otherwise if you are really sure you want to get rid of the oldest file, -M or stat will do the age checking part (in unix anyway, don't quote me for Windows).

Re: find "x" quantity newest files in a dir
by thatguy (Parson) on Jun 28, 2001 at 00:59 UTC
    I saw this node this morning and I thought right off that find(1) did something like this natively.. it does catch the files last accessed or created within a time frame and execute a command, but I couldn't figure out how to delete everything that was older than the 10 newest. (for the curious it's: find ./ -type f -atime +7 -exec rm {} \;)

    but hey, since you can tack commands into find..
    find ./ -type d -exec perl -e '$limit="11";$dir="{}";$i=0;@list=`ls -ltF $dir`;print "Directory: $dir\n";foreach(@list) {chomp;$ls="$_";if (($ls=~ m/\//o) ne 1) {if (($i < $limit) && (($ls=~ m/total/o) ne 1) && (($ls=~ m/@/o) ne 1)) {print "keep $ls\n";} elsif ($i > ($limit-1)) {print "rm $ls\n";};} else {$i=($i-1);};$i++;};' {} \;
    ooogly, eh? and yeah, it's set up for testing.. no actual rm'ing going on.. but that's easy enough to fix.
    this was my fun activity for today.

      Or just:
      find . -type f | perl -nle 'print -M, "\t$_"' | sort -n | \ + tail +10 | cut -f 2 | xargs rm
      This assumes there are no filenames with newlines in them.

      -- Abigail

        hmm.. well, dang. mine started out so small!

        I am only now begining to see the fine art of perl and I think it's going to be a long time before I get away from the bludgeoning I do now.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://91857]
Approved by root
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (6)
As of 2020-07-07 13:28 GMT
Find Nodes?
    Voting Booth?

    No recent polls found