Re: find "x" quantity newest files in a dir
by grinder (Bishop) on Jun 27, 2001 at 13:04 UTC
|
Use the file names as hash keys, and assign the age of the file to the hash value. Then sort the hash by value, and skip over the first ten. The rest you can blow away or otherwise do as you please. Something like this should get you started:
#! /usr/bin/perl -w
use strict;
my $directory = shift || '.';
opendir D, $directory or die "Cannot open directory $directory: $!\n";
my %age;
while( defined( my $file = readdir D )) {
next if $file eq '.' or $file eq '..';
my $canonical = "$directory/$file";
$age{$canonical} = (stat $canonical)[9];
}
closedir D;
my $count = 0;
foreach my $file ( sort {$age{$b} <=> $age{$a}} keys %age ) {
next if ++$count < 10;
print "$file (@{[scalar localtime $age{$file}]})\n";
# or unlink $file if you are feeling brave
}
update: Oops! I had the equality operator around the wrong way. As things stood, this code would have unlinked the ten most recent files and kept the rest! It's now around the right way. (Tip o' the hat to tomhukins). Also changed the code to reflect the sane comments of tachyon below.
-- g r i n d e r
| [reply] [Watch: Dir/Any] [d/l] |
|
Nice post grinder ++ A couple of problems with your code though. First you need to
specify the full path to the files for stat - outside of the ./ dir your code returns the dreaded 1970 date for everything.
Second you use $count = 0; next if $count++ > 10 which saves the first 11 files as $count is 0 to start (File 1), 10 at file 11 (still not > 10)
and 11 at file 12. You need either >=10 or ++$count. No doubt this relates to the reversal of the inequality.
I have supplied some reassuring test code below which lists the files with date and shows what will get unlinked. You also need the full path for
the unlink.
cheers
tachyon
use strict;
my $directory = 'c:/windows';
opendir D, $directory or die "Cannot open directory $directory: $!\n";
my %age;
while( my $file = readdir D ) {
next if $file eq '.' or $file eq '..';
$age{"$directory/$file"} = (stat "$directory/$file")[9];
}
closedir D;
my $count = 0;
foreach my $file ( sort {$age{$b} <=> $age{$a}} keys %age ) {
if ($count++ < 10) {
print "Newest: $file (@{[scalar localtime $age{$file}]})\n";
}
else {
print "For Delete: $file (@{[scalar localtime $age{$file}]})\n
+";
# unlink $file; # uncomment at your own risk!
}
}
| [reply] [Watch: Dir/Any] [d/l] |
Re: find "x" quantity newest files in a dir
by Aighearach (Initiate) on Jun 27, 2001 at 13:39 UTC
|
perl -e '@sorted = sort { -M $a <=> -M $b } </path/*>; unlink foreach
+@sorted[10..$#sorted]'
-- Snazzy tagline here
| [reply] [Watch: Dir/Any] [d/l] |
|
| [reply] [Watch: Dir/Any] |
|
unlink accepts a list, so you might say:
unlink @sorted[10..$#sorted] if @sorted > 10;
| [reply] [Watch: Dir/Any] [d/l] |
Re: find "x" quantity newest files in a dir
by clemburg (Curate) on Jun 27, 2001 at 14:24 UTC
|
I found this to be an interesting exercise in Date::Pcalc
(this is the Perl only version of Date::Calc).
This demo script uses File::Find to go through a given
directory and prints out all files older than a given
date limit, in business days, that is, without
counting Saturdays or Sundays (but still without legal
holidays - note that can be done, too, with the Date::Pcalc
holiday calendar feature).
Run like this for demo: perl older_than_days.pl -v 10
#!/usr/bin/perl -w
use strict;
use File::Find;
use Date::Pcalc qw(:all);
use Getopt::Std;
my %opts = ();
getopts('d:l:v' , \%opts);
die "\nUsage: $0 [-d startdirectory] [-v verbose] age_limit_in_days \n
+\n"
unless @ARGV;
my $start_dir = $opts{d} || ".";
my $verbose = $opts{v} || 0;
my $day_limit = $ARGV[0] || "10";
find(\&wanted, $start_dir);
sub wanted {
my $filename = $_;
my $age = (stat $filename)[8];
print "Processing file : " . $filename, "\n"
if $verbose;
return unless is_older_than($age, $day_limit);
# put your delete code here - to dangerous to actually do it in de
+mo script
print "Would delete $filename, it's older than $day_limit business
+ days ...\n\n";
}
sub is_older_than {
my ($age, $days) = @_;
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localti
+me($age);
# months start at 1 in Date::Pcalc and at 0 in localtime
$mon++;
# year is base 1900 in localtime, base 0 in Date::Pcalc
$year += 1900;
if ($verbose) {
print "Today is : " .
join ' ', Today(), "\n";
print "File last accessed : " .
join ' ', $year, $mon, $mday, "\n";
print "File age in days : " .
Delta_Days($year, $mon, $mday, Today()), "\n";
print "File age in business days: " .
Delta_Business_Days($year, $mon, $mday, Today()), "\n";
}
return 1 if Delta_Business_Days($year, $mon, $mday, Today()) > $da
+ys;
return 0;
}
# From the Date::Pcalc manual
# 15) How can I calculate the difference in days between dates, but
# without counting Saturdays and Sundays?
sub Delta_Business_Days
{
my(@date1) = (@_)[0,1,2];
my(@date2) = (@_)[3,4,5];
my($minus,$result,$dow1,$dow2,$diff,$temp);
$minus = 0;
$result = Delta_Days(@date1,@date2);
if ($result != 0)
{
if ($result < 0)
{
$minus = 1;
$result = -$result;
$dow1 = Day_of_Week(@date2);
$dow2 = Day_of_Week(@date1);
}
else
{
$dow1 = Day_of_Week(@date1);
$dow2 = Day_of_Week(@date2);
}
$diff = $dow2 - $dow1;
$temp = $result;
if ($diff != 0)
{
if ($diff < 0)
{
$diff += 7;
}
$temp -= $diff;
$dow1 += $diff;
if ($dow1 > 6)
{
$result--;
if ($dow1 > 7)
{
$result--;
}
}
}
if ($temp != 0)
{
$temp /= 7;
$result -= ($temp << 1);
}
}
if ($minus) { return -$result; }
else { return $result; }
}
# NOTE THIS:
# This solution is probably of little practical value,
# however, because it doesn't take legal holidays into
# account.
Christian Lemburg
Brainbench MVP for Perl
http://www.brainbench.com | [reply] [Watch: Dir/Any] [d/l] [select] |
|
Actually holidays are not so easy to take into account.
First of all holidays vary by nation. And by state within nations.
Secondly holidays change periodically for various reasons.
Therefore definitive lists tend not to stay very definitive.
Thirdly what happens with holidays that land on weekends? Typically different companies will handle these differently. In financial markets there is no hard and fast rule, and the decision about what to do often comes within 12 months of the actual holiday.
Fourth there are a variety of marginal holidays that are handled differently. A given day may be a holiday in the bond markets but not in the stock market. Or vice versa. Thanksgiving Friday in the US is officially not a holiday, but at many places is. And so on.
Fifth there is the question of religious holidays. For instance NYU medical school (which has a huge Jewish influence) routinely takes various Jewish holidays off but did not take off Easter. However most places in the US take off Easter but not the Jewish holidays.
And so on. The closer you look at it, the messier the idea of a "business day" becomes...
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
Re: find "x" quantity newest files in a dir
by azatoth (Curate) on Jun 27, 2001 at 13:46 UTC
|
This does what you're looking for : it's from File Attributes in Win32. All you need to do is change the -A to -M or whatever. Feel free to hack the code as much as you like.
#!perl -w
use strict;
print "Enter Directory Path : \n";
chomp(my $dirPath = <STDIN>);
opendir (DIR, "$dirPath") || die "Fscked Filesystem: $!\n";
my @fileList = grep(!/^\.\.?$/, readdir (DIR));
closedir (DIR);
foreach my $file (@fileList)
{
check_files($file,$dirPath);
}
## SUBS ##
sub check_files
{
my $file = shift;
my $dirPath = shift;
if (-A $file > 30)
{
print "$file has not been used for over 30 days. De
+lete? [y/n]\n";
chomp(my $ans = <STDIN>);
if ($ans =~ /^y/)
{
unlink ("$dirPath\\$file") || die "Could Not
+ Remove $file : $!\n";
print "Deleting $dirPath\\$file...\n";
} else {
print "Skipping File...\n";
}
}
}
Update: Actually this doesn't do what you're looking for, but the code will help you on your way...
Azatoth a.k.a Captain Whiplash
Make Your Die Messages Full of Wisdom!
Get YOUR PerlMonks Stagename here!
Want to speak like a Londoner? | [reply] [Watch: Dir/Any] [d/l] |
Re: find "x" quantity newest files in a dir
by Beatnik (Parson) on Jun 27, 2001 at 12:45 UTC
|
try unlink to delete files... rmdir wont delete recursivly, and definitly not files.
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur. | [reply] [Watch: Dir/Any] |
Re: find "x" quantity newest files in a dir
by mr.nick (Chaplain) on Jun 27, 2001 at 19:55 UTC
|
I'll add my belated $0.02 here ... if you want to grab the 10 most recent files in a directory, try something like:
my $dir=".";
my @files=(map { $_->[1] }
sort { $a->[0] <=> $b->[0] }
map { [ -M $_, $_ ] }
glob("$dir/*"))[0..9];
Then you can simply do whatever operation you want on @files. If you want all the files EXCEPT the newest ten, you can invert the sort logic and play with splice ala:
my $dir=".";
my @files=(map { $_->[1] }
sort { $b->[0] <=> $a->[0] }
map { [ -M $_, $_ ] }
glob("$dir/*"));
splice (@files,-10);
mr.nick ...
| [reply] [Watch: Dir/Any] [d/l] [select] |
(tye)Re: find "x" quantity newest files in a dir
by tye (Sage) on Jun 27, 2001 at 21:25 UTC
|
If you can have a truely huge number of files, then the best way I know to do this is to use a heap. A heap is a partially sorted binary tree that can be stored in a fixed-length array where inserts are O(log(heap size)). You read through the list of files inserting them into a heap of size 10 which will always hold the 10 newest files so far. If you go to insert the next file and find that it isn't in the top 10, then it doesn't get inserted into the heap so you delete it.
When you are done you have only the 10 newest files remaining in the directory. You can then pull the files out of the heap in sorted order (and you've done a "heap sort"), though your problem doesn't appear to have much use for this last step.
There are a couple of modules on http://search.cpan.org/ that implement heaps in Perl. Unfortunately, I haven't played with them enough to know if they are versatile enough for this type of work, but I'd be happy to hear back if someone else knows.
I mention this not so much because I think it will be a good choice in this specific case. It is just a great way of solving this type of problem if you really need it to scale well (and is one of my personal favorites).
-
tye
(but my friends call me "Tye")
| [reply] [Watch: Dir/Any] |
Re: find "x" quantity newest files in a dir
by mattr (Curate) on Jun 27, 2001 at 17:03 UTC
|
Like, I *love* Date::Manip, that's just me I lllllike it.
But if you don't have holidays inputted correctly (especially
if you aren't in the U.S.) or hey they added another holiday,
well you only have 9 files.
So why not just keep 13 or 14 files? Is each one so huge that
you are going to wreck your partition? Laziness is a virtue
in Perl and saves you time too. I'm going to stick to that and
not post any code! :)
Also, I would say don't worry so much
about dates. You have a process that is generating a file
a day, *if* it's running. Count how many files you have
in the directory once a day and unlink the last one. You might
even be able to parse the file names to be sure, because
what if someone touches one, or you do a recovery from
backup. If these are important files look at what is the
easiest way to ensure your integrity.
The easiest way I
can think of doing that if you are in control of file naming
is to name files in sequential order, sort the file names,
and determine the next file number. Perhaps you could even
use time(). Otherwise if you are really sure you want
to get rid of the oldest file, -M or stat will do the age checking
part (in unix anyway, don't quote me for Windows). | [reply] [Watch: Dir/Any] |
Re: find "x" quantity newest files in a dir
by thatguy (Parson) on Jun 28, 2001 at 00:59 UTC
|
I saw this node this morning and I thought right off that find(1) did something like this natively.. it does catch the files last accessed or created within a time frame and execute a command, but I couldn't figure out how to delete everything that was older than the 10 newest. (for the curious it's: find ./ -type f -atime +7 -exec rm {} \;)but hey, since you can tack commands into find.. find ./ -type d -exec perl -e '$limit="11";$dir="{}";$i=0;@list=`ls -ltF $dir`;print "Directory: $dir\n";foreach(@list) {chomp;$ls="$_";if (($ls=~ m/\//o) ne 1) {if (($i < $limit) && (($ls=~ m/total/o) ne 1) && (($ls=~ m/@/o) ne 1)) {print "keep $ls\n";} elsif ($i > ($limit-1)) {print "rm $ls\n";};} else {$i=($i-1);};$i++;};' {} \;
ooogly, eh? and yeah, it's set up for testing.. no actual rm'ing going on.. but that's easy enough to fix. this was my fun activity for today. | [reply] [Watch: Dir/Any] [d/l] [select] |
|
find . -type f | perl -nle 'print -M, "\t$_"' | sort -n | \
+
tail +10 | cut -f 2 | xargs rm
This assumes there are no filenames with newlines in them.
-- Abigail
| [reply] [Watch: Dir/Any] [d/l] |
|
| [reply] [Watch: Dir/Any] |