Finding oldest file in directory

by Nitrox (Chaplain)
on Oct 18, 2004 at 18:11 UTC

Nitrox has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that needs to determine the oldest file in a particular directory and I'm concerned about the efficiency of my current solution. (This runs every 60 seconds, which is why I'm concerned with optimization). Here's an example snippet:

my $dir = "."; my $file = (sort{(stat $a)[10] <=> (stat $b)[10]}glob "$dir/*.pl")[0];

This script runs across multiple platforms (Win32, Solaris, Linux and AIX), which rules out some perhaps "easier" solutions.

Another important piece of info is that the directory is relatively small and has no more than 10 files at any given time, so I wasn't concerned about the numerous stat calls.

So is my current solution acceptable and am I just micro-optimizing, or does anyone see a glaring performance issue?

Thanks in advance for feedback!

-Nitrox

Replies are listed 'Best First'.
•Re: Finding oldest file in directory
by merlyn (Sage) on Oct 18, 2004 at 18:25 UTC
    Probably only the teeniest bit faster to run, but a lot easier to type, I would have picked -s $a over (stat $a)[10] and so on.

    Also, above some number of files (depending on your OS efficiency), it'd be faster to cache your stats for the sort.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.


    update: Darn it. I misread [10] as wanting the size, even though there were other clues in the message about wanting the oldest.

    OK, yes, replace -s there with -M.
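
    A minimal sketch of that idea (not merlyn's posted code): cache the -M age in a Schwartzian Transform so each file is stat'ed only once, then take the file with the largest age:

    my $dir = ".";
    my ($oldest) = map  { $_->[0] }
                   sort { $b->[1] <=> $a->[1] }   # largest age (days since modification) first
                   map  { [ $_, -M $_ ] }         # stat each file only once
                   glob "$dir/*.pl";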

      "it'd be faster to cache your stats for the sort"

      Depends on what "oldest" means, and on how files are created, modified, and removed from the directory. The cached info might not be correct or useful. It probably just increases the complexity of the program; with 10 or so files in the directory, it's most likely not worth it.

        I think what Randal L. Schwartz was referring to, when he said "cache it for the sort," was to use a very common sort optimization technique called, not coincidentally, the Schwartzian Transform.
        @sorted = map  { $_->[0] }
                  sort { $a->[1] <=> $b->[1] }
                  map  { [ $_, (-s $_) ] }
                  @unsorted;

        --
        [ e d @ h a l l e y . c c ]

        If the underlying files are changing quickly enough that -s isn't going to return the same result twice, you're probably already screwed (and I want to say that some qsort implementations might even core on you).

      Do you mean '-C' instead of '-s' (file size)?
Re: Finding oldest file in directory
by Roy Johnson (Monsignor) on Oct 18, 2004 at 19:12 UTC
    Finding the oldest/most recent/minimum/maximum of a list does not require sorting. For a 10-file directory, it's not a big deal, but the right tool for the job is a simple max-finder:
    my $oldest;
    my $oldtime = 0;
    for (glob "$dir/*.pl") {
        my $thistime = -C;
        if ($thistime > $oldtime) {
            ($oldest, $oldtime) = ($_, $thistime);
        }
    }
    You could do this at kind of the same programming level as you're trying to by using List::Util 'reduce':
    use List::Util 'reduce';
    my $file = (reduce { $a->[0] < $b->[0] ? $a : $b }
                map { [ (stat)[10], $_ ] }
                glob '*.pl'
               )->[1];
    That makes for somewhat complicated reading, though, and might be better broken into more steps.

    Update: for posterity: the map above is only useful for reducing the number of times stat is called, from 2*N to N. The overhead of map and storing the values and dereferencing is probably not worth it. It's certainly simpler to say

    my $file = reduce {(stat $a)[10] < (stat $b)[10] ? $a : $b} glob '*.pl';

    Caution: Contents may have been coded under pressure.
Re: Finding oldest file in directory
by TomDLux (Vicar) on Oct 18, 2004 at 18:45 UTC

    If it never has more than 10 files, who cares? If it's fast enough, don't worry. If you need more speed, benchmark and profile.

    Except that sorting N values involves N log N to N^2 comparisons, and if each comparison involves 2 stats at 10 ms each, it does waste system resources. A Schwartzian Transform - associating the stat time with each file name - involves only N stats, and would be economical.

    If files are not going to change, keep a list of known files in a hash, and obtain a list of eligible files. If there are any new files, stat only the ones you don't already know about.
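
    A rough sketch of that caching idea (the hash and sub names here are illustrative, not from the thread), assuming files are never modified after creation:

    my %known;    # filename => ctime, remembered across passes

    sub oldest_known {
        my ($dir) = @_;
        my %seen;
        for my $file (glob "$dir/*.pl") {
            $seen{$file} = 1;
            $known{$file} = (stat $file)[10] unless exists $known{$file};   # stat only new files
        }
        delete @known{ grep { !$seen{$_} } keys %known };                   # forget vanished files
        my ($oldest) = sort { $known{$a} <=> $known{$b} } keys %known;
        return $oldest;
    }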

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Finding oldest file in directory
by ikegami (Patriarch) on Oct 18, 2004 at 21:08 UTC

    You mentioned portability was a requirement, so you should use File::Spec to build paths. "/" is not the file separator on Macs, for example.

    I have a script that needs to determine the oldest file in a particular directory

    If all you're concerned about is which file is the oldest, there's no need to sort:

    use File::Spec ();

    sub get_oldest {
        my ($dir) = @_;
        my $oldest;
        my $oldest_time;
        my $file_spec = File::Spec->catfile($dir, '*.pl');
        foreach (glob $file_spec) {
            my $time = (stat $_)[10];
            if (!$oldest_time || $time < $oldest_time) {
                $oldest      = $_;
                $oldest_time = $time;
            }
        }
        return $oldest;
    }

    I don't know how efficient glob is. You can get rid of it:

    use DirHandle  ();
    use File::Spec ();

    sub get_oldest {
        my ($dir) = @_;
        my $oldest;
        my $oldest_time;
        my $dh = DirHandle->new($dir);
        while (defined($_ = $dh->read())) {
            next unless /\.pl$/i;
            my $full_path = File::Spec->catfile($dir, $_);
            my $time = (stat $full_path)[10];
            if (!$oldest_time || $time < $oldest_time) {
                $oldest      = $_;
                $oldest_time = $time;
            }
        }
        return $oldest;
    }
      Hi, I'm lazy but since I wrote a program called filexer to sync uploads I'll just make a couple of nitpicking suggestions.

      If you are dealing with remotely mounted Windows shares, do a lot of testing. In particular, permissions and you-can't-get-there-from-here issues took a lot of my time.

      I used a cygwin binary at one point to solve a problem windows wasn't helping me with. It was a while ago and I don't have the code online right now, but I'm pretty sure I used cygwin's touch command.

      Granularity < 1 second might flub it.

      Watch for illegal characters (especially colons and question marks), non-Western encodings, filename lengths, etc. if you are actually copying across the net like I did. Likewise, if so, there are possibly security issues.

      Loading the interpreter and compiling the script are going to take a while too. Consider running under mod_perl (even if just under CGI emulation) and calling it once every 10 seconds via a crontab? I don't want to think of any Perl program being launched on my system from scratch every 10 seconds. That is, can you just keep the thing running all the time instead of quitting it after each run? Much better, I think.
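
      A minimal sketch of that keep-it-running idea (process() is a placeholder, not from the thread):

      my $dir = ".";
      while (1) {
          my ($oldest) = sort { (stat $a)[10] <=> (stat $b)[10] } glob "$dir/*.pl";
          process($oldest) if defined $oldest;   # whatever the real script does with the file
          sleep 60;                              # the 60-second interval from the original question
      }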

      Have fun!

Re: Finding oldest file in directory
by pg (Canon) on Oct 18, 2004 at 18:54 UTC

    glob is actually implemented based on File::Glob since 5.6.0, so turning on GLOB_NOSORT might help a bit, as you want your own sort order, not its default sort, anyway.

    use File::Glob ':glob';
    @list = bsd_glob('*.*', GLOB_NOSORT);
    print join(',', @list);
Re: Finding oldest file in directory
by bluto (Curate) on Oct 18, 2004 at 19:25 UTC
    I wouldn't use the 'ctime' field (nor '-C' for that matter) since its implementation varies depending on the platform, and it doesn't necessarily indicate the age of a file's data -- just the file's metadata. You'll probably want to use mtime (last write time) or atime (last access time) instead. See "perldoc perlport" and look for ctime.
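
    For example, a small sketch using mtime, field [9] of stat, rather than ctime ([10]), assuming $dir as in the original question:

    use List::Util 'reduce';
    my $oldest = reduce { (stat $a)[9] < (stat $b)[9] ? $a : $b } glob "$dir/*.pl";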

    I also wouldn't consider optimizing this much since any reasonable OS will cache your stats for you (and you only have a few files). These stats may even stay cached if you are reading them once a minute.

Re: Finding oldest file in directory
by jdporter (Paladin) on Oct 18, 2004 at 18:48 UTC
    my $cmd = $^O =~ /Win32/ ? 'dir /od /b' : 'ls -1rt';
    my ($file) = qx( $cmd );
    chomp $file;
Re: Finding oldest file in directory
by Eyck (Priest) on Oct 19, 2004 at 12:36 UTC

    Why sort at all?

    The problem is to find the oldest file, that's it.

    Just walk the list of files and compare each one to the 'oldest so far'; that would be the most efficient solution.

    I'm shocked that people actually suggested the Schwartzian Transform and similarly overgrown solutions to such a simple problem.

      Careful. The high-water-mark algorithm is actually slower than sorting to get the highest value, for some small number of items. Think of the few lines of Perl code that would have to be repeatedly executed for each item. Then think of how little work it takes directly in C to sort that list instead.

      Yes, surprising when I first heard it too.

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

        Note that the arguments against a perl-based high-water-mark algorithm don't apply to List::Util::max, which, like sort, is written in C. (The overhead of a function call vs the overhead of other opcodes does, however, apply, but that's a very small difference.)

Re: Finding oldest file in directory
by elwarren (Priest) on Oct 19, 2004 at 18:15 UTC
    How about adding a test to see if the directory has changed before examining every file in the dir? If the dir hasn't been updated, skip the scan (see the sketch after the reply below).
      Along the same lines, test what was the oldest file last time and if it hasn't changed, it is still the oldest file.
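
      A rough sketch of the directory-change test (names are illustrative; note that a directory's mtime changes when entries are added or removed, not when an existing file's contents change):

      my $last_dir_mtime = 0;
      my $last_oldest;

      sub oldest_if_changed {
          my ($dir) = @_;
          my $dir_mtime = (stat $dir)[9];
          return $last_oldest if $dir_mtime == $last_dir_mtime;   # no entries added/removed, reuse the answer
          $last_dir_mtime = $dir_mtime;
          ($last_oldest) = sort { (stat $a)[10] <=> (stat $b)[10] } glob "$dir/*.pl";
          return $last_oldest;
      }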
