PerlMonks  

Reading directories and parsing standard names

by Amoe (Friar)
on Aug 11, 2001 at 12:38 UTC ( [id://104127]=perlquestion )

Amoe has asked for the wisdom of the Perl Monks concerning the following question:

Sorry if that title isn't clear. Say I have a directory, and all the files are named like this:
foo_1.bar foo_2.bar foo_3.bar # and so on...
I have a Perl sub that should open the directory, read through all the names, select the last one (i.e. the one with the highest number), parse the number off the end, and increment it to get a new filename like the ones already there. I've tried reading all the files, sorting them, and doing $#files++, but that seems rather long-winded and only worked some of the time (I don't know why; clearly my bad). Can anyone suggest a better way?

Replies are listed 'Best First'.
Re: Reading directories and parsing standard names
by abstracts (Hermit) on Aug 11, 2001 at 14:07 UTC
    Hello

    Solving your problem involves four steps:

    1. Get the names of files in the directory matching foo_*.bar.
    2. Map that list to another list of numbers captured by foo_(\d+)\.bar.
    3. Sort that list numerically in descending order (using spaceship <=>).
    4. Get the first element + 1.
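    print get_num('.'), "\n";

    sub get_num {
        my $dir = shift;
        # capture each file's number (anchored, so digits in $dir are ignored), largest first
        my @nums = sort { $b <=> $a } map { /_(\d+)\.bar$/ ? $1 : () } <$dir/foo_*.bar>;
        return ($nums[0] || 0) + 1;
    }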
    Hope this helps,,,

    Aziz,,,

    Update: This algorithm uses O(n) space and O(n log n) time. That is fine for a small number of files, but there are better ways for larger directories. Zaxo's algorithm works the same way but performs worse, because it runs regexp matches inside every sort comparison: on the order of n log n matches, whereas the algorithm above does only n.

    The better answer is as follows:

    sub get_num {
        my $dir = shift;
        my $c = 0;
        # keep a running maximum; the anchored match ignores digits in $dir
        /_(\d+)\.bar$/ and $1 > $c and $c = $1 while <$dir/foo_*.bar>;
        return $c + 1;
    }
    This version has O(1) space complexity and O(n) time complexity. It is a complete solution that needs no special cases, and it is also much shorter than my previous example.

    Enjoy.

    Aziz,,,

Re: Reading directories and parsing standard names
by Zaxo (Archbishop) on Aug 11, 2001 at 14:18 UTC

    I suspect the problem is in the manner of sorting. If you did a string sort on the array of names, foo_10.bar would come before foo_2.bar. A numeric sort on captured digits should work.
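    To illustrate the difference, here is a small sketch (the three file names are made up for the example) that sorts the same list both ways. With the default string sort, the last element is foo_2.bar rather than foo_10.bar, which would explain a "highest number" that is only right some of the time.

```perl
use strict;
use warnings;

# Made-up sample names, just for the comparison.
my @names = map { "foo_$_.bar" } (1, 2, 10);

# Default sort compares strings, so "foo_10.bar" lands before "foo_2.bar".
my @by_string = sort @names;

# Numeric sort on the captured digits gives the intended order.
my @by_number = sort {
    ($a =~ /_(\d+)\.bar$/)[0] <=> ($b =~ /_(\d+)\.bar$/)[0]
} @names;

print "string:  @by_string\n";
print "numeric: @by_number\n";
```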

    Here is a stab at it:

    # needs several foo_<n>.bar
    {
        my $re = qr/foo_(\d+)\.bar$/;
        sub seq_num {
            my $name = shift;               # avoid clobbering the global $_
            return $name =~ $re ? $1 : 0;   # no stale $1 when the match fails
        }
    }
    my @files = </dir/to/use/foo_*.bar>;
    my @sorted_files = sort { seq_num($b) <=> seq_num($a) } @files;
    my $next = 1 + seq_num($sorted_files[0]);
    open(NEXT, '>', "/dir/to/use/foo_$next.bar")
        or die "Cannot create foo_$next.bar: $!";
    # print content to NEXT
    close(NEXT);

    A full solution would special-case @files for 0 or 1 elements.

    After Compline,
    Zaxo

      This works great. Thanks loads :) *credits zaxo*
Re: Reading directories and parsing standard names
by George_Sherston (Vicar) on Aug 11, 2001 at 15:13 UTC
    In the spirit of TIMTOWTDI, why not add to the end of your sub a tiny routine that saves the name of the highest-numbered file in a text file in the same directory, and a tiny routine at the beginning that opens this text file and reads the contents? I mean, why search for something you hid yourself? Sorry if there's something I didn't pick up on that makes this an impractical suggestion.

    § George Sherston
      I thought about doing that, but it seemed kinda messy :P

        Messy? With this you're looking at one step versus four or five.

        Whether you use a text file to hold the id or not, I smell a possible race condition. See File::CounterFile for more info.
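        As a rough sketch of how the counter-file idea could be made safe against that race, the following uses only core modules and holds an exclusive flock while it reads and rewrites the counter. The helper name next_id and the file name .foo_counter are made up for the example, and this is not how File::CounterFile itself is implemented; it only illustrates the same idea, and it relies on every writer taking the same lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use File::Temp qw(tempdir);

# Hypothetical helper: read, increment and rewrite a counter under an exclusive lock.
sub next_id {
    my ($counter_file) = @_;
    sysopen(my $fh, $counter_file, O_RDWR | O_CREAT)
        or die "Cannot open $counter_file: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $counter_file: $!";

    # A missing or empty file counts as 0.
    my $line = <$fh>;
    my $last = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    my $next = $last + 1;

    # Rewrite the file in place while still holding the lock.
    seek($fh, 0, 0)  or die "Cannot rewind $counter_file: $!";
    truncate($fh, 0) or die "Cannot truncate $counter_file: $!";
    print {$fh} "$next\n";
    close($fh) or die "Cannot close $counter_file: $!";   # closing releases the lock
    return $next;
}

# Demo in a throwaway directory; a real script would use the foo_*.bar directory.
my $dir = tempdir(CLEANUP => 1);
print "foo_", next_id("$dir/.foo_counter"), ".bar\n" for 1 .. 2;
```

        The trade-off is the one raised above: a single locked read and write per new file instead of a directory scan, at the cost of one extra file that has to stay in step with the directory.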

Re: Reading directories and parsing standard names
by runrig (Abbot) on Aug 11, 2001 at 18:41 UTC
    No need to sort if you just want the highest number:
    my ($max_file, $max_num);
    while (defined(my $file = <foo_*.bar>)) {
        ($max_file, $max_num) = ($1, $2)
            if $file =~ /^(foo_(\d+)\.bar)$/
            and (!defined $max_num or $2 > $max_num);
    }
    print "$max_file\n";
Re: Reading directories and parsing standard names
by kjherron (Pilgrim) on Aug 11, 2001 at 21:56 UTC
    In the spirit of thinking outside the box: do the file numbers have to be sequential or start at 1? When I've needed to produce unique files within a directory, it's often been sufficient to use the current time as part of the filename, viz:
    $name = 'foo_' . time() . '.bar';

    or

    my ($sec, $min, $hr, $day, $mon, $year) = (gmtime)[0..5];
    $name = sprintf("foo_%04d%02d%02d.%02d%02d%02d.bar",
                    $year + 1900, $mon + 1, $day, $hr, $min, $sec);
    As long as you don't try to create more than one file per second, this will produce a new filename every time.
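    If two files might be created within the same second, one way to notice the collision instead of silently overwriting is to create the file with O_EXCL. The sketch below is only an illustration under assumptions: the helper name create_unique and the _<n> suffix added on a collision are made up for the example.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use Errno qw(EEXIST);
use File::Temp qw(tempdir);

# Hypothetical helper: create "$dir/foo_<time>.bar", adding "_<n>" if that name is taken.
sub create_unique {
    my ($dir) = @_;
    my $stamp = time();
    for my $try (0 .. 99) {
        my $name = $try ? "$dir/foo_${stamp}_$try.bar" : "$dir/foo_$stamp.bar";
        # O_EXCL makes the create fail if the name already exists.
        if (sysopen(my $fh, $name, O_WRONLY | O_CREAT | O_EXCL)) {
            return ($fh, $name);
        }
        die "Cannot create $name: $!" unless $!{EEXIST};
    }
    die "Gave up after 100 name collisions in $dir";
}

# Demo in a throwaway directory: two files created back to back get different names.
my $dir = tempdir(CLEANUP => 1);
my ($fh1, $name1) = create_unique($dir);
my ($fh2, $name2) = create_unique($dir);
print "$name1\n$name2\n";
```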
      Hello

      To generate unique names for files, you can also use the File::Temp module. The function tempfile, given a template, returns the filehandle and filename of a new file that it has just opened. This way, you don't need to worry about the one-file-per-second limit or other problems.

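      use File::Temp qw/tempfile/;
      # $template must end in at least four X's, e.g. 'foo_XXXX'
      # UNLINK => 0 keeps the file after the program exits (CLEANUP is the tempdir option)
      ($fh, $filename) = tempfile($template, DIR => $dir, SUFFIX => '.dat', UNLINK => 0);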
      Hope this helps,,,

      Aziz,,,
