PerlMonks  

Pattern Matching in Arrays

by aj.kohler (Initiate)
on May 11, 2009 at 16:06 UTC

aj.kohler has asked for the wisdom of the Perl Monks concerning the following question:

I am reading in a directory of filenames:

12345_-v1.pdf
12345_Av1.pdf
123456_-v1.pdf
123456_Av1.pdf
123456_Bv1.pdf
etc...

I need to pattern-match the root filename (12345) and display the latest version of each file (-v1, Av1, Bv1):


12345_Av1.pdf
123456_Bv1.pdf

How can I accomplish this? I am not very experienced with regular expressions yet.
Thanks
AJ

updated format

Re: Pattern Matching in Arrays
by przemo (Scribe) on May 11, 2009 at 16:23 UTC

    Please format your messages according to Markup in the Monastery to make them readable!

    I need to pattern match the root filename, and display the latest version of file:
    12345_Av1.pdf
    123456_Bv1.pdf

    Your examples don't clearly define what the `root filename' and the `version' are. Assuming the root is everything before the first _ and the version is the number following the v, the following expression captures them in $1 and $2, respectively:

    if ($filename =~ /^([^_]+)_.*v(\d+)/) {
        # do something with $1 and $2 here
    }
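A minimal sketch of how that test might be applied in a loop, assuming the names are already in an array (the name @files is hypothetical) and using the sample names from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample names from the question; @files is a hypothetical variable name.
my @files = qw/12345_-v1.pdf 12345_Av1.pdf 123456_-v1.pdf 123456_Av1.pdf 123456_Bv1.pdf/;

my @parsed;    # each entry is [root, version digits]
for my $filename (@files) {
    if ($filename =~ /^([^_]+)_.*v(\d+)/) {
        # $1 = everything before the first _, $2 = the digits following the v
        push @parsed, [ $1, $2 ];
        print "$filename: root=$1 version=$2\n";
    }
}
```

Because $2 only holds the digits after the v, 12345_-v1.pdf and 12345_Av1.pdf both report version 1 here; the - or letter in front of the v is not part of either capture.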

    You can find more about regular expressions in e.g. perlretut.

    Update: typos corrected

Re: Pattern Matching in Arrays
by ciderpunx (Vicar) on May 11, 2009 at 16:30 UTC
    One way, which would be slow for directories with lots of files, would be something like this: sort the array of filenames, then use the part before the _ as the hash key; later elements simply overwrite the earlier values.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @data = qw/12345_-v1.pdf 12345_Av1.pdf 123456_-v1.pdf 123456_Av1.pdf 123456_Bv1.pdf/;
    my %hash;
    for my $f (sort @data) {
        my ($key, $version) = split /_/, $f;
        $hash{$key} = $version;
    }
    for my $key (keys %hash) {
        print $key . '_' . $hash{$key} . "\n";
    }
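The approach above works because a plain string sort puts '-' (ASCII 0x2D) before 'A' (0x41) and 'A' before 'B' (0x42), so the last name seen for each root is the latest one. A minimal sketch of the same idea as a subroutine, where latest_versions is a hypothetical name, the full filename is stored instead of just the version part, and the keys are sorted so the output order is repeatable:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each root to the full
# filename of its latest version, using the sort-then-overwrite idea.
sub latest_versions {
    my @names = @_;
    my %latest;
    # Plain string sort: '-' sorts before 'A', and 'A' before 'B', so for
    # a given root the last name assigned is the latest version.
    for my $name (sort @names) {
        my ($root) = split /_/, $name;
        $latest{$root} = $name;
    }
    return \%latest;
}

my $latest = latest_versions(
    qw/12345_-v1.pdf 12345_Av1.pdf 123456_-v1.pdf 123456_Av1.pdf 123456_Bv1.pdf/
);

# Sort the keys so the output order does not depend on hash ordering.
print "$latest->{$_}\n" for sort keys %$latest;
```

Since this relies on string order, a hypothetical Av10 would sort before Av2; the version part would need a numeric comparison if versions can reach two digits.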

      Thank you for your help.

      I came across an issue after reviewing the data.

      The core number is the 12345 portion, as assumed. However, some filenames contain more than one underscore before the version value.

      Examples:

      g05495_1_-v1.pdf
      21586_rework_bore_tool_-v2.pdf
      4148-t-14119_-v1.pdf
      zprt0019548_wiper_die-nc_-1.pdf
      zprt0016809_fg_tooling_A2.pdf

      The -v1, -v2, -v1, -1 and A2 parts are the version keys, and everything before them is the core document number. Any thoughts on how to handle these?

        In that case, you have to anchor the match at the end of the name to determine where the last `_' is...

        I'll assume that the filename ends with .pdf:

        if ($filename =~ /^(.*)_(.*)\.pdf$/) {
            # do something with $1 and $2 here
        }

        Because the first (.*) is greedy (it grabs as much as it can and gives characters back only as needed), the _ in the pattern matches the last underscore in the filename.
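Combining that last-underscore match with the sort-then-overwrite hash from the previous reply gives a sketch like the following. It assumes every name ends in .pdf and that plain string order of the version keys reflects which version is latest; the thread does not confirm that for mixed keys such as -v1, -1 and A2. The list combines the names from both of the original poster's messages:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Names from the question and from the follow-up post.
my @files = qw/
    12345_-v1.pdf 12345_Av1.pdf 123456_-v1.pdf 123456_Av1.pdf 123456_Bv1.pdf
    g05495_1_-v1.pdf 21586_rework_bore_tool_-v2.pdf 4148-t-14119_-v1.pdf
    zprt0019548_wiper_die-nc_-1.pdf zprt0016809_fg_tooling_A2.pdf
/;

my %latest;    # core document number => full filename of latest version
for my $name (sort @files) {
    # Greedy (.*) pushes the literal _ to the LAST underscore, so $1 is
    # the core number and $2 the version key (not used further here).
    next unless $name =~ /^(.*)_(.*)\.pdf$/;
    # Assumption: string order of the version keys means later = newer.
    $latest{$1} = $name;
}

print "$latest{$_}\n" for sort keys %latest;
```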

Node Type: perlquestion [id://763281]
Approved by ikegami