PerlMonks  

Re^2: Pattern Matching in Arrays

by aj.kohler (Initiate)
on May 12, 2009 at 13:49 UTC ( [id://763481]=note )


in reply to Re: Pattern Matching in Arrays
in thread Pattern Matching in Arrays

Thank you for your help.

I came across an issue after reviewing the data.

The core number is the 12345 portion, as assumed. However, some filenames contain several underscore characters before the version value.

Examples:

g05495_1_-v1.pdf
21586_rework_bore_tool_-v2.pdf
4148-t-14119_-v1.pdf
zprt0019548_wiper_die-nc_-1.pdf
zprt0016809_fg_tooling_A2.pdf

The -v1, -v2, -v1, -1, A2 are the version keys, and everything before that is the core document number. Any thoughts on how to address these?

Replies are listed 'Best First'.
Re^3: Pattern Matching in Arrays
by przemo (Scribe) on May 12, 2009 at 17:59 UTC

    In that case, you have to know where the name ends in order to determine where the last `_' is.

    I'll assume that the filename ends with .pdf:

    if ($filename =~ /^(.*)_(.*)\.pdf$/) {
        # do something with $1 and $2 here
    }

    Because Perl quantifiers are greedy ("longest-first"), the first (.*) grabs as much as it can, so the underscore in the pattern matches the last underscore in the filename.
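
To see the greedy behaviour on the filenames from the question, here is a minimal sketch. The helper name split_core_version is made up for illustration and is not from the thread; it assumes every name ends in .pdf, as above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a filename into the
# core document number and the version key at the LAST underscore.
sub split_core_version {
    my ($filename) = @_;
    # The first (.*) is greedy: it takes as much as possible and gives
    # back only enough for "_" to match, which is the last underscore.
    if ($filename =~ /^(.*)_(.*)\.pdf$/) {
        return ($1, $2);
    }
    return;    # no underscore, or the name does not end in .pdf
}

for my $f (qw(g05495_1_-v1.pdf 21586_rework_bore_tool_-v2.pdf zprt0016809_fg_tooling_A2.pdf)) {
    my ($core, $version) = split_core_version($f);
    print "$f: core=$core version=$version\n";
}
```

Note that a name with no underscore at all, such as 12345.pdf, does not match this pattern, so the helper returns an empty list for it.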

      I liked the split method recommended above.
      Can you use the OR in the split command to pattern match at the time of the split?

      There is a new wrinkle in the mix as well.
      There are filenames without "_": 12345.pdf

      All comments are welcome!

        I liked the split method recommended above.

        Unfortunately it doesn't work with multiple underscores.

        If you want to stick with a regexp, try this:

        if (/^(.*?)(?:_([^_]*))?\.pdf$/) {
            # do something with $1 and $2 ($2 is undef when there is no `_')
        }
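
A small sketch of how that pattern behaves with and without an underscore. The helper name parse_name is hypothetical, and the test names are taken from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns (core, version).
# The version is undef when the name contains no underscore.
sub parse_name {
    my ($filename) = @_;
    # (.*?) is lazy and the "_version" group is optional, so the match
    # settles on the last underscore if there is one, and on none otherwise.
    if ($filename =~ /^(.*?)(?:_([^_]*))?\.pdf$/) {
        return ($1, $2);
    }
    return;    # the name does not end in .pdf
}

for my $f (qw(12345.pdf 4148-t-14119_-v1.pdf zprt0019548_wiper_die-nc_-1.pdf)) {
    my ($core, $version) = parse_name($f);
    my $shown = defined $version ? $version : '(none)';
    print "$f: core=$core version=$shown\n";
}
```

Checking defined($version) before using it avoids "uninitialized value" warnings for the unversioned names.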
        I liked the split method recommended above.
        Thank you.

        Can you use the OR in the split command to pattern match at the time of the split?
        You want the version number to be the last element of the list that split returns, so you could reverse the list, grab the first element (the version), slurp the rest into an array, reverse that array back the right way round again and join the elements with underscores.

        There are filenames without "_": 12345.pdf
        Yuk. Make an @unversioned array and push them into that if you only get one element from split?
        #!/usr/bin/perl
        use strict;
        use warnings;

        my @data = qw/
            12345.pdf
            12345_-v1.pdf
            12345_Av1.pdf
            123456_-v1.pdf
            123456_Av1.pdf
            123456_Bv1.pdf
            g05495_1_-v1.pdf
            zprt0019548_wiper_die-nc_-1.pdf
            zprt0019548_wiper_die-nc_-2.pdf
            zprt0016809_fg_tooling_A2.pdf
            zprt0016809_fg_tooling_A3.pdf
        /;

        my %hash;
        my @unversioned;

        # The input is sorted, so for each key the entry that sorts last wins.
        for my $f (sort @data) {
            # After reverse, the first element is the version (still with .pdf);
            # whatever is left makes up the key.
            my ($version, @key) = reverse split /_/, $f;
            my $key = join "_", reverse @key;
            if ($key) {
                $hash{$key} = $version;
            }
            else {
                # Only one element came back from split: no underscore in the name.
                push @unversioned, $version;
            }
        }

        for my $key (keys %hash) {
            print $key . '_' . $hash{$key} . "\n";
        }
        print join "\n", @unversioned, q{};
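
On matching at the time of the split: split takes a regex, so one possible variation (a sketch, not what the code above does) is to let a lookahead restrict the split to the last underscore. The helper name split_last_underscore is hypothetical, and stripping .pdf first is an assumption about what you want in the version key.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split only at the last underscore.
sub split_last_underscore {
    my ($filename) = @_;
    (my $stem = $filename) =~ s/\.pdf$//;    # drop the extension first
    # Match a "_" that is followed only by non-underscore characters
    # up to the end of the string, i.e. the last underscore.
    my ($core, $version) = split /_(?=[^_]*\z)/, $stem;
    return ($core, $version);    # $version is undef when there is no "_"
}

for my $f (qw(12345.pdf g05495_1_-v1.pdf zprt0016809_fg_tooling_A2.pdf)) {
    my ($core, $version) = split_last_underscore($f);
    my $shown = defined $version ? $version : '(none)';
    print "$f: core=$core version=$shown\n";
}
```

This avoids the reverse/join round trip, at the cost of a slightly denser pattern.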
        --
        Linux, perl, punk rock, cider: charlieharvey.org.uk.
