PerlMonks  

Pattern matching

by ketema (Scribe)
on Jun 07, 2005 at 16:29 UTC ( [id://464372]=perlquestion )

ketema has asked for the wisdom of the Perl Monks concerning the following question:

I need a quick regular expression that will take strings in the format
c:\\documents and settings\\kharris\\local settings\\temporary internet files\\content.ie5\\ak6rkzx8\\bbdhe1.cab
and only match the file name at the end of the path. And on a learning note, can someone please explain regex grouping and back matching? It's easy enough to match the \\ and I can even get all occurrences of the \\, but I can't figure out how to say "find the last occurrence of \\ and then give me everything from that match to the end of the string". Thanks.

Replies are listed 'Best First'.
Re: Pattern matching
by Fletch (Bishop) on Jun 07, 2005 at 16:33 UTC

    perldoc File::Basename

    --
    We're looking for people in ATL

      Thanks. That's perfect and easy.
Re: Pattern matching
by ikegami (Patriarch) on Jun 07, 2005 at 16:39 UTC
    ($file_name) = $path =~ /([^\\]*)$/;

    But paths can often be separated with "/", even on Windows. Modules usually know details like this, which is why it's usually important to use them. Refer to File::Basename.

    ($file_name) = $path =~ m#([^\\/]*)$#;
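To see the character-class approach on both kinds of separator, here is a minimal sketch. The helper name file_name_of and the sample paths are made up for illustration; the pattern itself is the one from the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return whatever follows the
# last "\" or "/" in $path, or the whole string if there is no separator.
sub file_name_of {
    my ($path) = @_;
    # [^\\/]* is a run of non-separator characters; $ pins that run to the
    # end of the string, so it can only be the final path component.
    my ($file_name) = $path =~ m#([^\\/]*)$#;
    return $file_name;
}

# In single-quoted strings, \\ is one literal backslash.
print file_name_of('c:\\temp\\bbdhe1.cab'), "\n";   # bbdhe1.cab
print file_name_of('c:/temp/bbdhe1.cab'), "\n";     # bbdhe1.cab
```

Because the class can match zero characters, the pattern always succeeds; a path that ends in a separator simply yields an empty string.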
Re: Pattern matching
by jZed (Prior) on Jun 07, 2005 at 16:38 UTC
    While I applaud your desire to learn regexes and am sure someone will have some suggestions in that regard, I'd recommend for this task that you instead use the File::Basename module which is part of core perl and is specifically designed to find paths and filenames in a portable fashion.
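A minimal sketch of the File::Basename route, assuming the paths are Windows-style even if the script runs elsewhere (hence the explicit fileparse_set_fstype call). The sample path is shortened and hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse fileparse_set_fstype);

# Parse by Windows rules regardless of the OS this script runs on, so that
# both "\" and "/" count as separators.
fileparse_set_fstype('MSWin32');

# Hypothetical sample path: c:\temp\bbdhe1.cab
my $sample = 'c:\\temp\\bbdhe1.cab';

print basename($sample), "\n";    # bbdhe1.cab

# fileparse can also split off a suffix that matches the given pattern.
my ($name, $dir, $suffix) = fileparse($sample, qr/\.[^.]*/);
print "$name | $dir | $suffix\n"; # bbdhe1 | c:\temp\ | .cab
```

Without the fileparse_set_fstype call the module follows the conventions of the platform it runs on, which is usually what you want when the paths come from the local machine.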
Re: Pattern matching
by davidrw (Prior) on Jun 07, 2005 at 16:57 UTC
    Definitely go with File::Basename, and i like ikegami's solution above better than this one, but to directly translate your question of "find the last occurrence of \\ and then give me everything from that match to the end of the string" into a regex, i offer this:
    $s =~ /\\(.*?)$/;   # UPDATE: THIS IS WRONG!!!
    my $endPart = $1;
    Four parts: 1) match an (escaped) backslash 2) anchor to the end of the string with $ 3) capture all the stuff ( .* ) in between by surrounding it with parens 4) make that capture non-greedy with the ? -- this way it implicitly makes the \\ be the furthest one to the right in the string.
    Update: OOPS. As pointed out, that use of non-greedy matching is wrong. I would just stick with the above method of the non-backslash character class. Non-greedy matching does work when anchoring at the beginning of the string, as in /^(.*?)\\/, which captures everything before the first backslash.
    Update 2: I think what I originally intended was /.+\\(.*?)$/

    As for grouping and back matching in general, the Tutorials and perldoc perlre are good starting places.
    Quick & dirty grouping overview: Basically the way grouping works is anything you put in parens will get captured, and the first capture will be stored in $1, the second in $2, and so on.
    my $s = "This is a blurb";
    if ( $s =~ /(\S+)\s\S+\s\S+\s(\S+)/ ) {
        print "$1:$2";   # This:blurb
    }
    Two quick notes: A) Always make sure the regex matched successfully before using $1, etc. B) You can prevent parens from capturing by using ?: like this: /(?:blah)/
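The two notes above, plus the "back matching" part of the question, can be sketched roughly as follows. The helper names first_and_last and repeated_word are invented for this example; \1 is the standard backreference syntax described in perlre.

```perl
use strict;
use warnings;

# Hypothetical helper: first and last words of a four-word string.
sub first_and_last {
    my ($s) = @_;
    # (?:...) groups without capturing, so the two middle words use up no
    # capture numbers: $1 is the first word and $2 is the last.
    if ( $s =~ /^(\S+)(?:\s\S+){2}\s(\S+)$/ ) {
        return ($1, $2);
    }
    return;    # empty list when the string does not match
}

# Hypothetical helper: find a word that is immediately repeated.
sub repeated_word {
    my ($text) = @_;
    # \1 is a backreference: it matches the same text that group 1 captured.
    return $text =~ /\b(\w+)\s+\1\b/ ? $1 : undef;
}

print join(':', first_and_last("This is a blurb")), "\n";   # This:blurb
print repeated_word("it was was a typo"), "\n";             # was
```

Both helpers only read $1 and $2 after the match has succeeded, which is the point of note A.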

      $s =~ /\\(.*?)$/;

      ... 4) make that capture non-greedy with the ? -- this way it implicitly makes the \\ be the furthest one to the right in the string.

      This is incorrect: greedy or non-greedy, the regexp engine will always try to match the string at a given location any way it can before it proceeds to try at the next location. This will therefore always match the first "\\", not the last.

      See How will my regular expression match? for more information.

      Hugo
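A small sketch of the behaviour described above, using a made-up path and helper names. The first pattern is the non-greedy one under discussion; the second puts a greedy .* in front, which is one way to push the match to the last backslash.

```perl
use strict;
use warnings;

# The engine tries the leftmost starting position first. The first
# backslash is a valid start, and .*? simply grows until $ can match.
sub after_first_backslash {
    my ($path) = @_;
    my ($rest) = $path =~ /\\(.*?)$/;
    return $rest;
}

# A leading greedy .* swallows as much as it can and only gives back
# characters until a backslash fits, so the literal \\ lands on the last one.
sub after_last_backslash {
    my ($path) = @_;
    my ($rest) = $path =~ /.*\\(.*)$/;
    return $rest;
}

# Hypothetical sample path: c:\temp\cache\bbdhe1.cab
my $sample = 'c:\\temp\\cache\\bbdhe1.cab';
print after_first_backslash($sample), "\n";   # temp\cache\bbdhe1.cab
print after_last_backslash($sample), "\n";    # bbdhe1.cab
```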

Re: Pattern matching
by tlm (Prior) on Jun 07, 2005 at 16:42 UTC

    This

    my $last_one = ( split /\\\\/, $string )[ -1 ];
    will give you what you want. Alternatively,
    my $last_one = ( $string =~ /([^\\](?!.*\\).*)$/ )[ 0 ];
    comes to mind, but it looks more complicated than it ought to be (this is largely because it is trying to accommodate the possibility that the desired string contains a single embedded backslash). I prefer the first (split) option over the second.

    Update: Fixed original regexp, which incorrectly included a single backslash at the start of the returned string.
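A variation on the split idea, offered as a sketch and not as the code from this reply: it splits on runs of either separator, so single and doubled backslashes (and forward slashes) are all treated alike. The helper name last_component is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: last non-empty path component, or undef for an
# empty string.
sub last_component {
    my ($string) = @_;
    # [\\\/]+ matches one or more "\" or "/" in a row, so doubled
    # separators do not produce empty fields in the middle.
    my @parts = split /[\\\/]+/, $string;
    return $parts[-1];
}

print last_component('c:\\\\temp\\\\bbdhe1.cab'), "\n";   # bbdhe1.cab
```

Note one difference from the regex answers: split drops trailing empty fields, so a path ending in a separator returns the last directory name instead of an empty string.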

    the lowliest monk

(ignore)
by tlm (Prior) on Jun 07, 2005 at 16:39 UTC

    Posted in error. Please disregard. See 464384 instead.

    the lowliest monk

Node Type: perlquestion [id://464372]
Approved by Old_Gray_Bear