PerlMonks  

Re^2: help with lazy matching

by Special_K (Monk)
on Jan 05, 2015 at 22:17 UTC ( [id://1112234]=note )


in reply to Re: help with lazy matching
in thread help with lazy matching

I guess my thinking was that with a non-greedy modifier, my regular expression could use the slash before "bat" to match the slash, then it would match "bat" as the .+, and then finally it would match the end of line character in the file as the $.

Why does it not work that way?

Replies are listed 'Best First'.
Re^3: help with lazy matching
by nlwhittle (Beadle) on Jan 05, 2015 at 22:27 UTC

    The non-greedy modifier simply means "match as little as possible while still getting a successful match". Perl's regex engine always looks for the leftmost match first, and in your case the leftmost place a match can start is the first slash. Where the non-greedy modifier would help is if, for example, you wanted to match only 'foo'. Then you could write:

    if ( /\/(.+?)\// )

    This will match the first slash, then non-greedily match any other characters until another slash is reached. If you didn't use the non-greedy modifier here, you would match everything between the first and last slash (i.e. 'foo/bar/baz').
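A minimal sketch of that difference, assuming a sample string with a trailing slash ('/foo/bar/baz/' is hypothetical and not taken from the original post). The pattern m{/(.+?)/} is the same as /\/(.+?)\// written with brace delimiters so the slashes need no escaping:

```perl
use strict;
use warnings;

# Hypothetical sample input; the trailing slash is an assumption.
my $path = '/foo/bar/baz/';

# Lazy: .+? stops at the first slash that lets the match succeed.
my ($lazy) = $path =~ m{/(.+?)/};

# Greedy: .+ runs on to the last slash that lets the match succeed.
my ($greedy) = $path =~ m{/(.+)/};

print "lazy:   $lazy\n";      # lazy:   foo
print "greedy: $greedy\n";    # greedy: foo/bar/baz
```

Both matches start at the same place, the first slash. The quantifier only decides how far the match extends from there.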

    --Nick
      I think the source of my confusion was not knowing that regular expressions in perl always start matching from the left side. If the regular expression could start matching from anywhere, then using the non-greedy modifier could give the behavior I was expecting in my original post, i.e. matching "bar".

        This is not a Perl-specific issue. The "leftmost" rule is one of the features of an NFA-based regular expression engine, which includes Perl, PHP, Python, and most other commonly used regular expression implementations. So now that you're aware of it with respect to Perl, you've learned something that can be applied to most other languages that implement regexes as well! :)
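The leftmost rule can be observed directly. The sketch below assumes a pattern and input reconstructed from the description in this thread (a slash, a lazy .+?, then $); neither is copied from the original post. It records where the match began using the built-in @- array:

```perl
use strict;
use warnings;

# Hypothetical input and pattern, reconstructed from the thread's description.
my $path = '/foo/bar/bat';

my ( $start, $captured );
if ( $path =~ m{/(.+?)$} ) {
    $start    = $-[0];    # offset at which the whole match began
    $captured = $1;
}

print "match starts at offset $start\n";    # match starts at offset 0
print "captured: $captured\n";              # captured: foo/bar/bat
```

The engine commits to the first slash, and the lazy .+? is then forced to keep expanding until $ can succeed, so it ends up consuming everything to the end of the string.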


        Dave

Re^3: help with lazy matching
by Anonymous Monk on Jan 05, 2015 at 22:58 UTC

    I like the description in the Camel:

    ... regular expressions will try to match as early as possible. This even takes precedence over being greedy. Since scanning happens left to right, the pattern will match as far left as possible, even if there is some other place where it could match longer. (Regular expressions may be greedy, but they aren’t into delayed gratification.) ...

    (copied from the free sample material on the O'Reilly website, http://cdn.oreillystatic.com/oreilly/booksamplers/9780596004927_sampler.pdf, book page 44)

    Another key thing to realize is that the $ anchor does not make the engine scan from right to left.
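A small illustration of "as early as possible" taking precedence over being greedy. The input string is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical input: a short run of a's, followed by a longer run.
my $text = 'xaa aaaa';

my ( $start, $run );
if ( $text =~ /(a+)/ ) {
    $start = $-[0];    # offset at which the match began
    $run   = $1;
}

print "matched '$run' at offset $start\n";    # matched 'aa' at offset 1
```

The greedy a+ takes as many a's as it can from the earliest position where it can match at all. It does not skip ahead to the longer run.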

Re^3: help with lazy matching ( .+? versus [^/]+? rxrx -Mre=debug )
by Anonymous Monk on Jan 05, 2015 at 22:37 UTC

    Why does it not work that way?

    The regex metacharacter dot (.) matches any character (except newline by default, or including newline under the /s modifier).

    It starts matching right after the first / is matched, and because . also matches /, it goes on to match all the subsequent slashes as well.

    This is a FAQ, but a hard one to search for :)

    use re 'debug'; and watch it work

    use rxrx and watch it work
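A sketch of the [^/]+? alternative named in this node's title, assuming the same hypothetical input as the description in the thread. Because the character class cannot match a slash, attempts that start at the earlier slashes fail and the engine moves on to the last one:

```perl
use strict;
use warnings;

# Uncomment to watch the engine's attempts on STDERR:
# use re 'debug';

# Hypothetical input, not copied from the original post.
my $path = '/foo/bar/bat';

my ( $start, $last );
if ( $path =~ m{/([^/]+?)$} ) {
    $start = $-[0];    # offset at which the match began
    $last  = $1;
}

print "matched '$last' at offset $start\n";    # matched 'bat' at offset 8
```

The match is still the leftmost one that can succeed. The earlier starting points are simply ruled out, since [^/]+? cannot cross a slash to reach the end of the string.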
