PerlMonks  

Re: If I'm matching a pattern wy does a + sign make things crazy?

by AnomalousMonk (Archbishop)
on May 05, 2020 at 20:52 UTC


in reply to If I'm matching a pattern wy does a + sign make things crazy?

Corion and haukex have already referred to the likelihood that in your situation, adding a literal '+' to the match constrains the otherwise "greedy" .* match to stop with the first occurrence of the regex pattern. They have also recommended much more fundamentally robust approaches to solving your problem.

I'd love to know why.

WRT regex mechanics, I hope I can provide a detailed answer to your prayer. As already mentioned, this behavior can be demonstrated using any character (or, indeed, substring) as an explicit "anchor" for the match:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = 'xxx xyzzyfooAbar yyy xyzzyzotBbar zzz';
 ;;
 my $match;
 ;;
 print qq{A: .*: '$match'} if ($match) = $s =~ m{ (xyzzy .* bar) +}xms;
 print qq{B: .* A: '$match'} if ($match) = $s =~ m{ (xyzzy .* A bar) +}xms;
 print qq{C: .*?: '$match'} if ($match) = $s =~ m{ (xyzzy .*? bar) +}xms;
 "
A: .*: 'xyzzyfooAbar yyy xyzzyzotBbar'
B: .* A: 'xyzzyfooAbar'
C: .*?: 'xyzzyfooAbar'

In example A, the greedy .* match grabs as much as it can (to the end of the string in this case), but then the regex engine backtracks from the end until the first point at which it can match an explicit 'bar' substring, which is the last 'bar' in the string. Unfortunately, this gives you a bit more than you want even in the absence of the /g modifier: the regex engine finds the leftmost match, and a greedy quantifier makes that match as long as it can.
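To check the claim about example A, here is a minimal sketch that compares where the greedy capture ends with where the last 'bar' in the string ends. It reuses the sample string from the one-liner; the variable names are only illustrative, and the trailing + quantifier is left out because it does not change this result.

```perl
use strict;
use warnings;

my $s = 'xxx xyzzyfooAbar yyy xyzzyzotBbar zzz';

# Greedy .* (with /s) runs to the end of the string, then backs off
# until 'bar' can match. The first success while backing off is the
# last 'bar' in the string.
my ($greedy) = $s =~ m{ (xyzzy .* bar) }xms;

# $+[1] is the offset just past the end of capture group 1.
my $match_end    = $+[1];
my $last_bar_end = rindex($s, 'bar') + length('bar');

print "greedy capture: '$greedy'\n";
print "capture ends at offset $match_end; ",
      "last 'bar' ends at offset $last_bar_end\n";
```

If the two offsets agree, the capture really does stretch to the final 'bar' and swallows everything between the two records.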

In example B, .* still grabs as much as it can (to the end of the string), but then the regex engine backtracks until it can match an explicit 'A' substring. Then matching moves forward again to find the 'bar' substring.
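It may be worth seeing that the explicit anchor only helps because 'A' happens to occur once, in the first record. A small sketch, again on the sample string from the one-liner (the second pattern with 'B' is my own addition for contrast, not something from the original question):

```perl
use strict;
use warnings;

my $s = 'xxx xyzzyfooAbar yyy xyzzyzotBbar zzz';

# 'A' appears only in the first record, so backtracking has to retreat
# all the way into that record before 'A bar' can match.
my ($with_A) = $s =~ m{ (xyzzy .* A bar) }xms;

# 'B' appears only in the second record, so the greedy .* stops
# backing off much earlier and the capture spans both records.
my ($with_B) = $s =~ m{ (xyzzy .* B bar) }xms;

print "anchor A: '$with_A'\n";
print "anchor B: '$with_B'\n";
```

So the anchor changes where backtracking stops, but it does not make .* any less greedy. Whether the result is what you want depends entirely on where the anchor text occurs in the data.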

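In example C, the "lazy" modifier ? of the .*? match means that it will match as little as possible to achieve an overall match with 'bar'. Instead of running to the end of the string and backing off, .*? starts out matching nothing and extends one character at a time until 'bar' can match.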
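As a sketch of how the lazy form behaves once /g is added (an assumption on my part about what the eventual goal is), a list-context match on the same sample string picks up each record separately:

```perl
use strict;
use warnings;

my $s = 'xxx xyzzyfooAbar yyy xyzzyzotBbar zzz';

# In list context with /g, the match returns every capture in turn.
# The lazy .*? stops at the nearest 'bar' each time, so the records
# are not merged into one long match.
my @records = $s =~ m{ (xyzzy .*? bar) }xmsg;

print "record: '$_'\n" for @records;
```

This is still fragile if 'bar' can legitimately occur inside a record, which is one reason the more robust approaches recommended earlier in the thread are worth a look.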

Update: Corrected a couple of trivial spelling/formatting errors.


Give a man a fish:  <%-{-{-{-<
