Re: Non-greedy regex behaves greedily

I would like to reopen this with a similar question.

Target string:

Back to STATES Menu</font></a></h3> <p align="center"><a href="index.htm"><img src="home2.gif" alt="Home" border="0" width="106" height="30"></a></p> </body> </html>

regex:

</a>.*?$

matches:

</a></h3> <p align="center"><a href="index.htm"><img src="home2.gif" alt="Home" border="0" width="106" height="30"></a></p> </body> </html>

I expect it to match:

</a></p> </body> </html>

Both Perl and Regex Coach agree on this result, so I must be missing something.


Replies are listed 'Best First'.
Re^2: Non-greedy regex behaves greedily (leftmost) by tye (Sage) on Jul 28, 2008 at 06:09 UTC
The regex engine matches "leftmost longest" and not "longest leftmost". The non-greedy modifier changes "longest" to "shortest" but doesn't change "leftmost", nor does it stop "leftmost" from trumping "longest"/"shortest". ("Leftmost" refers to how close the beginning of the matched substring is to the start of the string.) The long match is indeed the leftmost possible match. The ? changes the quantifier so that you get the shortest of the leftmost possible matches instead of the longest of the leftmost possible matches. You can read about sexeger (or search for more threads on it) to see how it can sometimes be useful, or at least fun, to reverse your string and your regex so that you get the substring with the "rightmost" ending point and can choose between longest/shortest as the regex engine moves leftward (with respect to the original string). For your particular case, I'd just use rindex and then substr. - tye
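A minimal sketch of the two suggestions above, assuming the target string from the question. The variable names ($string, $pos, $tail, $rev, $tail2) are illustrative only. The first part uses rindex and substr; the second reverses both the string and the pattern, sexeger-style, so that the lazy quantifier stops at what was originally the last `</a>`.

```perl
use strict;
use warnings;

# Target string from the question.
my $string = q{Back to STATES Menu</font></a></h3> <p align="center"><a href="index.htm"><img src="home2.gif" alt="Home" border="0" width="106" height="30"></a></p> </body> </html>};

# rindex returns the offset of the LAST occurrence of the substring, or -1.
my $pos = rindex($string, '</a>');

# Two-argument substr returns everything from that offset to the end.
my $tail = $pos >= 0 ? substr($string, $pos) : undef;
print defined $tail ? "rindex/substr: $tail\n" : "no </a> found\n";

# Reversed ("sexeger") variant: reverse the string, match the reversed
# literal ">a/<" lazily from the start, then reverse the capture back.
my $rev = reverse $string;    # scalar context: reverses the characters
my $tail2;
if ( $rev =~ m{^(.*?>a/<)}s ) {
    $tail2 = reverse $1;
    print "reversed:      $tail2\n";
}
```

Both lines should print `</a></p> </body> </html>` for this input.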
Re^2: Non-greedy regex behaves greedily by linuxer (Curate) on Jul 27, 2008 at 17:43 UTC
Your regex doesn't force non-greedy behaviour. I'll try to explain with a simplified text example: `my $text = "000ABCDEFABCGHI\n"; if ( $text =~ m{(ABC.*?)$} ) { print $1, $/; }` The engine reads $text from left to right and makes its first attempt starting at the first "ABC", using the complete following string until the end of the line. As that's exactly what the regex requested, this result is returned. There's no condition which forces the engine to search for a shorter result. There will be no second run which checks whether the current result contains a shorter one. The first valid match is returned; this isn't always the best match.
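To make the "first valid match" point concrete, here is a small sketch using the same simplified text (without the trailing newline). The variable names are illustrative. It records where the captured group starts and compares the lazy, greedy and unanchored forms.

```perl
use strict;
use warnings;

my $text = "000ABCDEFABCGHI";

# Lazy quantifier, anchored with $: the attempt starting at the first "ABC"
# can be stretched to reach the end of the string, so it succeeds and the
# engine never tries the later "ABC".
my ($lazy) = $text =~ m{(ABC.*?)$};
my $lazy_start = $-[1];    # offset where capture group 1 begins

# The greedy form finds the same substring here.
my ($greedy) = $text =~ m{(ABC.*)$};

# Without the $ anchor the laziness becomes visible: .*? matches nothing.
my ($unanchored) = $text =~ m{(ABC.*?)};

print "lazy:       $lazy (starts at offset $lazy_start)\n";
print "greedy:     $greedy\n";
print "unanchored: $unanchored\n";
```

The lazy and greedy captures are both `ABCDEFABCGHI`, starting at offset 3, while the unanchored lazy capture is just `ABC`.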
Re^3: Non-greedy regex behaves greedily by kovacsbv (Novice) on Jul 27, 2008 at 23:36 UTC
OK, is there a nice detailed description of the engine that would explain what causes a second run and what the "?" does exactly? This behavior isn't very intuitive. Also, is there another way to get the desired result other than the ugly hack I posted below?
Re^4: Non-greedy regex behaves greedily by ysth (Canon) on Jul 28, 2008 at 10:03 UTC
You can force the regex engine to start looking for `</a>` at the end of the string and work back toward the beginning by consuming the whole string first and then backtracking character by character: `my $string = q!Back to STATES Menu</font></a></h3> <p align="center"><a href="index.htm"><img src="home2.gif" alt="Home" border="0" width="106" height="30"></a></p> </body> </html>!; if ( $string =~ m!^.*(</a>.*?)$! ) { print "got $1\n"; }` but there's usually a better way to get what you want done.
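One possible alternative, which is not something proposed in this thread, is to forbid a further `</a>` inside the matched tail with a negative lookahead. The sketch below assumes the target string from the question; the variable names are illustrative.

```perl
use strict;
use warnings;

# Target string from the question.
my $string = q{Back to STATES Menu</font></a></h3> <p align="center"><a href="index.htm"><img src="home2.gif" alt="Home" border="0" width="106" height="30"></a></p> </body> </html>};

# After "</a>", accept a character only if another "</a>" does not begin
# at that position. The attempt starting at an earlier "</a>" can then
# never reach the end of the string, so only the last one matches.
my $last;
if ( $string =~ m{(</a>(?:(?!</a>).)*)$}s ) {
    $last = $1;
    print "got $last\n";
}
```

For this input it prints `got </a></p> </body> </html>`. The pattern still scans from the left, so for long strings the rindex approach is likely to be cheaper.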

