Regex a little less greedy please

martymart has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks, I have a little app that uses a regular expression. The regex is:

<APPEND.*>

The information I need from this is the result of the postmatch of this search. Trouble is the greedy quantifier. The expression is searching on a string like:

<APPEND changed_date="02-02-2003">This is sample text</APPEND>

I would like the postmatch to give me back:

This is sample text</APPEND>

Instead, I think it's matching up to the '>' at the end of the string. What I need is to be able to tell the regex that it has achieved its match the first time it encounters a '>'. Is this possible? I would appreciate any ideas you may have on this.
Martymart


Replies are listed 'Best First'.
Re: Regex a little less greedy please by broquaint (Abbot) on Mar 18, 2003 at 14:19 UTC
De-greedify that dot-star like so:

my $str = q[<APPEND changed_date="02-02-2003">This is sample text</APPEND>];
print "match: ", $str =~ m{ <APPEND (.*?) > }x, $/;
print "post: ", $', $/;
__output__
match: changed_date="02-02-2003"
post: This is sample text</APPEND>

Check out `perlre` for more info on perl's regex engine.

HTH

_________
broquaint
Re: Regex a little less greedy please by arturo (Vicar) on Mar 18, 2003 at 14:29 UTC
From perlre:

If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don’t change, just the "greediness".

So, changing your regex to

<APPEND.*?>

should get you the behavior you want. If, however, you take to heart the lessons of Death to Dot Star!, you might want to write it this way:

<APPEND[^>]*>

Avoiding the post-match variable and using () to capture the stuff you want is left as an exercise for the reader =)

HTH

If not P, what? Q maybe?
"Sidney Morgenbesser"
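For anyone who wants to check their answer to that exercise, here is one possible sketch. It assumes the sample string from the original question; the variable name `$rest` is made up for illustration. The negated character class stops at the first '>', and the parentheses grab what the post-match variable would otherwise have held.

```perl
use strict;
use warnings;

# Sample string taken from the original question.
my $str = q[<APPEND changed_date="02-02-2003">This is sample text</APPEND>];

# [^>]* cannot run past the first '>', so the opening tag ends where expected.
# The parentheses capture everything after the opening tag, which is what
# the post-match variable would have held.
my ($rest) = $str =~ m{<APPEND[^>]*>(.*)}s;

print "rest: $rest\n" if defined $rest;
```

On the sample string this prints `rest: This is sample text</APPEND>`. The `/s` modifier is only there so that `.` would also cross newlines if the text spanned several lines.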
Re: Regex a little less greedy please by MZSanford (Curate) on Mar 18, 2003 at 14:16 UTC
Parsing HTML/XML is somewhat tricky to do correctly (what with entities and all ... see Super Search for more info), but if you know that there will not be any >'s in the tag, you may want to use a regexp like ...

m/<[^>]+>/

from the frivolous to the serious
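To see the difference side by side, here is a rough sketch comparing the original greedy pattern with the negated character class on the string from the question. The helper `after_match` is hypothetical and not from any module; it uses `$+[0]` (the end offset of the last successful match) in place of `$'`.

```perl
use strict;
use warnings;

# Sample string taken from the original question.
my $str = q[<APPEND changed_date="02-02-2003">This is sample text</APPEND>];

# after_match is a hypothetical helper: it returns the text following the
# first match of $re, using $+[0] (end offset of the match) instead of $'.
sub after_match {
    my ($text, $re) = @_;
    return $text =~ $re ? substr($text, $+[0]) : undef;
}

my $greedy  = after_match($str, qr/<APPEND.*>/);   # runs to the last '>'
my $negated = after_match($str, qr/<[^>]+>/);      # stops at the first '>'

print "greedy:  '$greedy'\n";
print "negated: '$negated'\n";
```

The greedy pattern swallows the whole string and leaves an empty post-match, while the negated class leaves `This is sample text</APPEND>`.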
Re: Regex a little less greedy please by roundboy (Sexton) on Mar 18, 2003 at 19:01 UTC
In addition to using either the non-greedy quantifier (`.*?`) or skipping up to the next > (`[^>]*`), you also want to capture the text up through the matching end-tag, for which you just need a non-greedy quantifier inside capturing parens. So your regex should look like

m{<APPEND\b[^>]*>(.*?)</APPEND>}

This puts the text between the tags into `$1`; if you really want the ending tag, too, just move the paren. I added the `\b` to make sure you only match `<APPEND>` tags, and not, e.g., `<APPENDIX>`.

The only caveats on this are:

- You might want to add a `/i` modifier to the match, in case someone adds the tags in lower case.
- If there's ever a chance of a '>' appearing in the attributes of the tag, you need something more complicated. The following (untested, but based on Friedl's Mastering Regular Expressions) should work:

m{<APPEND\b(?:"[^"]*"|'[^']*'|[^'">])*>(.*?)</APPEND>}

HTH,

--roundboy
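Since that last pattern is marked untested, here is a small sketch that exercises it. The input string is invented for illustration (lower-case tags, a '>' inside a quoted attribute), and the pattern is the same one spread out with `/x` for readability, plus the `/i` and `/s` modifiers.

```perl
use strict;
use warnings;

# Hypothetical input: lower-case tags and a '>' inside a quoted attribute.
my $str = q[<append note="a > b">This is sample text</append>];

my $re = qr{
    <APPEND\b        # tag name, but not a longer name such as APPENDIX
    (?: "[^"]*"      # a double-quoted attribute value
      | '[^']*'      # a single-quoted attribute value
      | [^'">]       # any other character except quotes and the closing bracket
    )*
    >                # the real end of the opening tag
    (.*?)            # the text between the tags, as little as possible
    </APPEND>
}xis;

# In list context the match returns the captured group.
my ($text) = $str =~ $re;

print "text: $text\n" if defined $text;
```

On this input it prints `text: This is sample text`, so the quoted '>' no longer ends the tag early. It is still only a pattern match and not a real parser, so nested or malformed tags are out of scope.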

