PerlMonks
Re: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason
by jcb (Parson) on Jan 12, 2021 at 05:00 UTC ( [id://11126779]=note )
This would probably be a good application for HTML::Parser or any of the DOM-building modules that I am sure other monks will hasten to recommend.

However, your problem is probably that the "stretchy" groups in your pattern are not matching as you intend. I suggest (untested) m!<div class="soda[^"]*">(.*?)</div>! instead. The important difference is that this alternative constrains the initial "discard" match so that it cannot include double quotes, and therefore cannot run past the opening div tag. Also note the use of ! as the delimiter to avoid "leaning toothpick syndrome" in this version.

If you are trying to catch multiple items from a single large input block, I suggest (also untested):
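A minimal sketch of that idea, not necessarily the snippet the author had in mind: with the /g modifier in list context, a match returns the captured text from every occurrence at once. The variable names and the sample HTML below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; class names and contents are invented for this sketch.
my $html = join '',
    '<div class="soda can">Cola</div>',
    '<div class="other">skip me</div>',
    '<div class="soda bottle">Lemonade</div>';

# /g in list context returns the capture group from every match.
# /s lets . cross newlines, in case an item spans several lines.
my @items = $html =~ m!<div class="soda[^"]*">(.*?)</div>!gs;

print "$_\n" for @items;   # prints "Cola" then "Lemonade"
```

If each item needs processing as it is found, the same pattern can be used in a `while ($html =~ m!...!gs) { ... }` loop with `$1` holding the current capture.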
If the text you want does not contain additional HTML, you could also replace (.*?) with ([^<]*). Generally, more constrained patterns like these will also perform better because they need to backtrack less often. If the text you want can contain additional HTML, use HTML::Parser; it will work far better.
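To illustrate the trade-off, here is a small sketch (sample input and variable names are assumptions) showing that the ([^<]*) variant only matches a div whose content is plain text, and silently skips one that contains nested markup:

```perl
use strict;
use warnings;

# Hypothetical input: the second div contains nested HTML.
my $html = '<div class="soda">Cola</div>'
         . '<div class="soda"><b>Diet</b> Cola</div>';

# [^<]* cannot cross a "<", so a div with nested tags never matches.
my @plain = $html =~ m!<div class="soda[^"]*">([^<]*)</div>!g;

# The lazy .*? version accepts nested tags, up to the first </div>.
my @lazy = $html =~ m!<div class="soda[^"]*">(.*?)</div>!g;

print scalar(@plain), " plain match(es): @plain\n";   # 1 plain match(es): Cola
print scalar(@lazy),  " lazy match(es)\n";            # 2 lazy match(es)
```

Neither pattern copes with a div nested inside the target div, which is the case where a real parser such as HTML::Parser is the safer choice.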