PerlMonks  

Re: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason

by jcb (Parson)
on Jan 12, 2021 at 05:00 UTC (id://11126779)


in reply to I match a pattern in regex, yet I don't get the group I wanted to extract for some reason

This would probably be a good application for HTML::Parser or any of the DOM-building modules that I am sure other monks will hasten to recommend.

However, your problem is probably that the "stretchy" groups in your pattern are not matching as you intend. I suggest (untested) m!<div class="soda[^"]*">(.*?)</div>! instead. The important difference is that this alternative constrains the initial "discard" match to not include double quotes, and therefore not to run past the opening div tag. Also note the use of ! as delimiter to avoid "leaning toothpick syndrome" in this version.
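
To illustrate the difference, here is a minimal sketch. The sample input and the "loose" pattern are assumptions made up for this example (they are not taken from the original question); only the "tight" pattern is the one suggested above.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample input; the class names and text are invented.
my $html = '<div class="soda top">Cola</div><div class="note">ignore me</div>';

# Assumed loose pattern: the greedy .* can run past the opening div tag
# and only backtracks as far as the LAST "> in the input.
my ($loose) = $html =~ m!<div class="soda.*">(.*?)</div>!;

# Suggested pattern: [^"]* cannot cross the closing quote of the class attribute.
my ($tight) = $html =~ m!<div class="soda[^"]*">(.*?)</div>!;

say "loose: $loose";
say "tight: $tight";
```

With this input the loose pattern captures the text of the second div, while the constrained one captures the text of the "soda" div.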

If you are trying to catch multiple items from a single large input block, I suggest (also untested):

while (m!<div class="soda[^"]*">(.*?)</div>!g) { say "matched!"; my $grp = $1; say $grp; }

If the text you want does not contain additional HTML, you could also replace (.*?) with ([^<]*). Generally, more constrained search patterns like these will also perform better because they will need backtracking less often.
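
A self-contained sketch of the loop with the ([^<]*) variant, again untested against your real data; the input string and the @items array are hypothetical names chosen for the example:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input: two divs whose class starts with "soda", one that does not.
my $html = join "\n",
    '<div class="soda first">Cola</div>',
    '<div class="juice">Orange</div>',
    '<div class="soda second">Lemonade</div>';

my @items;
# With /g in scalar context, each iteration resumes where the previous match ended.
while ($html =~ m!<div class="soda[^"]*">([^<]*)</div>!g) {
    push @items, $1;
}

say for @items;
```

One side effect worth knowing: [^<]* also matches newlines, whereas (.*?) will not cross a newline unless you add the /s modifier.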

If the text you want can contain additional HTML, use HTML::Parser; it will work far better.
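
For comparison, a rough sketch of the HTML::Parser approach, assuming you want the text of every div whose class attribute starts with "soda", including text inside nested tags. The sample input, the $depth counter and the @items array are assumptions for illustration; HTML::Parser is a CPAN module and must be installed.

```perl
use strict;
use warnings;
use feature 'say';
use HTML::Parser ();

# Hypothetical input with nested markup inside the div of interest.
my $html = '<div class="soda top">Cola <b>Zero</b><div>inner</div></div>'
         . '<div class="note">skip</div>';

my @items;      # collected text, one entry per matching div
my $depth = 0;  # div nesting depth inside the current match; 0 means outside

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ($tag, $attr) = @_;
        return unless $tag eq 'div';
        if ($depth) {
            $depth++;                      # nested div inside a match
        }
        elsif (($attr->{class} // '') =~ /^soda/) {
            $depth = 1;                    # entering a matching div
            push @items, '';
        }
    }, 'tagname, attr' ],
    # Text may arrive in several chunks, so append rather than assign.
    text_h => [ sub { $items[-1] .= $_[0] if $depth }, 'dtext' ],
    end_h  => [ sub { $depth-- if $depth && $_[0] eq 'div' }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

say for @items;
```

The depth counter is what a regex cannot easily do: it lets the handler find the closing tag that actually belongs to the matching div, even when other divs are nested inside it.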


Replies are listed 'Best First'.
Re^2: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason
by Anonymous Monk on Jan 12, 2021 at 17:20 UTC
    Second this. Projects like this always expand to need to consider more things, and an event-driven parser is therefore always the "future-proof" strategy.
