Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot

Re: Regular Expression Help

by davido (Cardinal)
on Apr 24, 2020 at 18:57 UTC ( #11116037=note: print w/replies, xml ) Need Help??

in reply to Regular Expression Help

I would absolutely use a DOM parser for this such as Mojo::DOM, as has already been recommended. In fact, it's exactly the one I would use, though there are many alternatives also on CPAN. If I were looking at this with only regular expressions in my tool belt I would alter your regex as follows:

if ($string =~ m/<\s*span\s+class\s*=\s*"Trsdu\(0\.3s\)\s+Fw\(b\)\s+Fz +\(36px\)...............

In other words, because HTML allows whitespace just about everywhere, you have to allow for whitespace to show up just about anywhere in your patten. But you can't use this either, because the order of elements in a span tag is not set in stone. 'class' and 'data-reactid' can come in any order, so you would also need to deal with that. By the time you've dealt with these realities, you've gotten a pretty good start at writing a really fragile and specialized tool that would be better served by a DOM parser.


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11116037]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (5)
As of 2021-04-16 02:47 GMT
Find Nodes?
    Voting Booth?

    No recent polls found