Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

I would absolutely use a DOM parser for this such as Mojo::DOM, as has already been recommended. In fact, it's exactly the one I would use, though there are many alternatives also on CPAN. If I were looking at this with only regular expressions in my tool belt I would alter your regex as follows:

if ($string =~ m/<\s*span\s+class\s*=\s*"Trsdu\(0\.3s\)\s+Fw\(b\)\s+Fz +\(36px\)...............

In other words, because HTML allows whitespace just about everywhere, you have to allow for whitespace to show up just about anywhere in your patten. But you can't use this either, because the order of elements in a span tag is not set in stone. 'class' and 'data-reactid' can come in any order, so you would also need to deal with that. By the time you've dealt with these realities, you've gotten a pretty good start at writing a really fragile and specialized tool that would be better served by a DOM parser.


Dave


In reply to Re: Regular Expression Help by davido
in thread Regular Expression Help by vskatusa

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (3)
As of 2024-04-19 18:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found