
I would absolutely use a DOM parser for this, such as Mojo::DOM, as has already been recommended. In fact, it's exactly the one I would use, though CPAN has many alternatives. If regular expressions were the only tool in my tool belt, I would alter your regex as follows:

if ($string =~ m/<\s*span\s+class\s*=\s*"Trsdu\(0\.3s\)\s+Fw\(b\)\s+Fz\(36px\)...............

In other words, because HTML allows whitespace just about everywhere, you have to allow for whitespace just about anywhere in your pattern. But even that isn't enough, because the order of attributes in a span tag is not set in stone: 'class' and 'data-reactid' can come in any order, so you would also need to handle that. By the time you've dealt with these realities, you've made a good start on a fragile, specialized tool for a job that a DOM parser handles better.
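For comparison, here is a minimal Mojo::DOM sketch of the same lookup. It assumes Mojolicious is installed from CPAN, that the goal is the text inside the span (the original pattern is cut off, so that part is a guess), and it uses a made-up sample document rather than the real page. The `[class~="..."]` attribute selector matches one whitespace-separated class token, so spacing, attribute order and class order stop mattering.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;    # from the Mojolicious distribution on CPAN

# Hypothetical sample markup: attribute order and whitespace are
# deliberately irregular, which is what trips up a hand-written regex.
my $html = <<'HTML';
<div>
  <span data-reactid="35"
        class = "Trsdu(0.3s)  Fw(b) Fz(36px)" >123.45</span>
  <span class="Fz(12px)">ignore me</span>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Each [class~="..."] test passes when the class attribute contains that
# exact token, wherever it appears in the attribute and however it is spaced.
my $span = $dom->at('span[class~="Fz(36px)"][class~="Fw(b)"]');

# at() returns undef when nothing matches, so guard before calling text().
my $value = defined $span ? $span->text : undef;
print defined $value ? "$value\n" : "not found\n";
```

Swapping the two attributes, or reordering the class names, still selects the same span, which is exactly the bookkeeping the regex would have to do by hand.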

Dave

In reply to Re: Regular Expression Help by davido
in thread Regular Expression Help by vskatusa

	PerlMonks