http://qs321.pair.com?node_id=11116037


in reply to Regular Expression Help

I would absolutely use a DOM parser for this, such as Mojo::DOM, as has already been recommended. In fact, it's exactly the one I would use, though there are many alternatives on CPAN. If I had only regular expressions in my tool belt, I would alter your regex as follows:

if ($string =~ m/<\s*span\s+class\s*=\s*"Trsdu\(0\.3s\)\s+Fw\(b\)\s+Fz\(36px\)...............

In other words, because HTML allows whitespace just about everywhere inside a tag, you have to allow for whitespace to show up just about anywhere in your pattern. But you can't rely on this either, because the order of attributes in a span tag is not set in stone. 'class' and 'data-reactid' can come in any order, so you would also need to deal with that. By the time you've dealt with these realities, you've gotten a pretty good start at writing a really fragile and specialized tool for a job that a DOM parser would do better.
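
To illustrate the attribute-order problem, here is a minimal sketch, assuming Mojolicious is installed. The sample markup, the data-reactid value, the price text, and the helper names via_regex and via_dom are all made up for the example; your real page will differ. The regex is a shortened, whitespace-tolerant pattern in the spirit of the one above, not your full pattern.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical sample markup: the same span with its attributes in either order.
my @samples = (
    '<span class="Trsdu(0.3s) Fw(b) Fz(36px)" data-reactid="50">123.45</span>',
    '<span data-reactid="50" class="Trsdu(0.3s) Fw(b) Fz(36px)">123.45</span>',
);

# Tolerates extra whitespace, but still assumes 'class' is the first attribute.
my $re = qr/<\s*span\s+class\s*=\s*"Trsdu\(0\.3s\)\s+Fw\(b\)\s+Fz\(36px\)"[^>]*>([^<]*)</;

sub via_regex {
    my ($html) = @_;
    return $html =~ $re ? $1 : undef;
}

sub via_dom {
    my ($html) = @_;
    # [class~="..."] matches a single whitespace-separated class token,
    # regardless of where the class attribute sits in the tag.
    my $span = Mojo::DOM->new($html)->at('span[class~="Fz(36px)"]');
    return $span ? $span->text : undef;
}

for my $html (@samples) {
    printf "regex: %-8s  dom: %s\n",
        via_regex($html) // 'no match',
        via_dom($html)   // 'no match';
}
```

The regex finds the value only when 'class' comes first, while the selector finds it in both samples. Matching on one class token is a design choice for the sketch; you may prefer a selector that is more or less specific for your page.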


Dave