Re: Isn't /m for multiline regex?

by afoken (Canon)
on Apr 19, 2010 at 11:39 UTC

in reply to Isn't /m for multiline regex?

what am I doing wrong here?

You try to parse HTML using regular expressions. That simply can't work, due to the way HTML is defined. Use an HTML parser; a CPAN search will list several.


Re^2: Isn't /m for multiline regex?
on Apr 19, 2010
    The parent node overreaches.

    While it's true that it's not generally a good idea to try to parse html with regexen, "(t)hat simply can't work" is not.

    It can be done... and often is for simple cases... but is fraught with so many difficulties that it's inadvisable. What's more, trying to parse html of any complexity with tools other than the well-tested modules referenced above flies in the face of the mantra 'don't re-invent the wheel.'

      from perlfaq6

      Here's code that finds everything between START and END in a paragraph:
      undef $/; # read in whole file, not just one line or paragraph
      while ( <> ) {
          while ( /START(.*?)END/sgm ) {
              print "$1\n";
          }
      }
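
Since the thread title asks what /m is for, here is a minimal sketch of the difference between the two modifiers. It is my own illustration, not part of the perlfaq6 excerpt, and the sample string and variable names are made up. In short: /s lets . match a newline, while /m only changes where ^ and $ anchor.

```perl
use strict;
use warnings;

# Hypothetical sample text: the first START/END pair spans two lines,
# the second pair sits on a single line.
my $text = "START one\ntwo END\nSTART three END\n";

# Without /s, "." will not cross the newline, so only the
# single-line pair is found.
my @without_s = $text =~ /START(.*?)END/g;

# With /s, "." also matches "\n", so both pairs are found.
my @with_s = $text =~ /START(.*?)END/sg;

# /m only affects the anchors: ^ (and $) match at every line
# boundary instead of only at the ends of the whole string.
my @line_starts = $text =~ /^(\w+)/mg;

print scalar(@without_s), " match(es) without /s\n";
print scalar(@with_s),    " match(es) with /s\n";
print "line starts: @line_starts\n";
```

So for a match that has to run across line breaks, /s is the modifier doing the work. The /m in the perlfaq6 pattern changes nothing there, since that pattern contains no ^ or $.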

As of 2022-01-24
