PerlMonks  

html/mason parser

by sharkey (Scribe)
on Mar 16, 2005 at 00:01 UTC

sharkey has asked for the wisdom of the Perl Monks concerning the following question:

I need to parse some HTML pages, but these pages also contain HTML::Mason markup. The <% %> tags in particular are challenging, since they may contain Perl code with > and < characters, which can confuse strict HTML parsers.

Is it possible to make HTML::Parser (or some other module) handle the extra markup correctly? Any advice on how to go about it would also be appreciated.

Re: html/mason parser
by Fletch (Bishop) on Mar 16, 2005 at 03:02 UTC

    You could probably subclass HTML::Mason::Compiler (and/or HTML::Mason::Lexer) so that it spits out just the non-code chunks. I believe the Mason book has an example you could crib from in chapter 12.

Re: html/mason parser
by Joost (Canon) on Mar 16, 2005 at 00:22 UTC
Re: html/mason parser
by trammell (Priest) on Mar 16, 2005 at 03:02 UTC
    Don't forget the <& ... &> tags...
Re: html/mason parser
by sharkey (Scribe) on Mar 16, 2005 at 17:32 UTC
    I should have been more specific: I actually want to filter the HTML pages to add and manipulate tags.

    So just stripping the Mason out is a bit of an inconvenience, because I will have to find a way to put it back as well.
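A minimal sketch of one way to do both, which is not taken from any reply above: swap each Mason tag for an opaque placeholder before the HTML filter runs, then swap the originals back afterwards, so the parser never sees the < and > inside the Perl code. The sub names mask_mason and unmask_mason and the MASONTOKEN<n>X placeholder format are hypothetical, and the regex substitution on <p> merely stands in for a real HTML::Parser-based filter. It assumes only inline <% ... %> substitutions and <& ... &> component calls need protecting; block sections such as <%init> ... </%init>, %-lines, and Perl code that itself contains %> are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace inline Mason tags with opaque placeholders.
# Handles <% ... %> substitutions and <& ... &> component calls only.
# The (?![\w/]) lookahead skips block tags such as <%init> or <%perl>,
# which this sketch does NOT protect.
sub mask_mason {
    my ($html) = @_;
    my @stash;
    $html =~ s{(<%(?![\w/]).*?%>|<&.*?&>)}{
        push @stash, $1;                  # remember the original Mason chunk
        'MASONTOKEN' . $#stash . 'X';     # letters/digits only, parser-safe
    }gse;
    return ($html, \@stash);
}

# Hypothetical helper: put the saved Mason chunks back in place.
sub unmask_mason {
    my ($html, $stash) = @_;
    $html =~ s{MASONTOKEN(\d+)X}{$stash->[$1]}g;
    return $html;
}

my $src = qq{<p><% \$n > 1 ? "items" : "item" %></p>\n<& footer.mas, year => 2005 &>\n};

my ($masked, $stash) = mask_mason($src);

# Stand-in for the real filtering step (e.g. an HTML::Parser handler chain):
# here we just add a class attribute to every <p> tag.
(my $filtered = $masked) =~ s{<p>}{<p class="note">}g;

my $out = unmask_mason($filtered, $stash);
print $out;
# prints:
# <p class="note"><% $n > 1 ? "items" : "item" %></p>
# <& footer.mas, year => 2005 &>
```

Because the placeholder contains only letters and digits, a strict parser should treat it as ordinary text, including inside attribute values such as href="<% $url %>". The trade-off is that whatever filter you run in the middle must pass the placeholders through unchanged.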

Node Type: perlquestion [id://439819]
Approved by moot