PerlMonks  

HTML::Strip question--stripping only certain tags?

by Anonymous Monk
on Feb 01, 2006 at 21:49 UTC ( [id://527181]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to take some HTML text and strip it of only certain HTML tags (the rest can stay). I took a look at HTML::Strip and it seems I can either strip all HTML tags or only a list of certain HTML tags. Is there some way to get it to strip everything EXCEPT a certain list of tags? Is there some other module I should be looking at instead?

Replies are listed 'Best First'.
Re: HTML::Strip question--stripping only certain tags?
by bmann (Priest) on Feb 01, 2006 at 23:05 UTC
    HTML::Scrubber lets you allow only selected tags. I lifted the following from the pod (slightly modified):

    #!/usr/bin/perl -w
    use HTML::Scrubber;
    use strict;

    my $html = q[
    <style type="text/css"> BAD { background: #666; color: #666;} </style>
    <script language="javascript"> alert("Hello, I am EVIL!"); </script>
    <HR>
        a => <a href=1>link </a>
        br => <br>
        b => <B> bold </B>
        u => <U> UNDERLINE </U>
    ];

    # only allow the following tags
    my $scrubber = HTML::Scrubber->new( allow => [ qw[ p b i u hr br ] ] );

    print $scrubber->scrub($html);

    __END__
    Output:
    <hr>
        a => link
        br => <br>
        b => <b> bold </b>
        u => <u> UNDERLINE </u>

    The style and script blocks are gone, and the <a> tag is stripped (its text remains).
Re: HTML::Strip question--stripping only certain tags?
by Fletch (Bishop) on Feb 01, 2006 at 22:01 UTC

    If you only want to remove certain parts, you could take a look at HTML::TreeBuilder and friends and use it to selectively pull out the elements you want from what's there. Alternatively, if your HTML is well-formed enough, you could use XML::Twig to do something similar.

    Another source of inspiration might be to get the slashcode source and look at its comment filtering (seeing as that's what this sounds like you're trying to do).
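
    The HTML::TreeBuilder approach could be sketched roughly as follows. This is only an illustration, not code from the thread: the tag lists and the strip_tags helper are hypothetical, and it assumes you want some tags removed together with their contents (script, style) and others merely unwrapped so their text survives (font, span).

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical lists for illustration; adjust to your own needs.
my %drop   = map { $_ => 1 } qw(script style);   # remove element and its contents
my %unwrap = map { $_ => 1 } qw(font span);      # remove the tag, keep its contents

# Hypothetical helper: parse, prune, and serialise again.
sub strip_tags {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Pass 1: delete unwanted elements along with everything inside them.
    $_->delete for $tree->look_down( sub { $drop{ $_[0]->tag } } );

    # Pass 2: replace each remaining unwanted element with its own children.
    $_->replace_with_content for $tree->look_down( sub { $unwrap{ $_[0]->tag } } );

    my $out = $tree->as_HTML;   # note: TreeBuilder adds html/head/body wrappers
    $tree->delete;              # free the tree explicitly
    return $out;
}

print strip_tags('<p>Hi <font color="red">there</font><script>alert(1)</script></p>'), "\n";
```

    Because the input is parsed into a tree first, this copes with sloppy markup better than a regex would, at the cost of getting back a full document rather than the original fragment.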

Re: HTML::Strip question--stripping only certain tags?
by Jenda (Abbot) on Feb 02, 2006 at 00:24 UTC

    There's a module based on HTML::Parser for this on my pages (not released to CPAN) that lets you do just that and more. It allows you to specify not just the list of tags to allow, but also the attributes. So no more unexpected onMouseOvers and onLoads ;-)
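
    For comparison, the same idea of whitelisting attributes as well as tags can be sketched with HTML::Scrubber (mentioned elsewhere in this thread) using its rules and default options. This is not the module described above, whose interface is not shown here; the tag list, the URL pattern and the sample input are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::Scrubber;

my $scrubber = HTML::Scrubber->new(
    # Tags allowed through; their attributes fall under the default rules.
    allow => [ qw( p b i u br ) ],
    rules => [
        a => {
            href => qr{^https?://}i,   # keep href only for http(s) links
            '*'  => 0,                 # drop every other attribute on <a>
        },
    ],
    # Deny unlisted tags (0) and deny all attributes unless a rule says otherwise.
    default => [ 0 => { '*' => 0 } ],
);

# Hypothetical sample input with event-handler attributes to be removed.
my $dirty = '<a href="http://example.com/" onmouseover="evil()">x</a> '
          . '<b onclick="evil()">y</b>';

print $scrubber->scrub($dirty), "\n";
```

    Denying by default and listing what is allowed means a new or obscure event-handler attribute is dropped without having to be named.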

    Just the module name is a bit silly ...

    Jenda
    XML sucks. Badly. SOAP on the other hand is the most powerful vacuum pump ever invented.
