PerlMonks  

Re: Stripping HTML tags from a document

by cjf (Parson)
on Jun 30, 2002 at 15:55 UTC (#178380)


in reply to Strip HTML tags again

Have a look at HTML::Tagset; it contains various lists of valid HTML tags for different sections of a document.

Update: ++ to Ovid for providing the working example below.

Re: Re: Stripping HTML tags from a document
by dda (Friar) on Jun 30, 2002 at 16:27 UTC
    Thanks! This is just what I was looking for. Now I'd like to know how to use it in a 'Perl' manner. Currently I have the following code (straight from perlfaq):

        sub strip_html {
            my $t = shift;
            $t =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
            return $t;
        }

    It seems I have to use the %HTML::Tagset::isKnown hash, but how do I apply it to my sub? I can't find a quick way...

    --dda
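
One possible way to fold %HTML::Tagset::isKnown into the sub above is to capture the tag name and decide per match whether to drop the tag. This is an illustrative sketch, not code posted in the thread: the name strip_known_html is made up here, and it assumes HTML::Tagset is installed and that its keys are lowercase element names. Comments and declarations (anything starting with `<!`) are deliberately left alone.

```perl
use strict;
use warnings;
use HTML::Tagset;

# Hypothetical helper: remove only tags whose element name appears in
# %HTML::Tagset::isKnown; anything else in angle brackets is kept as-is.
sub strip_known_html {
    my $t = shift;

    # Same idea as the perlfaq pattern, with two extra captures:
    #   group 1 = the whole tag, group 2 = the element name.
    # Group 3 remembers the opening quote of an attribute value.
    $t =~ s{(<\s*/?\s*([A-Za-z][A-Za-z0-9]*)(?:[^>'"]*|(['"]).*?\3)*>)}
           { $HTML::Tagset::isKnown{ lc $2 } ? '' : $1 }gse;

    return $t;
}

print strip_known_html('<p>Hello <b>world</b>, <foo>kept</foo></p>'), "\n";
```

The /e modifier evaluates the replacement as Perl code, so each matched tag is looked up and either replaced with an empty string or put back unchanged. Like the perlfaq regex it is based on, this is still pattern matching rather than real parsing, so it will be fooled by sufficiently odd markup.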
