
Re: Strip HTML tags

by swiftone (Curate)
on Dec 16, 2000 at 04:20 UTC


in reply to Strip HTML tags

While I generally agree that HTML::Parser is a pain (the great flexibility leads to great complexity), for something like this HTML::TreeBuilder is just the ticket. Once the module is loaded, three simple lines:

    use HTML::TreeBuilder;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file('foo.html');
    my $non_html = $tree->as_text();

should do the trick. This quarter's Perl Journal has a good article on it (the included docs need work).
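For anyone who wants to try this on a string rather than a file, here is a minimal sketch of the same idea. The helper name strip_html and the sample markup are made up for illustration; parse, eof, as_text and delete are the standard HTML::TreeBuilder / HTML::Element calls, but check the docs for your installed version.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the text content of an HTML string.
sub strip_html {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);    # feed the markup to the parser
    $tree->eof;             # signal end of input so the tree is finalized

    my $text = $tree->as_text;   # concatenate the text nodes, tags dropped

    $tree->delete;          # free the tree explicitly (it holds circular refs)
    return $text;
}

# Sample input is an assumption, not from the original post.
print strip_html('<p>Hello <b>world</b>!</p>'), "\n";
```

Calling delete when you are done matters mostly in long-running scripts that parse many documents; for a one-off it is harmless either way.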

Replies are listed 'Best First'.
Re: Re: Strip HTML tags
by dvergin (Monsignor) on Feb 19, 2004 at 02:01 UTC
    Warning: This code strips out <anything> that is surrounded by <angle> <brackets>. It does not limit its action to true <html tags>.

    ------------------------------------------------------------
    "Perl is a mess and that's good because the
    problem space is also a mess." - Larry Wall
