PerlMonks  

Try HTML::TokeParser::Simple

by xtype (Deacon)
on Feb 23, 2003 at 01:51 UTC ( [id://237832]=note )


in reply to Parsing with HTML::Parser

I am a little surprised that no one has suggested this before me.
    use LWP::Simple qw($ua get head);
    use HTML::TokeParser::Simple;

    my $webpage = "http://some-url.com";
    $ua->timeout(30);    # give up after 30 seconds

    # Note: "return" assumes this code lives inside a subroutine.
    my ($html, $parsed_html);
    if (head($webpage)) {
        $html = get($webpage) or return 0;
    }
    else {
        return 0;
    }

    # Keep only the text tokens; skip tags, comments, and so on.
    my $p = HTML::TokeParser::Simple->new( \$html );
    while ( my $token = $p->get_token ) {
        next unless $token->is_text;
        $parsed_html .= $token->as_is;
    }
Update: Whoops, I guess I did not read the first post completely. I posted nearly the same code.
