PerlMonks  

Re: HTML::Parser example wanted...

by andreychek (Parson)
on Jun 26, 2001 at 19:19 UTC ( [id://91622]=note )


in reply to HTML::Parser example wanted...

Actually, there are a bunch of examples that come with the HTML::Parser module, found in the "eg" directory. Taking the code from there, here is an example of how to parse all the text from an HTML document:
    #!/usr/bin/perl -w

    # Extract all plain text from an HTML file

    use strict;
    use HTML::Parser 3.00 ();

    my %inside;

    sub tag
    {
        my($tag, $num) = @_;
        $inside{$tag} += $num;
        print " ";  # not for all tags
    }

    sub text
    {
        return if $inside{script} || $inside{style};
        print $_[0];
    }

    HTML::Parser->new(api_version => 3,
                      handlers    => [start => [\&tag, "tagname, '+1'"],
                                      end   => [\&tag, "tagname, '-1'"],
                                      text  => [\&text, "dtext"],
                                     ],
                      marked_sections => 1,
        )->parse_file(shift) || die "Can't open file: $!\n";
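That code is located in eg/htext. As you can see, it is event driven. The HTML::Parser->new call takes an option called "handlers", which tells HTML::Parser which function to call for each type of event (start tag, end tag, text). In this case, every start tag calls the function "tag" with two arguments: "tagname", which is the name of the tag, and the literal '+1', which marks it as a start tag. End tags call the same function with '-1', so %inside keeps a running count of how deeply we are nested inside each tag, and the "text" handler can skip anything inside a script or style element.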
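To make the argument passing a bit more concrete, here is a rough sketch of my own (not from the eg directory) that records what the handlers actually receive. The sample HTML string and the @events and $text variables are made up for illustration, and it feeds the parser a string via parse/eof instead of reading a file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser 3.00 ();

# Hypothetical sample input, not from the original example.
my $html = '<p>Hi <b>there</b></p>';

my (%inside, @events);   # nesting counters, and a log of handler calls
my $text = '';           # collected text

sub tag {
    my ($tag, $num) = @_;        # $num is the literal '+1' or '-1' from the argspec
    $inside{$tag} += $num;
    push @events, "$tag $num";
}

sub text {
    return if $inside{script} || $inside{style};
    $text .= $_[0];              # "dtext" passes entity-decoded text
}

my $p = HTML::Parser->new(
    api_version => 3,
    handlers    => [
        start => [\&tag,  "tagname, '+1'"],
        end   => [\&tag,  "tagname, '-1'"],
        text  => [\&text, "dtext"],
    ],
);
$p->parse($html);
$p->eof;                         # signal end of document

print "$_\n" for @events;
print "text: $text\n";
```

If I have it right, this should list "p +1", "b +1", "b -1", "p -1" and then the text "Hi there", and every counter in %inside ends up back at zero.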

Personally, I have had more luck with HTML::TokeParser, though I'm sure that isn't the case for everyone. I find HTML::TokeParser a bit more intuitive for this sort of job, but perhaps that is just the way I think.. or maybe I just wasn't using HTML::Parser right ;-) In any case, good luck.
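For comparison, a minimal sketch of the same text extraction with HTML::TokeParser, where you pull tokens in a loop instead of registering callbacks. The sample string and variable names are my own, and note that the text tokens here come back undecoded (entities are left as-is):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser ();

# Hypothetical sample input; a scalar reference is parsed as a literal document.
my $html = '<p>Hello <b>world</b></p><style>p { color: red }</style>';

my $p = HTML::TokeParser->new(\$html) or die "Can't parse document\n";

my %inside;      # nesting counters, as in the htext example
my $text = '';   # collected text

while (my $token = $p->get_token) {
    my $type = $token->[0];
    if ($type eq 'S') {             # start tag: ["S", $tag, ...]
        $inside{ $token->[1] }++;
    }
    elsif ($type eq 'E') {          # end tag: ["E", $tag, ...]
        $inside{ $token->[1] }--;
    }
    elsif ($type eq 'T') {          # text: ["T", $text, ...]
        next if $inside{script} || $inside{style};
        $text .= $token->[1];
    }
}

print "$text\n";
```

The logic is the same as before, but the control flow stays in your own loop, which is the part I find easier to follow.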
-Eric
