PerlMonks  

Re^2: More efficient use of HTML::TokeParser::Simple

by henka (Novice)
on Jul 11, 2006 at 06:17 UTC [id://560331]


in reply to Re: More efficient use of HTML::TokeParser::Simple
in thread More efficient use of HTML::TokeParser::Simple

I poked around HTML::TreeBuilder, but my goodness, things are complicated. It may not seem like it to seasoned monks, but to a C programmer, the OO aspects and data structures of Perl are, well, daunting. Gleaning how to do something as simple as the task I posted here from the Perl module docs is almost always an exercise in frustration.

Re^3: More efficient use of HTML::TokeParser::Simple
by GrandFather (Saint) on Jul 11, 2006 at 08:47 UTC

    Here's a trivial example that seems to do something like what you want and may be enough to get you started with TreeBuilder:

    use warnings;
    use strict;
    use HTML::TreeBuilder;

    my $html = do {local $/; <DATA>};
    my $tree = HTML::TreeBuilder->new ();

    $tree->parse ($html);
    $tree->eof ();
    $tree->elementify();

    my ($title) = $tree->find ('title');
    my @h1 = $tree->find ('h1');

    print $title->as_text (), "\n";
    print $_->as_text (), "\n" for @h1;

    __DATA__
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <!-- Took this out for IE6ites "http://www.w3.org/TR/REC-html40/loose.dtd" -->
    <html lang="en">
    <head>
    <title>More efficient use of HTML::TokeParser::Simple perlquestion id:560199</title>
    </head>
    <body>
    <h1>Header 1</h1>
    <p>First paragraph</p>
    <h1>Header 2</h1>
    <p>Second paragraph</p>
    <h2>Level 2 header 1</h2>
    </body>
    </html>

    Prints:

    More efficient use of HTML::TokeParser::Simple perlquestion id:560199
    Header 1
    Header 2
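
    For a slightly shorter variant of the same idea, here is a rough sketch that assumes the HTML is already in a string. The sample markup and the variable names are made up for illustration. It uses the new_from_content() shortcut constructor in place of new/parse/eof, and look_down() with a regex on _tag to collect h1 and h2 elements in one pass:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample document, not the page from the original question.
my $html = <<'HTML';
<html><head><title>Sample page</title></head>
<body>
<h1>Header 1</h1><p>First paragraph</p>
<h1>Header 2</h1><p>Second paragraph</p>
<h2>Level 2 header 1</h2>
</body></html>
HTML

# new_from_content() constructs the tree, parses the string and calls eof().
my $tree = HTML::TreeBuilder->new_from_content($html);

# In list context look_down() returns every matching element in document order.
my @headers = map { $_->as_text } $tree->look_down(_tag => qr/^h[12]$/);

# In scalar context look_down() returns the first match, or undef if none.
my $title      = $tree->look_down(_tag => 'title');
my $title_text = $title ? $title->as_text : '';

print "$title_text\n";
print "$_\n" for @headers;

# Free the tree explicitly once we are done with it.
$tree->delete;
```

    Whether this is any faster than the find() version is not something I have measured; it is mainly less typing.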

    DWIM is Perl's answer to Gödel
      What does $tree->elementify(); do here? It appears to run OK if it is commented out. I've often seen it in snippets and have no idea what purpose it serves.

        The HTML::TreeBuilder documentation is a good place to start. It says that elementify ():

        This changes the class of the object in $root from HTML::TreeBuilder to the class used for all the rest of the elements in that tree (generally HTML::Element). Returns $root.

        and goes on to say:

        For most purposes, this is unnecessary, but if you call this after (after!!) you've finished building a tree, then it keeps you from accidentally trying to call anything but HTML::Element methods on it. ...
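
        One way to see the effect described in that passage is to look at the class of the root object before and after the call. The following is a minimal sketch with made-up sample markup and illustrative variable names; it assumes the default element class, which the documentation says is generally HTML::Element:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample document used only for this illustration.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><h1>Header 1</h1></body></html>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

my $class_before = ref $tree;   # class of the root while it is still a builder
$tree->elementify;              # changes the class of the root object in place
my $class_after  = ref $tree;   # class of the root afterwards

print "before: $class_before\n";
print "after:  $class_after\n";

# Check whether a builder-only method such as parse() is still reachable.
my $can_parse = $tree->can('parse') ? 'yes' : 'no';
print "can parse after elementify: $can_parse\n";

# Element methods work the same with or without elementify().
my $title_text = $tree->find('title')->as_text;
print "$title_text\n";
```

        That also explains why the script runs fine with the line commented out: find() and as_text() are element methods, so they are available in either case.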

        Perl reduces RSI - it saves typing
