http://qs321.pair.com?node_id=895706


in reply to Formating a HTML document to show certain text.

  1. $ lwp-download http://www.imreportcard.com/products/the-elevation-group
    Saving to 'the-elevation-group.htm'...
    35.2 KB received
     

  2. $ perl htmltreexpather.pl the-elevation-group.htm 2>NUL |grep -A3 "^Product Description$"
    Product Description
    /html/body/div/div[3]/div/div/div[6]
    //div[@id='leftColTop']/div[6]
    //div[@id='leftColTop']/div[@class='heading']
     

  3. use HTML::TreeBuilder::XPath;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_file( "the-elevation-group.htm" );
    for my $n ( $tree->findnodes( q#//div[@id='leftColTop']/div[@class='heading']# ) ){
        print $n->getValue, "\n";
    }
    __END__
    Product Description
    Detailed Overview
    Reputation
    Domain "Whois"
  4. Repeat steps 2 and 3 for each other piece of text you want to extract.

Re^2: Formating a HTML document to show certain text.
by Anonymous Monk on Mar 28, 2011 at 06:52 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use HTML::TreeBuilder::XPath;

    Main( @ARGV );
    exit( 0 );

    sub Main {
        my $tree = HTML::TreeBuilder::XPath->new;
        $tree->parse_file( "the-elevation-group.htm" );
        my $XpathXpr = join '|',
            q#//div[@id='leftColTop']/div[@class='heading']#,
            q#//div[@id='leftColTop']/div[@class='heading']/following-sibling::node()[1]#,
        ;
        for my $node ( $tree->findnodes_as_strings( $XpathXpr ) ){
            print "$node\n\n";
        }
    }
    __END__
    Read http://w3schools.com/xpath/default.asp for a gentle introduction to XPath.
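
    If you would rather keep each heading together with the text that follows it, one option is to find the headings first and then evaluate following-sibling relative to each one. This is only a minimal sketch: the inline HTML, the "text" class and the heading_pairs sub are made up for illustration, and the real page's markup may well differ. It also uses following-sibling::*[1] (first element sibling) rather than node()[1].

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical stand-in for the downloaded page; the real markup may differ.
my $html = <<'HTML';
<html><body>
<div id="leftColTop">
  <div class="heading">Product Description</div>
  <div class="text">First body.</div>
  <div class="heading">Detailed Overview</div>
  <div class="text">Second body.</div>
</div>
</body></html>
HTML

# Return a list of [ heading text, following element text ] pairs.
# heading_pairs is an illustrative helper, not part of any module.
sub heading_pairs {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($content);

    my @pairs;
    for my $heading ( $tree->findnodes( q#//div[@id='leftColTop']/div[@class='heading']# ) ) {
        # Relative XPath: the first element sibling after this heading.
        my ($next) = $heading->findnodes( q#following-sibling::*[1]# );
        push @pairs, [ $heading->as_text, $next ? $next->as_text : '' ];
    }
    $tree->delete;    # release the parse tree once the text is extracted
    return @pairs;
}

for my $pair ( heading_pairs($html) ) {
    printf "%s: %s\n", @$pair;
}
```

    To run it against the saved page, swap parse_content for parse_file( "the-elevation-group.htm" ) as in the script above.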