HTML::TokeParser - extract values between tags

doubledecker has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am trying to extract the following text from an HTML page using the code below, but my code fails.

Budget - $25,000,000
Gross (worldwide) - $58,500,000

#!/usr/bin/perl
use HTML::TokeParser;

my $content = <<HTML;
<h5>Budget</h5>
$25,000,000 (estimated)<br/>
<br/>

<h5>Opening Weekend</h5>
$727,327 (USA) (<a href="/date/09-25/">25 September</a> <a href="/year/1994/">1994</a>) (33 Screens)<br/>
<br/>

<h5>Gross</h5>
$28,341,469 (USA) (<a href="/date/08-05/">5 August</a> <a href="/year/2012/">2012</a>)<br/>&#163;2,344,349 (UK) (<a href="/date/05-18/">18 May</a> <a href="/year/1995/">1995</a>)<br/>&#163;1,732,123 (UK) (<a href="/date/04-16/">16 April</a> <a href="/year/1995/">1995</a>)<br/>$58,500,000 (Worldwide)<br/>$555,480 (Belgium)<br/>ESP 637,291,985 (Spain)<br/>
<br/>

<h5>Admissions</h5>
82,890 (Belgium)<br/>163,594 (France) (<a href="/date/03-28/">28 March</a> <a href="/year/1995/">1995</a>)<br/>410,811 (Germany) (<a href="/date/12-31/">31 December</a> <a href="/year/1995/">1995</a>)<br/>1,245,604 (Spain)<br/>
<br/>

<h5>Filming Dates</h5>
<a href="/date/06-16/">16 June</a> <a href="/year/1993/">1993</a>&nbsp;-&nbsp;<a href="/date/09-10/">10 September</a> <a href="/year/1993/">1993</a><br/>
<br/>
HTML

my $description = "";
my $parser = HTML::TokeParser->new(\$content) || die "Can't open: $!";

while (my $token = $parser->get_tag("h5")) {
    my $text = $parser->get_text();
    last if $text =~ /budget/i;
}


Replies are listed 'Best First'.
Re: HTML::TokeParser - extract values between tags by Anonymous Monk on Feb 17, 2014 at 12:06 UTC
You're using an interpolating (double-quoted) heredoc :) strict vars or warnings would have warned you.

#!/usr/bin/perl --
use strict;
use warnings;
use XML::LibXML 1.70; ## for load_html/load_xml/location
use Data::Dump qw/ dd /;

my %shabs;
my $dom = XML::LibXML->new( qw/ recover 2 / )->load_html( string => $content );
for my $h5 ( $dom->findnodes( q{ //h5 } ) ){
    print $h5->nodePath, "\n";
    my $key = $h5->textContent;
    my $next = $h5->nextSibling;
    while( $next ){
        print $next->nodePath, "\n";
        $shabs{$key} .= $next->textContent;
        $next = $next->nextSibling;
        last if eval { $next->tagName eq 'h5' } ;
    }
    print "\n";
}
dd( \%shabs );
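To illustrate the heredoc point, here is a minimal sketch using only core Perl. The variable names $literal and $interpolated are made up for this demo. The same two lines are stored once with a single-quoted terminator and once with a double-quoted one (which is what a bare <<HTML means). In the second copy Perl reads $25 as a regex capture variable, so the dollar amount should lose its leading "$25".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single-quoted terminator: the body is taken literally.
my $literal = <<'HTML';
<h5>Budget</h5>
$25,000,000 (estimated)<br/>
HTML

# Double-quoted terminator (same as a bare <<HTML): $25 is interpolated
# as the 25th capture variable, which is undef here.
my $interpolated;
{
    no warnings 'uninitialized';    # silence the warning for this demo only
    $interpolated = <<"HTML";
<h5>Budget</h5>
$25,000,000 (estimated)<br/>
HTML
}

print "literal:\n$literal\n";
print "interpolated:\n$interpolated\n";
```

Without the "no warnings" line, the second heredoc triggers a "Use of uninitialized value" warning, which is the hint mentioned above.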
Re: HTML::TokeParser - extract values between tags by hdb (Monsignor) on Feb 17, 2014 at 10:56 UTC
In what way does your code fail? Even if it parses the HTML correctly, it will not produce any output, as there is no print or anything similar in the code.
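For reference, a rough sketch of what a version that actually prints might look like with HTML::TokeParser. This is an assumption about the intended result, not the original poster's code: the %section hash and the two regexes are made up for illustration, and the markup is a trimmed copy of the sample. It relies on get_tag and get_trimmed_text accepting a tag name to stop at.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Trimmed copy of the sample markup; single-quoted heredoc keeps "$" literal.
my $content = <<'HTML';
<h5>Budget</h5>
$25,000,000 (estimated)<br/>
<br/>

<h5>Gross</h5>
$28,341,469 (USA) (<a href="/date/08-05/">5 August</a> <a href="/year/2012/">2012</a>)<br/>
$58,500,000 (Worldwide)<br/>$555,480 (Belgium)<br/>
<br/>
HTML

my $parser = HTML::TokeParser->new( \$content )
    or die "Can't parse content";

# Map each <h5> heading to the text that follows it, up to the next <h5>.
my %section;
while ( $parser->get_tag('h5') ) {
    my $heading = $parser->get_trimmed_text('/h5');
    $parser->get_tag('/h5');    # step past the closing </h5>
    $section{$heading} = $parser->get_trimmed_text('h5');
}

# Pull the wanted amounts out of the collected section text.
my ($budget) = ( $section{Budget} // '' ) =~ /(\$[\d,]+)/;
my ($gross)  = ( $section{Gross}  // '' ) =~ /(\$[\d,]+)\s*\(Worldwide\)/;

print "Budget - $budget\n"           if defined $budget;
print "Gross (worldwide) - $gross\n" if defined $gross;
```

Collecting whole sections first and matching afterwards keeps the token loop simple; the regexes would need adjusting if the page layout differs from the sample.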
Re: HTML::TokeParser - extract values between tags by Anonymous Monk on Feb 17, 2014 at 10:57 UTC
Why choose TokeParser? Where does this $parser variable come from?
Re^2: HTML::TokeParser - extract values between tags by doubledecker (Scribe) on Feb 17, 2014 at 11:07 UTC
Updated the code to use the parser object. Apologies for that.
Re^3: HTML::TokeParser - extract values between tags by Anonymous Monk on Feb 17, 2014 at 11:23 UTC
The other question is more important :)
