Re: Extract HTML rows with headers specified

The HTML::TableExtract documentation lists a "decode" constructor attribute, described as follows:

Automatically decode retrieved text with HTML::Entities::decode_entities(). Enabled by default. Has no effect if keep_html was specified or if extracting into an element tree structure.
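To see what that decoding does to a header like yours, here is a minimal sketch using only HTML::Entities. The variable names are mine, and I am assuming (not verified against the module source) that with decode enabled the header pattern is compared against the already decoded cell text:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical sample: the header text as it appears in the HTML source
my $raw = 'Some&nbsp;ID';

# Scalar context returns a decoded copy and leaves $raw untouched
my $decoded = decode_entities($raw);

# &nbsp; decodes to U+00A0 (no-break space), not an ASCII space
my $has_nbsp        = $decoded =~ /\x{A0}/       ? 'yes' : 'no';
my $matches_literal = $decoded =~ /Some&nbsp;ID/ ? 'yes' : 'no';

print "decoded text contains U+00A0: $has_nbsp\n";
print "decoded text still matches 'Some&nbsp;ID': $matches_literal\n";
```

If that assumption holds, a header written with the literal entity can only match when decoding is switched off; otherwise the pattern would have to account for the no-break space.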

With decode turned off, the header can be given exactly as it appears in the source. The following works for me:

use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

my $html = qq{
<HTML>
  <BODY>
    <table border="1">
      <tr><td align="center" nowrap><font size="2"><u>Activity #</u><td align="center"><font size="2">Some&nbsp;ID<br>/Debit&nbsp;ID</font></td></tr>
      <tr><td align="right"><font size="2">588476377</font></td><td><font size="2"><a href="/cgi-bin/page?id=1275591">1275591</a></font></td></tr>
      <tr><td align="right"><font size="2">588484813</font></td><td><font size="2"><a href="/cgi-bin/page?id=1210540">1210540</a></font></td></tr>
    </table>
  </BODY>
</HTML>
};

my $te = HTML::TableExtract->new( headers => ['Some&nbsp;ID'], decode => 0 );
$te->parse($html);

# rows() can die when no table matched the headers, so trap that here
my @rows = eval { $te->rows };

if ( $@ ) {
    print "No rows found\n";
}
else {
    print Dumper(@rows);
}