The HTML::TableExtract documentation lists a "decode" constructor attribute, described as follows:
Automatically decode retrieved text with HTML::Entities::decode_entities(). Enabled by default. Has no effect if keep_html was specified or if extracting into an element tree structure.
The following works for me:
use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

my $html = qq{
<HTML>
<BODY>
<table border="1">
<tr><td align="center" nowrap><font size="2"><u>Activity #</u><td align="center"><font size="2">Some ID<br>/Debit ID</font></td></tr>
<tr><td align="right"><font size="2">588476377</font></td><td><font size="2"><a href="/cgi-bin/page?id=1275591">1275591</a></font></td></tr>
<tr><td align="right"><font size="2">588484813</font></td><td><font size="2"><a href="/cgi-bin/page?id=1210540">1210540</a></font></td></tr>
</table>
</BODY>
</HTML>
};
my $te = HTML::TableExtract->new( headers => ['Some ID'], decode => 0 );
$te->parse($html);
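# rows() can die when no table matched the headers, so guard the call
# and only dump the rows if it succeeded.
my @rows = eval { $te->rows };
if ( $@ ) {
    print "No rows found\n";
}
else {
    print Dumper(@rows);
}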