If you add LWP::Simple to HTML::TableExtract you've got a great deal of flexibility when parsing web pages.
To make page parsing a little easier:
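use HTML::TableExtract;

my $te = HTML::TableExtract->new; # no constraints, so every table matches
open(my $fh, '>', '/tmp/somefile.txt') or die "Can't open /tmp/somefile.txt: $!";
$te->parse($content); # $content is your HTML page from LWP::Simple
foreach my $ts ($te->table_states) {
    print $fh "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print $fh join(',', @$row), "\n";
    }
}
close $fh;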
This is lifted pretty much as-is from the HTML::TableExtract documentation. I output it to a file so I can look at the output with the flexibility of my favourite text editor (vi).
I should point out that the above code prints the coordinates (depth and count) of each table in your page, followed by the content of each cell, one row per line.
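If you want to see what that output looks like before pointing it at a real page, a minimal sketch along these lines should do. The inline HTML is made up and stands in for whatever LWP::Simple's get() would return, and table_report is just a hypothetical helper name. It also uses tables(), which as far as I know is the name newer HTML::TableExtract releases give to table_states().

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up stand-in for the page you would normally fetch with LWP::Simple.
my $content = <<'HTML';
<html><body>
<table>
  <tr><td>name</td><td>qty</td></tr>
  <tr><td>apple</td><td>3</td></tr>
</table>
</body></html>
HTML

# Hypothetical helper: builds the same report as the snippet above,
# but returns it as a string instead of writing to a file.
sub table_report {
    my ($html) = @_;
    my $te = HTML::TableExtract->new;   # no constraints: every table matches
    $te->parse($html);
    my $out = '';
    foreach my $ts ($te->tables) {
        $out .= "Table (" . join(',', $ts->coords) . "):\n";
        foreach my $row ($ts->rows) {
            # Empty cells can come back undef; print them as empty strings.
            $out .= join(',', map { defined $_ ? $_ : '' } @$row) . "\n";
        }
    }
    return $out;
}

print table_report($content);
```

Once you know the coordinates of the table you care about, you can pass them as depth and count to HTML::TableExtract->new to pull out just that one table.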