..tried to use HTML::Parser, but it ended up pretty ugly,
What didn't you like about using HTML::Parser?
#!/usr/bin/perl
use warnings;
use strict;
use HTML::Parser;

my %inside = ();                 # tag name => 1 while that tag is open
my $tbl = -1; my $col; my $row;  # current table, column and row indexes
my @table = ();                  # $table[$tbl][$row][$col] = cell text

my $p = HTML::Parser->new(
    handlers => {
        start => [ \&start, 'tagname' ],
        end   => [ \&end,   'tagname' ],
        text  => [ \&text,  'text' ],
    }
);
$p->parse_file(\*DATA); # or filename

# output
for my $t (0..$#table){
    print "\nTable $t\n";
    for my $r (0..$#{$table[$t]}){
        my $line = join "\t",$r,@{$table[$t][$r]};
        print "$line\n";
    }
}

# start tag: advance the table/row/column counters
sub start {
    my $tag = shift;
    $inside{$tag} = 1;
    if ($tag eq 'table'){
        ++$tbl; $row = -1;
    } elsif ($tag eq 'tr'){
        ++$row; $col = -1;
    } elsif ($tag eq 'th'){
        ++$col;
        $table[$tbl][$row][$col] = ''; # or undef
    }
}

# end tag: mark the tag as closed
sub end {
    my $tag = shift;
    $inside{$tag} = 0;
}

# text: keep it only while inside a <th> cell
sub text {
    my $str = shift;
    if ( $inside{'th'} ){
        # append rather than assign: the parser may deliver
        # the text of one cell in several chunks
        $table[$tbl][$row][$col] .= $str;
    }
}
__DATA__
</table></body><body bgcolor="black"><h1>
Summary</h1><table border="1"><tr><th>Employee A</th><th>-0.82</th>
</tr><tr><th>Employee B</th><th>-5.02</th>
</tr><tr><th>Employee C</th><th>19</th>
</tr></table></body><body bgcolor="black"><h1>
Summary</h1><table border="1"><tr><th>Employee A</th><th></th>
</tr><tr><th>Employee B</th><th></th>
</tr><tr><th>Employee C</th><th></th>
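
If the globals are what made your attempt feel ugly, the same approach can be wrapped in a function that keeps its state in closures. This is only a rough sketch, not the one true way: `extract_tables` is a name I made up for illustration, and I am assuming you also want `<td>` cells and do not have nested tables. It passes `dtext` so entities arrive decoded, and turns on `unbroken_text` so each cell's text arrives in one piece.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# extract_tables() is a hypothetical helper, not part of HTML::Parser.
# Returns an array ref: $tables->[$t][$r][$c] is the text of one cell.
# Assumption: tables are not nested.
sub extract_tables {
    my ($html) = @_;
    my @tables;
    my $in_cell = 0;    # true while inside an open <th> or <td>

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my $tag = shift;
            if ($tag eq 'table') {
                push @tables, [];                 # new table
            } elsif ($tag eq 'tr') {
                push @{ $tables[-1] }, [] if @tables;   # new row
            } elsif ($tag eq 'th' or $tag eq 'td') {
                # ignore cells that appear outside a table row
                if (@tables and @{ $tables[-1] }) {
                    push @{ $tables[-1][-1] }, '';      # new empty cell
                    $in_cell = 1;
                }
            }
        }, 'tagname' ],
        end_h => [ sub {
            my $tag = shift;
            $in_cell = 0 if $tag eq 'th' or $tag eq 'td';
        }, 'tagname' ],
        text_h => [ sub {
            my $text = shift;
            # append, so a cell split into several chunks is still complete
            $tables[-1][-1][-1] .= $text if $in_cell;
        }, 'dtext' ],
    );
    $p->unbroken_text(1);
    $p->parse($html);
    $p->eof;
    return \@tables;
}

# usage: one table, two rows, second value empty
my $html = '<table><tr><th>Employee A</th><td>-0.82</td></tr>'
         . '<tr><th>Employee B</th><td></td></tr></table>';
my $tables = extract_tables($html);
for my $t (0 .. $#$tables) {
    print "Table $t\n";
    print join("\t", @$_), "\n" for @{ $tables->[$t] };
}
```

Pushing onto the last table and row replaces the three index counters, so there is nothing to reset between tables.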
poj