Re^3: HTML::Parser / Regex

..tried to use HTML::Parser, but it ended up pretty ugly,

What didn't you like about using HTML::Parser?

#!/usr/bin/perl
use warnings;
use strict;
use HTML::Parser;

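my %inside = ();                 # tag name => 1 while that tag is open
my $tbl = -1; my $col; my $row;  # indexes of the current table, column, row
my @table = ();                  # $table[table][row][column] = cell text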

my $p = HTML::Parser->new(
  handlers => {
    start => [ \&start,'tagname' ],
    end   => [ \&end,  'tagname' ],
    text  => [ \&text, 'text' ],      
  }
);
$p->parse_file(\*DATA); # or filename

# output
for my $t (0..$#table){
  print "\nTable $t\n";
  for my $r (0..$#{$table[$t]}){
    my $line = join "\t",$r,@{$table[$t][$r]};
    print "$line\n";
  }
}


sub start {
  my $tag = shift;
  $inside{$tag} = 1;

  if ($tag eq 'table'){
    ++$tbl; $row = -1;   # new table: restart the row count
  } elsif ($tag eq 'tr'){
    ++$row; $col = -1;   # new row: restart the column count
  } elsif ($tag eq 'th'){
    ++$col;
    $table[$tbl][$row][$col] = ''; # or undef
  }
}

sub end {
  my $tag = shift;
  $inside{$tag} = 0;
}

sub text {
  my $str = shift;
  if ( $inside{'th'} ){
    # append: the parser may deliver one cell's text in several chunks
    $table[$tbl][$row][$col] .= $str;
  }
}

__DATA__
</table></body><body bgcolor="black"><h1>
Summary</h1><table border="1"><tr><th>Employee A</th><th>-0.82</th>
</tr><tr><th>Employee B</th><th>-5.02</th>
</tr><tr><th>Employee C</th><th>19</th>

</tr></table></body><body bgcolor="black"><h1>
Summary</h1><table border="1"><tr><th>Employee A</th><th></th>
</tr><tr><th>Employee B</th><th></th>
</tr><tr><th>Employee C</th><th></th>

poj

Re^4: HTML::Parser / Regex by MissPerl (Sexton) on May 29, 2017 at 01:20 UTC
Hi poj, thank you for showing this sample of using HTML::Parser! Now I know that HTML::Parser can actually print nice output on the console. I am currently studying the code to understand how each line works, and I am also trying to modify it so that the output is written to another HTML file of mine. I will come back with more questions if I come across something I can't figure out.

I actually have more than one table in the HTML file; they have almost the same tags but different content.
1. How can I take just one particular table?
2. Is it possible to use HTML::Parser to get a value and store it in a variable? What should I take note of in order to get such output?
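
For both questions, a minimal sketch that may help, assuming the @table structure built by poj's script above ($table[table][row][column]). The literal @table here is only a hypothetical stand-in for what the script collects from its sample data, and the names $wanted, @rows, $value_b and %score are made up for illustration. It uses core Perl only.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical stand-in for the @table that the script above builds
# from its sample data: $table[table][row][column].
my @table = (
  [ [ 'Employee A', '-0.82' ], [ 'Employee B', '-5.02' ], [ 'Employee C', '19' ] ],
  [ [ 'Employee A', '' ],      [ 'Employee B', '' ],      [ 'Employee C', '' ] ],
);

# 1. Pick one table by its index (0 is the first table in the file).
my $wanted = 0;   # assumption: the first table is the one you need
my @rows   = @{ $table[$wanted] };

# 2. Store a single cell in a plain variable ...
my $value_b = $rows[1][1];   # row 1, column 1

# ... or turn the two-column rows into a name => value hash.
my %score = map { $_->[0] => $_->[1] } @rows;

print "Employee B: $value_b\n";
print "Employee C: $score{'Employee C'}\n";
```

With the sample data this prints "Employee B: -5.02" and "Employee C: 19". Picking a table by index only works if the order of the tables in the file is stable; if it is not, some other marker (for example the heading before the table) would have to be recorded while parsing.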
Re^4: HTML::Parser / Regex by jobormo (Initiate) on Sep 10, 2019 at 07:57 UTC
Really, really good code, thanks very much, easy to understand and adapt :) Thanks!!!

