PerlMonks  

Re^3: HTML::Parser / Regex

by poj (Abbot)
on May 28, 2017 at 17:15 UTC ( [id://1191442]=note )


in reply to Re^2: HTML::Parser / Regex
in thread HTML::Parser / Regex

..tried to use HTML::Parser, but it was ended up pretty ugly,

What didn't you like about using HTML::Parser?

#!/usr/bin/perl
use warnings;
use strict;
use HTML::Parser;

my %inside = ();
my $tbl = -1;
my $col;
my $row;
my @table = ();

my $p = HTML::Parser->new(
  handlers => {
    start => [ \&start,'tagname' ],
    end   => [ \&end,  'tagname' ],
    text  => [ \&text, 'text' ],
  }
);
$p->parse_file(\*DATA); # or filename

# output
for my $t (0..$#table){
  print "\nTable $t\n";
  for my $r (0..$#{$table[$t]}){
    my $line = join "\t",$r,@{$table[$t][$r]};
    print "$line\n";
  }
}

sub start {
  my $tag = shift;
  $inside{$tag} = 1;
  if ($tag eq 'table'){
    ++$tbl;
    $row = -1;
  } elsif ($tag eq 'tr'){
    ++$row;
    $col = -1;
  } elsif ($tag eq 'th'){
    ++$col;
    $table[$tbl][$row][$col] = ''; # or undef
  }
}

sub end {
  my $tag = shift;
  $inside{$tag} = 0;
}

sub text {
  my $str = shift;
  if ( $inside{'th'} ){
    $table[$tbl][$row][$col] = $str;
  }
}

__DATA__
</table></body><body bgcolor="black"><h1> Summary</h1><table border="1"><tr><th>Employee A</th><th>-0.82</th>
</tr><tr><th>Employee B</th><th>-5.02</th>
</tr><tr><th>Employee C</th><th>19</th>
</tr></table></body><body bgcolor="black"><h1> Summary</h1><table border="1"><tr><th>Employee A</th><th></th>
</tr><tr><th>Employee B</th><th></th>
</tr><tr><th>Employee C</th><th></th>
poj

Replies are listed 'Best First'.
Re^4: HTML::Parser / Regex
by MissPerl (Sexton) on May 29, 2017 at 01:20 UTC
    Hi poj,

    Thank you for showing this example of using HTML::Parser!

    Now I know that HTML::Parser can actually be used to print nice output on the console.

    I am currently studying the code to understand how each line works. I am also trying to modify it so that the output is printed to another HTML file of mine.
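
For the goal of printing the output to another HTML file, one possible approach is sketched below. It is only an illustration, not part of poj's script: the sample rows are hard-coded in the shape of one entry of @table so that the snippet runs on its own, and the helper names escape_html and rows_to_html as well as the file name summary.html are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical sample rows, shaped like one entry of @table in the script above
my @rows = (
    [ 'Employee A', '-0.82' ],
    [ 'Employee B', '-5.02' ],
    [ 'Employee C', '19' ],
);

# Escape the characters that are special in HTML text
sub escape_html {
    my $s = shift;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build an HTML table (as a string) from a list of row array refs
sub rows_to_html {
    my @data = @_;
    my $html = qq{<table border="1">\n};
    for my $row (@data) {
        my $cells = join '', map { '<td>' . escape_html($_) . '</td>' } @$row;
        $html .= "<tr>$cells</tr>\n";
    }
    $html .= "</table>\n";
    return $html;
}

# 'summary.html' is a placeholder file name
my $outfile = 'summary.html';
open my $fh, '>', $outfile or die "Cannot open $outfile: $!";
print {$fh} rows_to_html(@rows);
close $fh or die "Cannot close $outfile: $!";
```

In the real script, rows_to_html(@{ $table[$t] }) could take the place of the tab-separated print loop for whichever table index $t is wanted.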

    I will come back and ask more questions if I come across something that I can't figure out.

    I actually have more than one table in the HTML file; they have almost the same tags but different content.

    1. How can I take just one particular table?

    2. Is it possible to use HTML::Parser to get the values and store them in variables? What should I take note of in order to get such output?
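
Both questions can be approached through the @table structure that poj's script fills, since each table, row and cell is addressed as $table[$tbl][$row][$col]. The sketch below is an assumption-laden illustration rather than a definitive answer: the data is hard-coded to mirror what the script should collect from its __DATA__ section, so the snippet runs without HTML::Parser, and table_to_hash is a hypothetical helper name.

```perl
use strict;
use warnings;

# Sample data in the shape the script builds: $table[$tbl][$row][$col]
my @table = (
    [ [ 'Employee A', '-0.82' ], [ 'Employee B', '-5.02' ], [ 'Employee C', '19' ] ],
    [ [ 'Employee A', '' ],      [ 'Employee B', '' ],      [ 'Employee C', '' ] ],
);

# Return a hash ref mapping first column => second column for one table
sub table_to_hash {
    my ($tables, $index) = @_;
    my %value;
    for my $row ( @{ $tables->[$index] } ) {
        my ($name, $val) = @$row;
        $value{$name} = $val;
    }
    return \%value;
}

# 1. Pick a particular table by its index (0 is the first table found)
my $first = table_to_hash(\@table, 0);

# 2. The values are now ordinary variables
my $b_value = $first->{'Employee B'};
print "Employee B: $b_value\n";

# A single cell can also be read directly: table 0, row 2, column 1
my $cell = $table[0][2][1];
print "Cell: $cell\n";
```

If the tables cannot be told apart by position, another option would be to record something distinctive (such as the preceding heading text) in the start or text handler and select on that instead of a fixed index.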

Re^4: HTML::Parser / Regex
by jobormo (Initiate) on Sep 10, 2019 at 07:57 UTC
    really really good code, thanks very much, easy to understand and adapt :) Thanks!!!
