PerlMonks  

Re^2: HTML::Parser / Regex

by MissPerl (Sexton)
on May 27, 2017 at 06:51 UTC ( [id://1191349] )


in reply to Re: HTML::Parser / Regex
in thread HTML::Parser / Regex

Hi Mr. Muskrat,

Thank you for your reply. I did try to use HTML::Parser, but it ended up pretty ugly, so I did not include that part of the code.

Do you have any recommended links for HTML::Parser?

Apologies for not mentioning which Perl version I am using in the first place. I am using v5.8.8.

I've also tried the solution you provided, but it seems that the version I am using doesn't support Regexp::Common.

Also thanks for pointing out those mistakes I made! And I totally forgot to turn on strict and warnings!

Replies are listed 'Best First'.
Re^3: HTML::Parser / Regex
by AnomalousMonk (Archbishop) on May 27, 2017 at 07:30 UTC
    ... the version that I am using doesn't support Regexp::Common.

    Why do you say that? What errors/system messages do you get? The code I posted here uses Regexp::Common and runs under Perl 5.8.9. Are you sure you have the module installed on your system?


    Give a man a fish:  <%-{-{-{-<

      Hi there! I've tried to run your code.

      Can't locate Regexp/Common.pm in @INC (@INC contains: /tools/perl/5.8.8/linux32/lib/5.8.8/...)

      Correct me if I am wrong: I've tried instmodsh to list the installed modules, but I don't see Regexp::Common in the list.

      As Perl is installed on the school's desktop, I dare not make any changes to install a new module. I did try once, and got a permission denied error.

        As Perl is installed on the school's desktop, I dare not make any changes to install a new module. I did try once, and got a permission denied error.

        See "Yes, even you can use CPAN".

        You will need an older version of Regexp::Common, since the newest one requires perl version 5.010 (v5.10.0) as minimum. Regexp-Common-2016020301 is the last release whose requirement is below 5.10.0 (it requires 5.00473).

        You don't even need to install it. Quick and dirty test setup:

        • get the tarball
        • unpack it to some directory (e.g. C:\Users\me\Desktop\perl)
        • set the environment variable PERL5LIB to include C:\Users\me\Desktop\perl\Regexp-Common-2016020301\lib
        • run your perl program which uses Regexp::Common
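
        If setting PERL5LIB is awkward on a shared machine, the lib pragma is a commonly used alternative to the environment variable. The following is only a minimal sketch: the directory under $HOME is a hypothetical unpack location, not something from this thread, so adjust it to wherever the tarball was actually unpacked. It adds that directory to @INC and reports whether Regexp::Common can then be loaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: the tarball was unpacked under $HOME/perl (hypothetical path).
# "use lib" puts the directory at the front of @INC at compile time,
# which has the same effect as listing it in PERL5LIB.
use lib ( $ENV{HOME} || '.' ) . '/perl/Regexp-Common-2016020301/lib';

# require inside eval, so a missing module is reported instead of fatal.
if ( eval { require Regexp::Common; 1 } ) {
    print "Regexp::Common loaded from $INC{'Regexp/Common.pm'}\n";
}
else {
    print "Regexp::Common not found. \@INC contains:\n";
    print "  $_\n" for @INC;
}
```

        If the module is reported as not found, the printed @INC list shows which directories perl actually searched, which is the same information as in the "Can't locate ..." message.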
        perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re^3: HTML::Parser / Regex
by poj (Abbot) on May 28, 2017 at 17:15 UTC
    ... tried to use HTML::Parser, but it ended up pretty ugly,

    What didn't you like about using HTML::Parser?

    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::Parser;

    my %inside = ();
    my $tbl = -1;
    my $col;
    my $row;
    my @table = ();

    my $p = HTML::Parser->new(
        handlers => {
            start => [ \&start, 'tagname' ],
            end   => [ \&end,   'tagname' ],
            text  => [ \&text,  'text' ],
        }
    );
    $p->parse_file(\*DATA); # or filename

    # output
    for my $t (0..$#table){
        print "\nTable $t\n";
        for my $r (0..$#{$table[$t]}){
            my $line = join "\t",$r,@{$table[$t][$r]};
            print "$line\n";
        }
    }

    sub start {
        my $tag = shift;
        $inside{$tag} = 1;
        if ($tag eq 'table'){
            ++$tbl;
            $row = -1;
        }
        elsif ($tag eq 'tr'){
            ++$row;
            $col = -1;
        }
        elsif ($tag eq 'th'){
            ++$col;
            $table[$tbl][$row][$col] = ''; # or undef
        }
    }

    sub end {
        my $tag = shift;
        $inside{$tag} = 0;
    }

    sub text {
        my $str = shift;
        if ( $inside{'th'} ){
            $table[$tbl][$row][$col] = $str;
        }
    }

    __DATA__
    </table></body><body bgcolor="black"><h1> Summary</h1><table border="1"><tr><th>Employee A</th><th>-0.82</th>
    </tr><tr><th>Employee B</th><th>-5.02</th>
    </tr><tr><th>Employee C</th><th>19</th>
    </tr></table></body><body bgcolor="black"><h1> Summary</h1><table border="1"><tr><th>Employee A</th><th></th>
    </tr><tr><th>Employee B</th><th></th>
    </tr><tr><th>Employee C</th><th></th>
    poj
      Hi poj,

      Thank you for showing this sample of using HTML::Parser!

      Now I know that HTML::Parser can actually print nice output on the console.

      I am currently studying the code to understand how each line works. I am also trying to modify the code to print the output into another HTML file of mine.

      I will come back and ask more questions if I come across something that I can't figure out.

      I actually have more than one table in the HTML file; they have almost the same tags but different content.

      1. May I know how to take just one particular table?

      2. Is it possible to use HTML::Parser to get a value and store it in a variable? What should I take note of in order to get such output?
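
      One possible way to approach both questions, sketched under assumptions: poj's code stores every cell in @table as $table[table][row][col], so picking one table is an array index and storing a value is an ordinary assignment. To stay self-contained, the sketch below hard-codes @table in the shape the sample data should produce instead of running the parser; $wanted and %value_for are hypothetical names introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: @table as poj's parser would fill it for the sample data,
# indexed as $table[table][row][col]. Hard-coded here for illustration.
my @table = (
    [ [ 'Employee A', '-0.82' ], [ 'Employee B', '-5.02' ], [ 'Employee C', '19' ] ],
    [ [ 'Employee A', '' ],      [ 'Employee B', '' ],      [ 'Employee C', '' ] ],
);

# 1. Take just one table: index 0 is the first table in the file.
my $wanted = 0;    # hypothetical choice
my @rows   = @{ $table[$wanted] };

# 2. Store the values in variables: map column 0 (name) to column 1 (value).
my %value_for = map { $_->[0] => $_->[1] } @rows;

my $score = $value_for{'Employee B'};
print "Employee B: $score\n";

for my $name ( sort keys %value_for ) {
    printf "%-12s %s\n", $name, $value_for{$name};
}
```

      If the tables cannot be told apart by position, another option would be to test a known cell, for example the first header of each table, before choosing which index to use.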

      Really, really good code. Thanks very much, it is easy to understand and adapt :) Thanks!!!
