PerlMonks  

Re: HTML::Parser / Regex

by Mr. Muskrat (Canon)
on May 26, 2017 at 21:05 UTC


in reply to HTML::Parser / Regex

You are not using strict and warnings.

You have loaded HMTL::Parser (a typo for HTML::Parser), and then you never actually use it.

You are trying to search an undefined variable $text.

You are trying to use captured values, but your pattern has no capture groups.
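To illustrate the point about capture groups, here is a minimal sketch. The input string is hypothetical and not taken from the original question; only parentheses in the pattern make anything available in $1, $2, or the list returned by the match.

```perl
use strict;
use warnings;

# Hypothetical input line, not taken from the original question.
my $text = 'Employee A: -0.82';

# No parentheses: the match succeeds and returns (1) in list context,
# but there is nothing useful to read from $1.
my @no_groups = $text =~ /Employee \w: -?\d+\.\d+/;

# Parentheses create capture groups; in list context the match returns them.
my ($name, $value) = $text =~ /Employee (\w): (-?\d+\.\d+)/;

print "without groups: @no_groups\n";
print "with groups: $name, $value\n";
```

Assigning the match to a list, as above, also avoids reading a stale $1 left over from an earlier successful match.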

This may not be a problem in your real code, but you are defining a different variable in each branch of the if/elsif chain.

The pattern you are using to match the numbers is a bit odd. Take a look at Regexp::Common.

You are trying to use regular expressions to search for slashes without changing the "/" delimiters. See Regexp Quote-Like Operators in perlop.
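As a small sketch of the delimiter point: the three matches below are equivalent, and the sample string is only an assumption modelled on the table markup discussed in this thread.

```perl
use strict;
use warnings;

# Sample markup, assumed for illustration.
my $text = '<th>Employee A</th><th>-0.82</th>';

# With the default / delimiters every literal slash must be escaped.
my $escaped = $text =~ /Employee\sA<\/th>/ ? 1 : 0;

# With a different delimiter such as ! or {} the slash needs no escaping.
my $bang   = $text =~ m!Employee\sA</th>! ? 1 : 0;
my $braces = $text =~ m{Employee\sA</th>} ? 1 : 0;

print "escaped=$escaped bang=$bang braces=$braces\n";
```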

# partial snippet
use strict;
use warnings;
use Regexp::Common;

# ...

while (my $text = <$f1>) {
    chomp $text;  # chomp inside the loop; its return value is not a reliable loop condition
    my ($one, $two, $three); # Also, these variable names are not very descriptive.
    if ($text =~ m!Employee\sA</th><th>($RE{num}{real})<!) {
        $one = $1;
    }
    # ...
}

Replies are listed 'Best First'.
Re^2: HTML::Parser / Regex
by MissPerl (Sexton) on May 27, 2017 at 06:51 UTC
    Hi Mr. Muskrat,

    Thank you for your reply. I did try to use HTML::Parser, but it ended up pretty ugly, so I did not include that part of the code.

    Do you have any recommended links for HTML::Parser?

    Apologies for not mentioning which Perl version I am using in the first place. I am using v5.8.8.

    I've also tried the solution you provided, but it seems that the version I am using doesn't support Regexp::Common.

    Also thanks for pointing out those mistakes I made! And I totally forgot to turn on strict and warnings!
      ... the version that I am using doesn't support Regexp::Common.

      Why do you say that? What errors/system messages do you get? The code I posted here uses Regexp::Common and runs under Perl 5.8.9. Are you sure you have the module installed on your system?


      Give a man a fish:  <%-{-{-{-<

        Hi there! I tried to run your code and got:

        Can't locate Regexp/Common.pm in @INC (@INC contains: /tools/perl/5.8.8/linux32/lib/5.8.8/...)

        Correct me if I am wrong, but I used instmodsh to list the installed modules and I don't see Regexp::Common in the list.

        Since Perl is installed on the school's desktop, I don't dare make changes to install a new module. I did try once and got a permission denied error.
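If installing Regexp::Common is not an option, a hand-written pattern can stand in for $RE{num}{real} for simple values like the ones in this thread. The following is only a sketch: $real_number is a name made up here, and it assumes plain signed decimals with no exponents or thousands separators. The sample lines are modelled on the markup discussed above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $RE{num}{real}: optional sign, digits,
# optional fractional part. It does not handle exponents such as 1e5.
my $real_number = qr/[-+]?(?:\d+(?:\.\d*)?|\.\d+)/;

# Sample lines, assumed for illustration.
my @lines = (
    '<tr><th>Employee A</th><th>-0.82</th>',
    '<tr><th>Employee C</th><th>19</th>',
    '<tr><th>Employee A</th><th></th>',
);

for my $text (@lines) {
    if ($text =~ m!Employee\s(\w)</th><th>($real_number)<!) {
        print "Employee $1: $2\n";
    }
    else {
        print "no number found in: $text\n";
    }
}
```

Another route, if it is allowed on that machine, is to install the module under your own home directory and point Perl at it with `use lib`, which needs no write access to the system Perl.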

      ... tried to use HTML::Parser, but it ended up pretty ugly,

      What didn't you like about using HTML::Parser?

      #!/usr/bin/perl
      use warnings;
      use strict;
      use HTML::Parser;

      my %inside = ();
      my $tbl = -1;
      my $col;
      my $row;
      my @table = ();

      my $p = HTML::Parser->new(
        handlers => {
          start => [ \&start,'tagname' ],
          end   => [ \&end,  'tagname' ],
          text  => [ \&text, 'text' ],
        }
      );
      $p->parse_file(\*DATA); # or filename

      # output
      for my $t (0..$#table){
        print "\nTable $t\n";
        for my $r (0..$#{$table[$t]}){
          my $line = join "\t",$r,@{$table[$t][$r]};
          print "$line\n";
        }
      }

      sub start {
        my $tag = shift;
        $inside{$tag} = 1;
        if ($tag eq 'table'){
          ++$tbl;
          $row = -1;
        } elsif ($tag eq 'tr'){
          ++$row;
          $col = -1;
        } elsif ($tag eq 'th'){
          ++$col;
          $table[$tbl][$row][$col] = ''; # or undef
        }
      }

      sub end {
        my $tag = shift;
        $inside{$tag} = 0;
      }

      sub text {
        my $str = shift;
        if ( $inside{'th'} ){
          $table[$tbl][$row][$col] = $str;
        }
      }

      __DATA__
      </table></body><body bgcolor="black"><h1> Summary</h1><table border="1"><tr><th>Employee A</th><th>-0.82</th>
      </tr><tr><th>Employee B</th><th>-5.02</th>
      </tr><tr><th>Employee C</th><th>19</th>
      </tr></table></body><body bgcolor="black"><h1> Summary</h1><table border="1"><tr><th>Employee A</th><th></th>
      </tr><tr><th>Employee B</th><th></th>
      </tr><tr><th>Employee C</th><th></th>
      poj
        Hi poj,

        thank you for showing this sample of using HTML::Parser!

        Now I know that HTML::Parser can actually produce nice output on the console.

        I am currently studying the code to understand how each line works, and trying to modify it so that the output is printed into another HTML file of mine.

        I will come back and ask more questions if I come across something I can't figure out.

        I actually have more than one table in the HTML file. They have almost the same tags but different content.

        1. How can I take just one particular table?

        2. Is it possible to use HTML::Parser to get a value and store it in a variable? What should I take note of to get such output?

        Really good code, easy to understand and adapt :) Thanks very much!
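On the two questions asked above (taking one particular table, and storing values in variables), one possible approach is to work on the @table structure after parsing has finished. This is only a sketch: it assumes @table has the $table[$t][$r][$c] shape built by the HTML::Parser handlers earlier in the thread, and hard-codes that data so it runs on its own. %value_for and $wanted are names made up for illustration.

```perl
use strict;
use warnings;

# Assumption: @table has the shape built by the HTML::Parser handlers above,
# i.e. $table[$t][$r][$c]. Hard-coded here so the sketch runs on its own.
my @table = (
    [ [ 'Employee A', '-0.82' ], [ 'Employee B', '-5.02' ], [ 'Employee C', '19' ] ],
    [ [ 'Employee A', '' ],      [ 'Employee B', '' ],      [ 'Employee C', '' ] ],
);

# 1. Pick one particular table by its index (0 is the first table parsed).
my $wanted = $table[0];

# 2. Store each row's value under its label so it can be used later.
my %value_for;
for my $row (@$wanted) {
    my ($label, $value) = @$row;
    $value_for{$label} = $value;
}

print "Employee B: $value_for{'Employee B'}\n";
```

A hash keyed by the row label avoids needing one separately named variable per employee. If the tables can only be told apart by content rather than position, the same loop could instead test a cell of each table before choosing one.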
