PerlMonks  

Regex not working

by imrags (Monk)
on Jul 17, 2009 at 09:24 UTC

imrags has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to strip the HTML tags from a page and extract the data from an HTML table.
I've reached the stage where I need the table that sits between these two pieces of markup:
<div class='roundedBoxBody'><p> #table that I need <p>&nbsp;</p></p>
So I wrote this regex to get the content:
$x = "<div class=\'roundedBoxBody\'><p>";
$y = '<p>&nbsp\;</p></p>';
$content =~ /$x(.*)$y/ and $content_out = $1;
print $content_out;
However, it doesn't work, and it doesn't give any error either.
Is there a way to strip the surrounding tags and keep just the table content?
What is the mistake in the code above?
Raghu

Replies are listed 'Best First'.
Re: Regex not working
by davorg (Chancellor) on Jul 17, 2009 at 09:31 UTC

    Your regex is failing to take account of the newline characters. By default, . does not match a newline, so (.*) cannot span more than one line.

    However, parsing HTML with a regex is a bad idea. You should look at using a real HTML parser.

    --

    See the Copyright notice on my home node.

    Perl training courses

Re: Regex not working
by prasadbabu (Prior) on Jul 17, 2009 at 09:40 UTC

    Hi Raghu,

    As davorg suggested, it is always better to use an HTML parser for this kind of thing. You have missed the /s modifier. Also use qr to quote regular expressions instead of manually backslashing everything.

    use strict;
    use warnings;
    my $content = "<div class='roundedBoxBody'><p> <table>sample table</table> <p>&nbsp;</p></p>";
    my $x = qr{<div class='roundedBoxBody'><p>};
    my $y = qr{<p>&nbsp;</p></p>};
    # A list-context match avoids the unreliable "my $var = ... if ..." construct
    my ($content_out) = $content =~ m|$x(.*)$y|s;
    print $content_out if defined $content_out;

    Prasad

      Also use qr to quote regular expressions instead of manually backslashing everything.

      That's actually not necessary here. The OP was just escaping things that didn't need escaping.

      $x = "<div class='roundedBoxBody'><p>"; $y = '<p>&nbsp;</p></p>';

      Works just as well.
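
To illustrate the point about escaping, here is a minimal sketch (the sample $content is made up for the demonstration). It shows the unescaped strings working as-is, and also \Q...\E, which is a standard way to escape interpolated text when a marker might contain real regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the OP's real page.
my $content = "<div class='roundedBoxBody'><p><table>sample table</table><p>&nbsp;</p></p>";

# None of the characters < > ' = & ; is a regex metacharacter,
# so these plain strings interpolate into the pattern safely.
my $x = "<div class='roundedBoxBody'><p>";
my $y = '<p>&nbsp;</p></p>';
my ($plain) = $content =~ /$x(.*)$y/s;

# \Q...\E escapes whatever the variables hold. It only matters when a
# marker could contain metacharacters such as . * ? ( ) [ ].
my ($quoted) = $content =~ /\Q$x\E(.*)\Q$y\E/s;

print "plain:  $plain\n";
print "quoted: $quoted\n";
```

Both matches capture the same text here, which is why the extra backslashes in the original post made no difference.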


Re: Regex not working
by Anonymous Monk on Jul 17, 2009 at 09:36 UTC
    Try this:
    $content =~ /$x(.*)$y/s;
    And follow the above recommendation.
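
A small sketch of why the /s modifier matters, assuming (as the replies suggest) that the two markers sit on different lines. The sample $content is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample: the markers are separated by newlines, as on a real page.
my $content = "<div class='roundedBoxBody'><p>\n<table>sample table</table>\n<p>&nbsp;</p></p>";

my $x = "<div class='roundedBoxBody'><p>";
my $y = '<p>&nbsp;</p></p>';

# Without /s, "." does not match "\n", so (.*) cannot reach the end marker.
my ($without_s) = $content =~ /$x(.*)$y/;

# With /s, "." matches any character, including "\n".
my ($with_s) = $content =~ /$x(.*)$y/s;

print defined $without_s ? "without /s: matched\n" : "without /s: no match\n";
print defined $with_s    ? "with /s: matched\n"    : "with /s: no match\n";
```

A failed match is not an error in Perl, which is why the original code printed nothing and raised no complaint.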
Re: Regex not working
by imrags (Monk) on Jul 17, 2009 at 10:10 UTC
    Thank you, everyone. The missing /s was the problem; without it the regex could not match.
    Also, I am planning to use HTML::TreeBuilder to get the table:
    <table border='1' width='50%' align='center'>
    <tr><td><strong>Customer</td><td><strong>Total Samples</td><td><strong>SL Violations</td><td><strong>Avg Availability</td></tr>
    <tr><td> All Customers</td><td>187556</td><td>2167</td><td>98.84</td> </td></tr>
    </table>
    <br><p><strong><h2><center>Customers Below 90% Available</center></h2></p>
    <table border='1' width='50%' align='center'>
    <tr><td><strong>Customer</td><td><strong>Total Samples</td><td><strong>SL Violations</td><td><strong> Availability</td></tr>
    <tr><td>10P</td><td>1064</td><td>130</td><td>87.78 %</td></tr>
    <tr><td>B8S</td><td>326</td><td>34</td><td>89.57 %</td></tr>
    </tr></table>
    I'm trying to get individual values from the table and then convert to pdf...
    Would HTML::TreeBuilder be a good choice to fetch data?
    Raghu
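
For reference, a minimal sketch of how HTML::TreeBuilder could pull the cell values out of a table like the one above. The $html sample is a cleaned-up, hypothetical version of the second table (with the tags properly closed), and the @rows structure is just one possible way to hold the result. It assumes HTML::TreeBuilder is installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical, cleaned-up sample modelled on the second table in the post.
my $html = <<'HTML';
<table border='1'>
<tr><td><strong>Customer</strong></td><td><strong>Total Samples</strong></td><td><strong>SL Violations</strong></td><td><strong>Availability</strong></td></tr>
<tr><td>10P</td><td>1064</td><td>130</td><td>87.78 %</td></tr>
<tr><td>B8S</td><td>326</td><td>34</td><td>89.57 %</td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# One array ref per table row, holding the text of each cell.
my @rows;
for my $tr ($tree->look_down(_tag => 'tr')) {
    push @rows, [ map { $_->as_trimmed_text } $tr->look_down(_tag => 'td') ];
}
$tree->delete;    # release the parse tree once the values are copied out

print join(' | ', @$_), "\n" for @rows;
```

Once the values are in @rows, they can be handed to whatever PDF-generation step comes next.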
      I'm trying to get individual values from the table and then convert to pdf... Would HTML::TreeBuilder be a good choice to fetch data?

      I have minimal experience with it, mainly because each time I pick it up I find the interface cumbersome and unwieldy. It's also relatively slow, although I don't consider that an important point.

      I find HTML::Parser much easier to use (although you have to invest some time in learning how to use it). If you install it via a package, do yourself a favour and track down the examples directory that is bundled with the distribution. You will probably find an example that you can adapt to the problem at hand.

      It's a complex tool that's worthwhile mastering if you have to grovel around in HTML files.
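
As a rough idea of what the HTML::Parser event interface looks like, here is a small sketch that collects table cells with start, text, and end handlers. The sample $html and the @rows / $in_cell bookkeeping are assumptions made for the illustration, not code from the distribution's examples directory.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input.
my $html = "<table><tr><td>10P</td><td>1064</td></tr><tr><td>B8S</td><td>326</td></tr></table>";

my @rows;           # one array ref per row
my $in_cell = 0;    # true while we are inside a <td>

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ($tag) = @_;
        push @rows, [] if $tag eq 'tr';          # start a new row
        if ($tag eq 'td') {
            push @rows, [] unless @rows;         # tolerate a <td> with no <tr>
            push @{ $rows[-1] }, '';             # start a new, empty cell
            $in_cell = 1;
        }
    }, 'tagname' ],
    text_h => [ sub {
        my ($text) = @_;
        $rows[-1][-1] .= $text if $in_cell;      # text may arrive in chunks
    }, 'dtext' ],
    end_h => [ sub {
        my ($tag) = @_;
        $in_cell = 0 if $tag eq 'td';
    }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

print join(' | ', @$_), "\n" for @rows;
```

The handlers see the document as a stream of events, so any state (such as "am I inside a cell?") has to be tracked by hand; that is the learning curve mentioned above.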

      • another intruder with the mooring in the heart of the Perl

        Hi Raghu,

        Since you are evaluating different Perl modules for parsing HTML files, you could take a look at HTML::TokeParser. It is an alternative interface to HTML::Parser. I have used it and found it very helpful.
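
A short sketch of the HTML::TokeParser style, assuming a simple hypothetical one-row table: it walks from one td start tag to the next and grabs the text up to the closing tag.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input.
my $html = "<table><tr><td>10P</td><td>1064</td><td>130</td><td>87.78 %</td></tr></table>";

# Passing a reference to a string parses the string rather than a file.
my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";

my @cells;
while ($p->get_tag('td')) {
    # Collect all text up to the matching </td>, with whitespace trimmed.
    push @cells, $p->get_trimmed_text('/td');
}

print join(', ', @cells), "\n";
```

This pull-style loop tends to read more naturally than event handlers when the data is a flat sequence of cells.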

        - Prantik
