PerlMonks  

Regular Expression Matching Issue

by Anonymous Monk
on Jan 02, 2008 at 18:57 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there Monks!
Can someone explain why the regular expression in my code does not match anything? I have to use a regular expression for this.
Thank you very much!!!

#!/perl/bin/perl
use CGI qw(:header);
use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:standard/;
use strict;
print header();
use warnings 'all';

my $directory = "/test";
my $new_file  = "$directory/test.txt";
my $out_put;

open(FH,'<',$new_file) || die $!;
my @test = <FH>;
foreach my $line (@test) {
    $line =~ /<!--\s+\d+nd movie -->\s*(.*?)\s*<!--\s+\/\d+nd movie -->/sgi;
    $out_put = $1;
    print $out_put;
}
close(FH);

Here is the test.txt file. I am trying to get whatever is between the tags <!-- 1nd --> and <!-- /1nd table -->, and between <!-- 2nd table --> and <!-- /2nd table -->:


test.txt
<!-- 3nd table --><table cellpadding="0" cellspacing="0" border="0"><tr><td>Test Line no end of line.</td</tr></table><!-- /3nd table -->
<!-- 1nd -->
<table cellpadding="2" cellspacing="0" border="0"><tr><td><font color=red>Test Line</font></td></tr></table>
<!-- /1nd table -->
<!-- 4nd table --><table cellpadding="0" cellspacing="0" border="1"><tr><td><font color=navy>Test Line no end of line.</font></td></tr></table><!-- /4nd table -->
<!-- 2nd table -->
<table cellpadding="0" cellspacing="0" border="0"><tr><td>Test Line no end of line.</td</tr></table>
<!-- /2nd table -->


Thank you very much!!!

Re: Regular Expression Matching Issue
by Corion (Pope) on Jan 02, 2008 at 19:12 UTC

    Just last year, we had a very similar problem here, at "Pattern Search on HTML source.", which got some good answers. Maybe you can use the answers given there.

      I went there, but I am trying to use regular expressions for this and not any modules like HTML::Parser or XML::XPath.
Re: Regular Expression Matching Issue
by toolic (Bishop) on Jan 02, 2008 at 19:12 UTC
    One reason the regular expression does not match is that your test.txt file does not contain the string "movie". There may be other reasons, but I would start with that.

    Also, you should first check if the regex matched before trying to use $1.
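A minimal sketch of that advice, using hypothetical sample lines in place of test.txt: the match is tested in a condition, and $1 is read only when the match succeeded. After a failed match, $1 is not cleared for you; it is either undef or still holds a capture from an earlier successful match.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the contents of test.txt.
my @lines = (
    "<!-- 2nd table --><table><tr><td>two</td></tr></table><!-- /2nd table -->",
    "a line with no markers at all",
);

my @found;
for my $line (@lines) {
    # Skip lines where the match fails, so $1 is never read after a failed match.
    next unless $line =~ /<!--\s+\d+nd table -->\s*(.*?)\s*<!--\s+\/\d+nd table -->/si;
    push @found, $1;
}

print "$_\n" for @found;
```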

    There are plenty of good HTML parsers available; why do you want to re-invent that wheel?

      That was a typo, it should be "table". It still doesn't work!!!
Re: Regular Expression Matching Issue
by olus (Curate) on Jan 02, 2008 at 21:38 UTC
    I didn't check whether the regular expression does what you want it to do, but I see that you are using it as if it were applied to the _whole_ file, while you are actually applying it to one line at a time. The 's' modifier doesn't help here, because each string you match against is a single line.
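A small sketch of the difference, assuming a hypothetical block split over three lines the way readline returns them: matched line by line, no single string contains both markers, so nothing matches; matched against the joined text, the /s modifier lets .*? run across the newlines.

```perl
use strict;
use warnings;

# Hypothetical input: one block split over three lines, as <FH> would return it.
my @test = (
    "<!-- 2nd table -->\n",
    "<table><tr><td>two</td></tr></table>\n",
    "<!-- /2nd table -->\n",
);

my $re = qr/<!--\s+\d+nd table -->\s*(.*?)\s*<!--\s+\/\d+nd table -->/si;

# Matching each line separately: no single line holds both markers.
my $per_line_hits = grep { /$re/ } @test;

# Matching the joined text: /s lets .*? cross the newlines between the markers.
my $whole = join '', @test;
my ($content) = $whole =~ $re;

print "per-line matches: $per_line_hits\n";
print "joined match: $content\n";
```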

    Either 'join' @test and work on the result, or change the foreach loop so that it looks for the line where a block begins and captures everything until it finds a line that matches the block ending.
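A sketch of the second approach (a line-by-line loop with a flag), assuming hypothetical sample lines and the "table" spelling of the markers. It handles blocks that start and end on the same line, but it picks up at most one block per start line and ignores anything after an end marker on that line, so it is not a general replacement for matching against the whole text.

```perl
use strict;
use warnings;

# Hypothetical sample input; in the original script these lines come from test.txt.
my @lines = (
    "<!-- 1nd table -->\n",
    "<table><tr><td>one</td></tr></table>\n",
    "<!-- /1nd table -->\n",
    "<!-- 2nd table --><table><tr><td>two</td></tr></table><!-- /2nd table -->\n",
);

my @blocks;        # collected block contents
my $inside = 0;    # true while between a start marker and its end marker
my $buffer = '';   # text gathered since the start marker

for my $line (@lines) {
    if (!$inside) {
        # Look for a start marker; keep whatever follows it on this line.
        next unless $line =~ /<!--\s+\d+nd table -->(.*)/s;
        $inside = 1;
        $buffer = $1;
    }
    else {
        $buffer .= $line;
    }
    # Once the end marker is in the buffer, the block is complete.
    if ($buffer =~ /^(.*?)<!--\s+\/\d+nd table -->/s) {
        (my $content = $1) =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
        push @blocks, $content;
        $inside = 0;
        $buffer = '';
    }
}

print "$_\n" for @blocks;
```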

      I agree with olus.

      You can 'join' @test or, when reading the file, read the whole file into a single scalar variable by undefining $/ (the input record separator), as follows:

      open(FH,'<',$new_file) || die $!;
      # Undefine $/ locally so that <FH> returns the whole file as one string.
      my $test = do { local $/; <FH> };
      close(FH);
      while ($test =~ /<!--\s+\d+nd table -->\s*(.*?)\s*<!--\s+\/\d+nd table -->/sgi) {
          $out_put = $1;
          print "$out_put\n";
      }

      I think this code matches what you expected.

      Punitha

        Thanks, it works like that!!!
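One possible refinement of the slurp-and-loop pattern, sketched with hypothetical sample text: capturing the number with (\d+) and requiring the same number in the closing marker with the backreference \1 keeps a block from being closed by a different block's marker, and the captured number tells you which block each match came from. Note also that in the posted test.txt the first block opens with <!-- 1nd --> (without "table"), so neither pattern finds that block unless the marker or the pattern is adjusted.

```perl
use strict;
use warnings;

# Hypothetical slurped text; the markers follow the style of test.txt.
my $text = join "\n",
    '<!-- 1nd table -->',
    '<table><tr><td>one</td></tr></table>',
    '<!-- /1nd table -->',
    '<!-- 2nd table --><table><tr><td>two</td></tr></table><!-- /2nd table -->',
    '';

my %blocks;
# (\d+) captures the opening number; \1 requires the same number in the closing marker.
while ($text =~ /<!--\s+(\d+)nd table -->\s*(.*?)\s*<!--\s+\/\1nd table -->/sgi) {
    $blocks{$1} = $2;
}

print "$_: $blocks{$_}\n" for sort { $a <=> $b } keys %blocks;
```

With the number available, picking out only the blocks you want is a hash lookup, for example $blocks{2}.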

Node Type: perlquestion [id://660033]
Approved by pKai