PerlMonks  

Wrong regex?

by imrags (Monk)
on Feb 10, 2010 at 06:18 UTC ( [id://822352]=perlquestion )

imrags has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I have an HTML page and I want to match this pattern:
</HEAD><BODY><META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"><H2>Set Node to Monitored <BR> 10.10.10.10 : Minor </H2><FORM METHOD="POST" ENCTYPE="application/x-www-form-urlencoded">
I want to extract the IP address and the status (10.10.10.10 and Minor).
I wrote the following code:
if ($html =~ /Set Node to Monitored \<BR\>\s+(\w+)\s\:\s(\w+)\W/i) { print "$1 and $2 found" }
The IP address and the status (Minor) keep changing.
The code I wrote doesn't seem to work. Any help would be appreciated!
Raghu

Replies are listed 'Best First'.
Re: Wrong regex?
by ikegami (Patriarch) on Feb 10, 2010 at 06:31 UTC
    \w doesn't match punctuation other than underscores. Specifically, \w+ doesn't match 10.10.10.10

      \w does match the _ (underscore) punctuation character. But it doesn't match any of the others.

        I don't think of it that way, but yeah, I suppose it is a punctuation mark. Fixed.
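To illustrate the point made in this reply, here is a minimal sketch. It assumes the sample line from the question is held in a hypothetical $html variable, and it compares the question's pattern with one that uses a character class containing the dot. The class [\d.]+ is only one possible substitute; it does not check that the text is a well-formed IP address.

```perl
use strict;
use warnings;

# Sample text from the question ($html is a hypothetical variable name).
my $html = '<H2>Set Node to Monitored <BR> 10.10.10.10 : Minor </H2>';

# The question's pattern: \w matches letters, digits and underscore only,
# so \w+ cannot span the dots in the address.
my $original = qr/Set Node to Monitored <BR>\s+(\w+)\s:\s(\w+)\W/i;

# Assumed fix: accept digits and dots for the address.
my $fixed = qr/Set Node to Monitored <BR>\s+([\d.]+)\s+:\s+(\w+)/i;

# Returns the captured groups, or an empty list when the match fails.
sub try_match {
    my ($re, $text) = @_;
    my @captures = $text =~ $re;
    return @captures;
}

my @old = try_match($original, $html);
my @new = try_match($fixed, $html);
print "original: ", (@old ? "@old" : "no match"), "\n";
print "fixed:    ", (@new ? "@new" : "no match"), "\n";
```

With the sample line, the first pattern fails because \s cannot match the "." that follows "10", while the second captures both fields.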
Re: Wrong regex?
by biohisham (Priest) on Feb 10, 2010 at 08:32 UTC
    • Escaping "<" or ">" isn't necessary.
    • Match one or more digits with \d+ and one or more non-digits with \D+.
    • You can also match one or more digits with [0-9]+.
    • Read perlretut.
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    print "IP\t\tStatus\n";
    print "-" x 25;
    print "\n";
    while (<DATA>) {
        chomp;
        next unless $_ =~ /^<\/HEAD>.*/i;    # skip un-interesting lines.
        #my ($ip, $status) = $_ =~ m/(\d+\.\d+\.\d+\.\d+)\s+:\s+(\w+)/;
        my ($ip, $status) = $_ =~ m/([0-9\.]+)\s+:\s+(\w+)/;
        print "$ip\t $status\n";
    }
    __DATA__
    </HEAD><BODY><META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"><H2>Set Node to Monitored <BR> 10.10.10.10 : Minor </H2><FORM METHOD="POST" ENCTYPE="application/x-www-form-urlencoded">
    </HEAD><BODY><META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"><H2>Set Node to Monitored <BR> 10.10.10.9 : Major </H2><FORM METHOD="POST" ENCTYPE="application/x-www-form-urlencoded">
    </HEAD><BODY><META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"><H2>Set Node to Monitored <BR> 10.10.10.1 : Major </H2><FORM METHOD="POST" ENCTYPE="application/x-www-form-urlencoded">

    #OUTPUT:
    IP              Status
    -------------------------
    10.10.10.10      Minor
    10.10.10.9       Major
    10.10.10.1       Major


    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
      Why would you assume that an input line would be "un-interesting" if it doesn't start with </HEAD>.* ? And why did you think you need ".*" in that regex?
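One way to address the concern raised in this reply, sketched under assumptions: instead of filtering on how a line starts, match the distinctive "Set Node to Monitored <BR>" text wherever it occurs in the input. The extract_status name and the idea of holding the whole page in one string are hypothetical choices for this example, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [ip, status] pairs found in $html.
sub extract_status {
    my ($html) = @_;
    my @found;
    # /g in scalar context walks through every occurrence in the string.
    while ($html =~ /Set\s+Node\s+to\s+Monitored\s*<BR>\s*([\d.]+)\s*:\s*(\w+)/gi) {
        push @found, [$1, $2];
    }
    return @found;
}

# Made-up input in which no line begins with </HEAD>.
my $html = <<'END_HTML';
<HTML><HEAD></HEAD><BODY><H2>Set Node to Monitored <BR> 10.10.10.10 : Minor </H2>
<H2>Set Node to Monitored <BR> 10.10.10.9 : Major </H2></BODY></HTML>
END_HTML

printf "%-15s %s\n", @$_ for extract_status($html);
```

Anchoring on the heading text rather than on </HEAD> means the match no longer depends on where line breaks happen to fall in the page.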

Node Type: perlquestion [id://822352]
Approved by ikegami