PerlMonks  

regular expression.

by Anonymous Monk
on Jul 23, 2010 at 12:41 UTC ( [id://851032]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

    use strict;
    use warnings;
    open FH,"data" or die "can't open the file";
    while(<FH>)
    {
        if(~/^<dsk1/)
        {
            my $line=$_;
            # print $line;
            $line=~/.*\sid=\"(.*)\"\slo=\"(.*)\"\sto=\"(.*)\"\srb=.*$/;
            print "$2\n\n";
            print "$&\n";
        }
    }
Question: I don't want $& to hold the whole line; I need only $2. Even if I omit the extra groups, $& still has a value. How can I avoid this?

Replies are listed 'Best First'.
Re: regular expression.
by biohisham (Priest) on Jul 23, 2010 at 18:49 UTC
    First of all, $& does not necessarily hold the whole line: it holds only the string matched by the last successful pattern match. This means that if the current string does not match, the value of $& is left unchanged; it is updated as soon as a pattern matches again. (In your pattern the leading .* and the trailing .*$ make the match span the entire line, which is why $& looks like the whole line.)

    Be warned that performance is compromised if the $& variable is used anywhere in the program, and that this variable is read-only.

    Secondly, since you haven't provided an example of the lines you are reading, it is hard to replicate the behavior of

    $line=~/.*\sid=\"(.*)\"\slo=\"(.*)\"\sto=\"(.*)\"\srb=.*$/;

    And if you only need $2, why worry about $& in the first place?

    Does your code handle lines that don't match the above pattern, or does it generate an uninitialized-value warning when printing $2?

    # maybe try something like
    print "$2\n\n" if $2;
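    A minimal sketch of that kind of check, assuming lines shaped the way the pattern expects. The sample lines and the helper name extract_lo are made up for illustration. Testing the match itself, rather than the truth of $2, also copes with a captured value of "0":

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lo="..." value, or undef when the
# line does not match the pattern from the original post.
sub extract_lo {
    my ($line) = @_;
    if ( $line =~ /.*\sid="(.*)"\slo="(.*)"\sto="(.*)"\srb=.*$/ ) {
        return $2;
    }
    return;    # no match: do not trust the capture variables
}

# Invented sample lines: one that matches, one that does not.
for my $line ( '<dsk1 a id="1" lo="0" to="x" rb="y"', '<dsk2 nothing here' ) {
    my $lo = extract_lo($line);
    print defined $lo ? "lo=$lo\n" : "no match\n";
}
```

    With these two sample lines the loop should report lo=0 for the first and "no match" for the second, without any warning.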


    Excellence is an Endeavor of Persistence. A Year-Old Monk :D .
Re: regular expression.
by ww (Archbishop) on Jul 23, 2010 at 18:42 UTC

    AnonyMonk (Update: and JediWizard) have accurate answers for the question you appear to be asking, but perhaps the question you intended is about why $& has a value.

    If so, note that $& is a special regex variable. Quoting from Chapter 7 of "Mastering Regular Expressions" (page 299 in my 2nd Ed. paperback): "A successful match or substitution sets a variety of global, read-only variables that are always automatically, dynamically scoped. These values never change if a match attempt is unsuccessful, and are always set when a match is successful." (emphasis in the original; but note especially the first clause of the last sentence)
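    The "never change if a match attempt is unsuccessful" part can be seen in a tiny sketch. The strings here are invented, and $& is used only to demonstrate the behavior, not as a recommendation:

```perl
use strict;
use warnings;

my $text = 'abc 123';

$text =~ /\d+/;             # succeeds: $& now holds the matched digits
my $after_success = $&;

$text =~ /xyz/;             # fails: $& is left as it was
my $after_failure = $&;

print "after success: $after_success\n";
print "after failure: $after_failure\n";
```

    Both lines should show 123, since the failed match leaves the earlier value in place.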

    Update#2 (see AnomalousMonk's reply below): Your line 8, if(~/^<dsk1/), still performs the match against $_ (setting $& to "<dsk1" when it succeeds), but the unary ~ then bitwise-negates the result of the match, so the condition is true whether or not the line matched. When there is no match, the prior match is retained in $& (I think). </Update#2> Execution (modified code below) produces this:

    perl 851032-orig.pl
    $& at line 9: <dsk1
     **********
    1

    $& at line 14: <dsk1 line1 id="123" lo="1" to="abc" rb="This is a long rb."
     ------------------
    $& at line 9: <dsk1
     **********
    2

    $& at line 14: <dsk1 line2 id="456" lo="2" to="def" rb=Short rb"
     ------------------
    $& at line 9: <dsk1
     **********
    3

    $& at line 14: <dsk1 line3 id="789" lo="3" to="ghi" rb="Medium long rb"
     ------------------
    $& at line 9: <dsk1
     **********
    Use of uninitialized value in concatenation (.) or string at 851032-orig.pl line 13, <DATA> line 4.


    $& at line 14: <dsk1
     ------------------
    $& at line 9: <dsk1
     **********
    5

    $& at line 14: <dsk2 line5 id="555" lo="5" to="jkl" rb="should not match"
     ------------------
    $& at line 9: <dsk1
     **********
    6

    $& at line 14: <dsk1 Line6 id="987" lo="6" to="mno" rb="This should be a match"
     ------------------
    $& at line 9: <dsk1
     **********
    7

    $& at line 14: <dsk1 line7 id="FFF" lo="7" to="pqr" rb="This is a very, very, very long are-bee."
     ------------------

    using this, slightly modified code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    # 851032-orig (but using data)

    # open FH,"data" or die "can't open the file";
    while(<DATA>)
    {
        if(~/^<dsk1/) {
            print "\$& at line 9: $& \n **********\n";     # added
            my $line=$_;                                    # added
            $line=~/.*\sid=\"(.*)\"\slo=\"(.*)\"\sto=\"(.*)\"\srb=.*$/;
            print "$2\n\n";
            print "\$& at line 14: $&\n ------------------\n";
        }
    }
    __DATA__
    <dsk1 line1 id="123" lo="1" to="abc" rb="This is a long rb."
    <dsk1 line2 id="456" lo="2" to="def" rb=Short rb"
    <dsk1 line3 id="789" lo="3" to="ghi" rb="Medium long rb"
    <dsk2 line4
    <dsk2 line5 id="555" lo="5" to="jkl" rb="should not match"
    <dsk1 Line6 id="987" lo="6" to="mno" rb="This should be a match"
    <dsk1 line7 id="FFF" lo="7" to="pqr" rb="This is a very, very, very long are-bee."

    Note also that you are using conventional regex notation at line 12 but not in line 9. Update2: I'm unclear why line 9 passes a syntax check... but that just means I have more fun hunting up the answer on that. (As noted, Anomalous Monk answers below.)

    Hence, I'm posting code and output with the match syntax (rather than bitwise negation):

    #!/usr/bin/perl
    use strict;
    use warnings;
    # 851032

    #open FH,"data" or die "can't open the file";
    while(<DATA>)
    {
        my $line=$_;
        if( $line =~ /^<dsk1/ ) {
            print "\$line: $line";
            if ($line =~ /.*\sid=\"(.*)\"\slo=\"(.*)\"\sto=\"(.*)\"\srb=.*$/) {
                print "\$1: $1, \$2: $2\, \$3: $3 \n";
                print "-----------------\n";
            } else {
                print "Some of this will be uninitialized: ";
                print "\$1: $1, \$2: $2\, \$3: $3 \n";
                # print "\$&: $& \n\n---\n";
            }
        } else {
            print "\$line did NOT start with '^dsk1': $line \n =================\n\n";
        }
    }

    =head Output: (see also 851032-orig.pl)

    perl 851032.pl
    $line: <dsk1 line1 id="123" lo="1" to="abc" rb="This is a long rb."
    $1: 123, $2: 1, $3: abc
    -----------------
    $line: <dsk1 line2 id="456" lo="2" to="def" rb=Short rb"
    $1: 456, $2: 2, $3: def
    -----------------
    $line: <dsk1 line3 id="789" lo="3" to="ghi" rb="Medium long rb"
    $1: 789, $2: 3, $3: ghi
    -----------------
    $line did NOT start with '^dsk1': <dsk2 line4

     =================

    $line did NOT start with '^dsk1': <dsk2 line5 id="555" lo="5" to="jkl" rb="should not match"

     =================

    $line: <dsk1 Line6 id="987" lo="6" to="mno" rb="This should be a match"
    $1: 987, $2: 6, $3: mno
    -----------------
    $line: <dsk1 line7 id="FFF" lo="7" to="pqr" rb="This is a very, very, very long are-bee."
    $1: FFF, $2: 7, $3: pqr
    -----------------

    =cut

    __DATA__
    <dsk1 line1 id="123" lo="1" to="abc" rb="This is a long rb."
    <dsk1 line2 id="456" lo="2" to="def" rb=Short rb"
    <dsk1 line3 id="789" lo="3" to="ghi" rb="Medium long rb"
    <dsk2 line4
    <dsk2 line5 id="555" lo="5" to="jkl" rb="should not match"
    <dsk1 Line6 id="987" lo="6" to="mno" rb="This should be a match"
    <dsk1 line7 id="FFF" lo="7" to="pqr" rb="This is a very, very, very long are-bee."

    </update2a>

    A few other suggestions:

    • If your id, lo, and to values are consistent, you can be far more precise and confident of your results by replacing dot-star (aka death-star, for the trouble it can bring you) with an appropriate pattern.
    • Testing your regex results is (IMO) at least as important as testing a file open, which you've done (albeit in 2-argument form, rather than the preferred 3-argument open).
      For example:  if ($1) { do something with it; } else { report the failure; }
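    Both suggestions can be sketched together. This is only an illustration under assumptions: the values are taken to contain no embedded double quotes, and the data is read from an in-memory string (a real script would pass a file name as the third argument to open):

```perl
use strict;
use warnings;

# Sample lines borrowed from the test data in this thread.
my $data = <<'END';
<dsk1 line1 id="123" lo="1" to="abc" rb="This is a long rb."
<dsk2 line4
END

# Three-argument open with a lexical filehandle (here on a string).
open my $fh, '<', \$data or die "can't open the data: $!";

my @lo_values;
while ( my $line = <$fh> ) {
    next unless $line =~ /^<dsk1/;

    # [^"]* cannot run past a closing quote, unlike .*
    if ( $line =~ /\sid="([^"]*)"\slo="([^"]*)"\sto="([^"]*)"\srb=/ ) {
        push @lo_values, $2;
    }
    else {
        warn "unexpected format at line $.: $line";
    }
}
close $fh;

print "$_\n" for @lo_values;
```

    The character class [^"]* is one possible "appropriate pattern"; if the values are known to be, say, purely numeric, something stricter such as \d+ would be tighter still.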
      I'm unclear why line 9 passes a syntax check...

      Unary  ~ is bitwise negation. See Symbolic Unary Operators in perlop. It compiles, but is probably not logically correct. E.g.:

      >perl -wMstrict -le
      "$_ = 'x';
       print scalar(~ /x/);  print scalar(~ /Y/);
       print 'match' if ~ /x/;  print 'match' if ~ /Y/;
      "
      4294967294
      4294967295
      match
      match

      Update: Improved example code.

Re: regular expression.
by JediWizard (Deacon) on Jul 23, 2010 at 18:20 UTC

    It is only a little more complicated than the previous post implies. If the $& variable is used with any regular expression in a program, it will be given a value for every regular expression in that program. The same is also true for $` and $'.

    See perlre or perlvar
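    If the matched text really is needed, there are ways to get it without mentioning $& at all. A sketch, assuming Perl 5.10 or later for the /p modifier (the sample line is invented). As far as I recall from perlvar, the program-wide penalty became negligible from Perl 5.20 on, so this matters mostly on older perls:

```perl
use strict;
use warnings;
use 5.010;    # /p and ${^MATCH} need Perl 5.10 or later

my $line = '<dsk1 line1 id="123" lo="1" to="abc" rb="x"';

# Option 1: the /p modifier makes ${^MATCH} available for this match.
my $matched_p;
if ( $line =~ /lo="[^"]*"/p ) {
    $matched_p = ${^MATCH};
}

# Option 2: @- and @+ hold the start and end offsets of the last match.
my $matched_substr;
if ( $line =~ /lo="[^"]*"/ ) {
    $matched_substr = substr( $line, $-[0], $+[0] - $-[0] );
}

say $matched_p;
say $matched_substr;
```

    Both variables should end up holding the same text, lo="1", which is what $& would have contained.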


    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol

Re: regular expression.
by Anonymous Monk on Jul 23, 2010 at 12:51 UTC
    Here I don't want the $& have whole the value of line.

    Don't use $& and it won't have a value, it's that simple.

Re: regular expression.
by Marshall (Canon) on Jul 25, 2010 at 15:33 UTC
    I'm not sure what you are trying to accomplish. The natural thing to think about with x=123 y=456 type pairs is a hash. Below I show how to generate that hash with a global (/g) match. You could then check whether the "lo" key is defined or not.

    Stay away from this $& stuff, as it will slow all regexes down, and it's very seldom needed. If you just need to check whether lo="XYZZY" is on the line, the regex is easier than the one below. But this is a general name="value" solution.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    while (<DATA>)
    {
        next unless (/^<dsk1/);
        my %hash = m/(\w+)\s*=\s*"(.*?)"/g;
        next unless keys %hash;   #skip blank hashes (no pairs)

        print "name=value pairs:\n";
        foreach my $name (sort keys %hash)
        {
            print "   $name=>$hash{$name}\n";
        }
        print "\n";
    }

    =prints:

    name=value pairs:
       id=>123
       lo=>1
       rb=>This is a long rb.
       to=>abc

    name=value pairs:
       id=>456
       lo=>2
       rb=>Short rb
       to=>def

    name=value pairs:
       id=>789
       lo=>3
       rb=>Medium long rb
       to=>ghi

    name=value pairs:
       id=>987
       lo=>6
       rb=>This should be a match
       to=>mno

    name=value pairs:
       id=>FFF
       lo=>7
       rb=>This is a very, very, very long are-bee.
       to=>pqr

    =cut

    __DATA__
    <dsk1 line1 id="123" lo="1" to="abc" rb="This is a long rb."
    <dsk1 line2 id="456" lo="2" to="def" rb="Short rb"
    <dsk1 line3 id="789" lo="3" to="ghi" rb="Medium long rb"
    <dsk2 line4
    <dsk2 line5 id="555" lo="5" to="jkl" rb="should not match"
    <dsk1 Line6 id="987" lo="6" to="mno" rb="This should be a match"
    <dsk1 line7 id="FFF" lo="7" to="pqr" rb="This is a very, very, very long are-bee."
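    The "check whether the lo key is defined" step might look like the sketch below. The helper name lo_value is made up for illustration; the regex is the same name="value" pattern used in this reply:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value of lo="..." or undef if absent.
sub lo_value {
    my ($line) = @_;
    my %pairs = $line =~ /(\w+)\s*=\s*"(.*?)"/g;    # name => value pairs
    return $pairs{lo};
}

for my $line ( '<dsk1 line2 id="456" lo="2" to="def" rb="Short rb"', '<dsk2 line4' ) {
    my $lo = lo_value($line);
    print defined $lo ? "lo is $lo\n" : "no lo on this line\n";
}
```

    Using defined rather than plain truth keeps a value such as lo="0" from being mistaken for a missing key.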
