Re: Matching against $_ behaves differently than matching against a named scalar?
by choroba (Cardinal) on Apr 20, 2020 at 16:36 UTC
You can get the same behaviour for a named variable if you declare it outside the loop:
my $line;
while ($line = <>) {
...
With the declaration inside the condition, it's in fact a different variable every time, so Perl needs to create an extra scope for it, as B::Deparse shows you:
$ perl -MO=Deparse -e 'while (<>) { /(.)/ }'
while (defined($_ = readline ARGV)) {
    /(.)/;
}
-e syntax OK
$ perl -MO=Deparse -e 'while (my $line = <>) { $line =~ /(.)/ }'
while (defined(my $line = readline ARGV)) {
    do {
        $line =~ /(.)/
    };
}
-e syntax OK
Also note that lines containing 0 as their first or second word are not printed.
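To make the point about 0 concrete, here is a minimal sketch. The helper names passes_truth_test and passes_match_test are made up for illustration and do not come from the original post. It compares an "if $1 && $2" style check with a check on the match result itself, using the same pattern that appears elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics a "print if $1 && $2" style check.
sub passes_truth_test {
    my ($line) = @_;
    return 0 unless $line =~ /^([^ ]+) ([^ ]+)/;
    return ($1 && $2) ? 1 : 0;    # the string "0" is false in Perl
}

# Hypothetical helper: only asks whether the pattern matched.
sub passes_match_test {
    my ($line) = @_;
    return $line =~ /^([^ ]+) ([^ ]+)/ ? 1 : 0;
}

for my $line ('hello one', '0 apples', 'apples 0') {
    printf "%-10s truth=%d match=%d\n", $line,
        passes_truth_test($line), passes_match_test($line);
}
```

Only 'hello one' passes the truthiness check, while all three lines pass the match check.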
Re: Matching against $_ behaves differently than matching against a named scalar?
by stevieb (Canon) on Apr 20, 2020 at 16:44 UTC
choroba answered the why, I'll give a different way to do things.
Instead of checking the condition of the variables after the fact, do it beforehand:
use warnings;
use strict;
open my $fh, '<', 'text.txt' or die $!;
while (<$fh>) {
    if (/^([^ ]+) ([^ ]+)/) {
        # Skip this if $1 and $2 weren't populated
        print "$1 $2\n";
    }
}
Also note the die() statement if the file can't be opened, and the use of 3-arg open().
One last thing... in your former example, you're missing the + in the regex.
Re: Matching against $_ behaves differently than matching against a named scalar?
by jcb (Parson) on Apr 21, 2020 at 02:18 UTC
Other monks have come close but have not said this specifically; choroba illustrated but did not explain why using a lexical instead of $_ produces a different result.
In Perl, the regex capture variables ($1, $2, etc.) are implicitly local to every containing block, but retain their values within a block until the next successful match replaces them. Introducing a lexical in the loop header implicitly introduces another block scope, which means that the regex capture variables are implicitly reset on every loop iteration (strictly, each loop iteration has its own set of regex capture variables).
Your second example is also subtly different because you forgot to test defined(my $line = <$fh>), so a line that evaluates to a false value will cause that loop to terminate early.
The regex match itself returns a boolean value indicating success in Perl, and standard practice is to test that return value to determine if the regex matched, rather than relying on the truth of the capture variables.
Here's a slightly different example to illustrate:
open(my $fh, '<', 'text.txt') or die "Cannot open text.txt: $!";
while (<$fh>) {
    # Print only when the match itself succeeded
    print "$1 $2" . "\n" if /^([^ ]+) ([^ ]+)/;
}
The exact rules for the regex capture variables are prickly, with lots of sharp edges, so good practice is to consider the regex capture variables only valid after a successful match until the next match is attempted and to have unspecified values at all other times.
Edited by jcb: As davido pointed out, the defined test is implicit when an I/O operator is used in a loop test.
I want to clarify something based on documentation from perlop:
while (my $line = <STDIN>) { print $line }
In these loop constructs, the assigned value (whether assignment is
automatic or explicit) is then tested to see whether it is defined. The
defined test avoids problems where the line has a string value that
would be treated as false by Perl; for example a "" or a "0" with no
trailing newline.
So in this case the defined test doesn't need to be done explicitly, it's already being done implicitly.
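A small sketch of the perlop behaviour quoted above, assuming an in-memory filehandle in place of a real file. The helper name count_lines is hypothetical. The final "line" is a bare 0 with no trailing newline, which is false as a string but still defined.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines read from a string via an
# in-memory filehandle.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while (my $line = <$fh>) {    # perlop: tested for definedness, not truth
        $count++;
    }
    close $fh;
    return $count;
}

print count_lines("first\n0"), "\n";    # prints 2
```

If the loop tested plain truth instead, it would stop before counting the final "0".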
Re: Matching against $_ behaves differently than matching against a named scalar?
by rjt (Curate) on Apr 20, 2020 at 20:28 UTC
You have a couple of suggestions already, plus the explanation for why $1 and $2 survive successive loop iterations. Here is how I would modify the code:
use 5.010;
use autodie;
open my $fh, '<', 'text.txt';
while (<$fh>) {
    my @words = split /\s/;
    say "@words[0..1]" if @words >= 2;
}
Note the use of autodie to avoid having to do explicit error checking on open or reads. Also note the use of three-argument open, which is an important security best-practice. Not necessary in your example, since your filename is a literal, but it's a good habit to get into.
It looks like you're really just splitting words on whitespace, so split seemed more natural and expressive to me. Maybe this was a contrived example to show the regex behaviour, and your real code really needs the regex, but for what's in front of me, split would be my choice.
Finally, I like say, but you can of course use print, and drop the 5.010 requirement if you are going for maximum backward compatibility.
Edit: My actual final word is, thanks for teaching your kid some Perl!
use strict; use warnings; omitted for brevity.
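One detail worth knowing when choosing split for this job: split /\s/ splits on each single whitespace character, while split ' ' (a literal single space) is the special awk-style mode that skips leading whitespace and splits on runs of whitespace. The sketch below is a variation for comparison, not rjt's code, and the helper names words_single and words_awk are made up.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two split styles.
sub words_single { my ($line) = @_; return split /\s/, $line; }
sub words_awk    { my ($line) = @_; return split ' ',  $line; }

my $line = "hello  one\n";    # note the two spaces between the words

my @single = words_single($line);    # ('hello', '', 'one')
my @awk    = words_awk($line);       # ('hello', 'one')

print scalar(@single), " fields with /\\s/\n";
print scalar(@awk),    " fields with ' '\n";
```

With single-space input like the original text.txt presumably contains, both forms give the same words; they only differ once leading or repeated whitespace shows up.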
Re: Matching against $_ behaves differently than matching against a named scalar?
by BillKSmith (Monsignor) on Apr 20, 2020 at 17:15 UTC
I have duplicated your result (Strawberry Perl 5.24.1 on Windows 7). Very strange indeed! I am not sure which behaviour we should 'expect'. Is this an example of the 'nested block' referred to in the documentation of $<digits> in perlvar?
Re: Matching against $_ behaves differently than matching against a named scalar?
by Marshall (Canon) on Apr 23, 2020 at 01:28 UTC
As a general practice, I do not fiddle around with $1 and $2. I use list context to assign the captures to specifically named variables. This avoids some complications and is not "expensive" in terms of CPU.
use strict;
use warnings;
while (<DATA>)
{
    if ( (my $first,my $second) = /^([^ ]+) ([^ ]+)/ )
    {
        print "$first $second\n";
    }
}
#prints: hello one
__DATA__
hello one two three
kjsf
kjsd
kjd
Now of course the regex could be written differently. This version does nearly the same thing, but also allows extra whitespace between tokens.
use strict;
use warnings;
while (<DATA>)
{
    if ( (my $first,my $second) = /^(\S+)\s+(\S+)/ ) # ok, allow extra spaces between tokens
    {
        print "$first $second\n";
    }
}
#prints: hello one
__DATA__
hello one two three
kjsf
kjsd
kjd
if( my( $first, $last) = $line =~ /^([^ ]+) ([^ ]+)/ ){
}
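For readers wondering why the list-assignment form works as a condition: a list assignment evaluated in scalar (boolean) context yields the number of values on its right-hand side, so a failed match gives 0 and a successful two-capture match gives 2, whatever the captured strings are. A minimal sketch, with a hypothetical helper named first_two:

```perl
use strict;
use warnings;

# Hypothetical helper: returns "first second" for a matching line,
# or undef when the pattern does not match.
sub first_two {
    my ($line) = @_;
    # The condition is the count of values returned by the match,
    # not the truth of the captured strings.
    if ( my ($first, $second) = $line =~ /^(\S+)\s+(\S+)/ ) {
        return "$first $second";
    }
    return;
}

for my $line ('hello one two three', 'kjsf', '0 0') {
    my $result = first_two($line);
    print defined $result ? "$result\n" : "(no match)\n";
}
```

The line '0 0' is still reported, because the condition counts the captures rather than testing their values.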
Re: Matching against $_ behaves differently than matching against a named scalar?
by rsFalse (Chaplain) on Apr 23, 2020 at 21:53 UTC
A possible way to overcome it without using a lexical variable inside a loop: to make a successful match to reset $1, $2...
/^([^ ]+) ([^ ]+)/ or /(*ACCEPT)/; # or /(?=)/;
print "$1 $2" . "\n" if $1 && $2;
Also, it is possible to 'control flow', being inside regex:
/^([^ ]+) ([^ ]+)(?{ print "$1 $2" . "\n" })/;
That way of using regex (with (?{ <code> }) construct) is useful for debugging.
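A sketch of the reset trick in isolation, assuming all three matches run in the same block. The helper name capture_trace is made up for illustration. It records what $1 holds after a successful match, after a failed match, and after the always-successful /(?=)/.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value of $1 after each of three matches.
sub capture_trace {
    my @seen;

    'hello one' =~ /^([^ ]+) ([^ ]+)/;    # succeeds: $1 is 'hello'
    push @seen, defined $1 ? $1 : 'undef';

    'kjsf' =~ /^([^ ]+) ([^ ]+)/;         # fails: $1 keeps the old value
    push @seen, defined $1 ? $1 : 'undef';

    'kjsf' =~ /(?=)/;                     # succeeds and has no capture groups
    push @seen, defined $1 ? $1 : 'undef';

    return @seen;
}

print join(' ', capture_trace()), "\n";    # prints: hello hello undef
```

The failed match leaves the stale 'hello' in place; only a successful match replaces the captures, and a pattern without groups leaves $1 undefined.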
Re: Matching against $_ behaves differently than matching against a named scalar? (use re 'debug';)
by Anonymous Monk on Apr 21, 2020 at 12:14 UTC