Re^2: Strange regex to test for newlines: /.*\z/
by Ojosh!ro (Beadle) on May 21, 2007 at 13:47 UTC
|
I don't think it's a bug.
Without /s, .* will match anything BUT a newline (with /s, . matches any character, newline included). /m does not change this; it only affects where ^ and $ can match.
I assume what it is trying to match is one line.
So basically what this test asks is:
"Between the run of non-newline characters (on this line) and the end of the string, are there any other characters?" If so, it won't match. And if it doesn't match, the only character that can be in the way is a newline.
It does sound a bit like a roundabout way to get what you want though.
How about if ( $foo !~ /\n\z/ )
BTW, does setting $/ have any influence on /m or /s at all? Not that I could find by experimenting.
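For anyone who wants to repeat the experiment, here is a minimal sketch. The sample string and variable names are made up for illustration. It captures what .* grabs from the start of the string with no modifier, with /m, with /s, and with $/ localized to something unusual.

```perl
use strict;
use warnings;

# Hypothetical sample string: two lines, no trailing newline.
my $str = "foo\nbar";

# No modifier: "." does not match "\n", so .* stops before it.
my ($plain) = $str =~ /\A(.*)/;

# /m only changes where ^ and $ can match; "." still stops at "\n".
my ($multi) = $str =~ /\A(.*)/m;

# /s makes "." match "\n" as well, so .* runs to the end of the string.
my ($single) = $str =~ /\A(.*)/s;

# $/ is the input record separator used when reading lines;
# localizing it to something odd leaves the regex result unchanged.
my $with_rs;
{
    local $/ = "o";
    ($with_rs) = $str =~ /\A(.*)/;
}

printf "%-12s matched %d chars\n", @$_
    for [ 'no modifier', length $plain ],
        [ '/m',          length $multi ],
        [ '/s',          length $single ],
        [ '$/ = "o"',    length $with_rs ];
```

Only the /s line should report the full length of the string; the other three stop at the newline.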
if( exists $aeons{strange} ){ die $death unless ( $death%2 ) }
| [reply] [d/l] |
|
One problem tho, the following all match the string "\n":
/.*/
/\z/
/.{0}\z/
It's possible that \z is meant to introduce some specialness when combined with .* (or possibly some other quantifiers), but I haven't seen it mentioned in any docs. This is either a bug, or a very poorly documented feature. | [reply] [d/l] |
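It may help to look at where each of those three patterns matches rather than only whether it matches. The sketch below uses a made-up helper, match_span, which reports the start and end offsets from @- and @+. In each case the match is empty, so none of them shows "." consuming the newline.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, end) offsets of the first match,
# or an empty list if the pattern does not match.
sub match_span {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($-[0], $+[0]);
}

for my $re (qr/.*/, qr/\z/, qr/.{0}\z/) {
    my @span = match_span("\n", $re);
    print "$re: ",
        (@span ? "matches at offsets $span[0]..$span[1]" : "no match"),
        "\n";
}
```

/.*/ succeeds with an empty match at offset 0, before the newline, while the two \z patterns succeed with an empty match at offset 1, after it.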
|
.* will match anything but a newline, or the empty string.
So I'd expect
"foo\n" =~ /.*\z/;
to match, but capture the empty string in $&, not "foo\n".
Of course there are more elaborate ways to match for a newline character ;-)
| [reply] [d/l] |
|
$ perl -we'print "yes" if "" =~ /.*/'
yes
$ perl -we'print "yes" if "\n" =~ /.*/'
yes
| [reply] [d/l] |
|
According to your reasoning, the first of the following one-liners shouldn't print anything either:
$ perl -lwe 'print "match" if "foo\n" =~ /[^\n]*\z/'
match
$ perl -lwe 'print "match" if "foo\n" =~ /.*\z/'
| [reply] [d/l] |
Re^2: Strange regex to test for newlines: /.*\z/
by moritz (Cardinal) on May 21, 2007 at 13:21 UTC
|
| [reply] |
|
In r31303 of bleadperl this bug is fixed:
$ perl5.9.5 -E 'say "match" if "f\n" ~~ /.*\z/'
match
| [reply] |
Re^2: Strange regex to test for newlines: /.*\z/
by xicheng (Sexton) on May 21, 2007 at 15:45 UTC
|
No, it's not a bug. Check carefully the difference between \z and \Z, and try the following samples:
perl -e 'print "match\n" if "foo\n" =~ /.*\z/'
perl -e 'print "match\n" if "foo\n" =~ /.*\Z/'
perl -e 'print "match\n" if "foo\n\n\n" =~ /.*\Z/'
Update: the third one matches only because of the .* in the pattern. \Z cannot skip over more than one trailing newline.
Regards,
Xicheng | [reply] [d/l] |
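To take .* out of the picture, the \z versus \Z difference can be sketched with literal text in front of the anchor. The helper name and the labels below are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the pattern matches the string, else 0.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

my %result = (
    'one newline, \Z'    => matches("foo\n",     qr/foo\Z/),
    'one newline, \z'    => matches("foo\n",     qr/foo\z/),
    'three newlines, \Z' => matches("foo\n\n\n", qr/foo\Z/),
    'one newline, \n\z'  => matches("foo\n",     qr/foo\n\z/),
);

print "$_: ", ($result{$_} ? "match" : "no match"), "\n"
    for sort keys %result;
```

\Z tolerates exactly one newline at the very end of the string, \z tolerates none, and with three trailing newlines /foo\Z/ fails because "foo" is not directly before the final one.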
|
perl -e 'print "match\n" if "foo\n" =~ /.{0,}\z/'
AFAIK, .* and .{0,} should be exactly equivalent, but when combined with \z they are not, if the string ends in a newline.
There definitely appears to be a bug here, but it may be that the above snippet should not match, rather than the version with .* matching. | [reply] [d/l] [select] |
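A quick way to check a given perl for this discrepancy is to run both quantifiers over a few strings and compare the answers. This is only a sketch; quantifiers_agree is a made-up helper, and what it reports for strings ending in a newline will depend on the perl version, as this thread shows.

```perl
use strict;
use warnings;

# Hypothetical helper: true if /.*\z/ and /.{0,}\z/ give the same
# match/no-match answer for the given string.
sub quantifiers_agree {
    my ($str) = @_;
    my $star  = $str =~ /.*\z/    ? 1 : 0;
    my $range = $str =~ /.{0,}\z/ ? 1 : 0;
    return $star == $range;
}

for my $str ("foo", "foo\n", "\n", "") {
    (my $shown = $str) =~ s/\n/\\n/g;   # make newlines visible in the output
    printf "%-7s %s\n", qq{"$shown"},
        quantifiers_agree($str) ? "agree" : "DIFFER";
}
print "perl version: $]\n";
```

Printing $] alongside the results makes it easier to compare notes between different builds.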
|
Hmm, I just noticed that, thanks.
I think .* and .{0,} at the beginning of a regex pattern should be treated as optional, so that /.*A/ and /.{0,}A/ should behave the same as /A/, which means .* and .{0,} are completely unnecessary in the above patterns.
But \z does seem to behave very differently with .* and .{0,}, as you mentioned.
This looks like a Perl-specific problem; PHP (which uses a similar regex engine) handles it fine:
php -r '
$str = "foo\n";
if (preg_match("/.*\z/", $str)) {
print "match\n";
}
'
match
Probably it's a bug, and I am waiting for someone to make it clear. :-)
Regards,
Xicheng | [reply] [d/l] |
|
Indeed.
Quoting, and paraphrasing a bit, "Mastering Regular Expressions, 2nd Edition":
A match mode can change the meaning of "$" to match before any embedded newline (or Unicode line terminator as well). When supported, "\Z" usually matches what the "unmoded" "$" matches, which often means to match at the end of the string, or before a string-ending newline. To complement these, "\z" matches only at the end of the string, period, without regard to any newline.
..
//s stands for Single Line Mode which makes the dot match any character.
..
//m stands for Multi Line Mode which changes how ^ and $ are considered by the regex engine. ^ is then begin of 1 line out of the many lines in the string and not begin of string and $ is end of 1 line out of the many lines in the string and not end of string.
..
Caret "^" matches at the beginning of the text being searched, and, if in an enhanced line-anchor match mode, after any newline.
..
\A always matches only at the start of the text being searched, regardless of single or multi line match mode.
..
"\Z" matches what the "unmoded" "$" matches, which means to match at the end of the string, or before a string-ending newline. To complement these, "\z" matches only at the end of the string, period, without regard to any newline.
With thanks to Jeffrey Friedl's Regex Holy Book! ;-) | [reply] [d/l] |
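The anchor rules quoted above can be checked by listing the offsets at which each anchor matches in a string that ends in a newline. This is a rough sketch; anchor_offsets and the sample string are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets at which a zero-width anchor matches.
sub anchor_offsets {
    my ($str, $re) = @_;
    my @offsets;
    push @offsets, $-[0] while $str =~ /$re/g;
    return @offsets;
}

# Two lines, with a string-ending newline at offset 3.
my $str = "a\nb\n";

my @anchors = (
    [ '^'    => qr/^/  ],
    [ '^ /m' => qr/^/m ],
    [ '$'    => qr/$/  ],
    [ '$ /m' => qr/$/m ],
    [ '\A'   => qr/\A/ ],
    [ '\Z'   => qr/\Z/ ],
    [ '\z'   => qr/\z/ ],
);

printf "%-5s %s\n", $_->[0], join(',', anchor_offsets($str, $_->[1]))
    for @anchors;
```

\Z and the unmoded $ report the same two offsets (before the final newline and at the very end), while \z reports only the very end.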