PerlMonks  

Re^2: Strange regex to test for newlines: /.*\z/

by xicheng (Sexton)
on May 21, 2007 at 15:45 UTC [id://616593]


in reply to Re: Strange regex to test for newlines: /.*\z/
in thread Strange regex to test for newlines: /.*\z/

No, it's not a bug. Check carefully the difference between \z and \Z, and try the following samples:
perl -e 'print "match\n" if "foo\n" =~ /.*\z/'
perl -e 'print "match\n" if "foo\n" =~ /.*\Z/'
perl -e 'print "match\n" if "foo\n\n\n" =~ /.*\Z/'
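Update: the third one matches only because .* lets the match start right before the final newline (where .* matches the empty string); \Z itself tolerates just one string-ending newline, not several.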

Regards,
Xicheng
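A small sketch contrasting the two anchors (the sample strings and the ends_with_newline helper are made up for illustration, not taken from the thread). \z matches only at the very end of the string; \Z also matches just before a single string-ending newline. For actually testing whether a string ends in a newline, an explicit /\n\z/ avoids relying on how .* interacts with the end-of-string anchors.

```perl
use strict;
use warnings;

# \z matches only at the very end of the string.
# \Z matches at the very end, or just before ONE string-ending newline.
for my $s ("foo", "foo\n", "foo\n\n") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make newlines visible in the output
    printf "%-10s /foo\\z/: %-4s /foo\\Z/: %s\n",
        qq{"$shown"},
        ($s =~ /foo\z/ ? 'yes' : 'no'),
        ($s =~ /foo\Z/ ? 'yes' : 'no');
}

# Hypothetical helper: an explicit trailing-newline test that does not
# depend on how .* interacts with the end-of-string anchors.
sub ends_with_newline {
    my ($str) = @_;
    return $str =~ /\n\z/ ? 1 : 0;
}

print "ends_with_newline(\"foo\\n\"): ", ends_with_newline("foo\n"), "\n";
```

With these rules, /foo\z/ matches only "foo", while /foo\Z/ matches "foo" and "foo\n" but not "foo\n\n".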

Re^3: Strange regex to test for newlines: /.*\z/
by Mutant (Priest) on May 21, 2007 at 15:55 UTC
    Fair enough, but try:
    perl -e 'print "match\n" if "foo\n" =~ /.{0,}\z/'
    AFAIK, .* and .{0,} should be exactly equivalent, but when combined with \z they are not, if the string ends in a newline.

    There definitely appears to be a bug here, but it may be that the above snippet should not match, rather than the version with .* matching.
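One way to check this observation on whatever perl you have: the sketch below (the match_table helper and the sample strings are illustrative, not from the thread) runs /\z/, /.*\z/ and /.{0,}\z/ over the same inputs. Each of them can match the empty string at the very end of the string, so logically all three should report a match; any row where they disagree is the inconsistency being discussed. The result for "foo\n" may vary between perl versions, so no expected output is given for it.

```perl
use strict;
use warnings;

# Pattern sources to compare. Logically all three should succeed on
# any string, because each can match the empty string at the very end.
my @sources = ('\z', '.*\z', '.{0,}\z');

# Hypothetical helper: return { pattern_source => 1 or 0 } for one string.
sub match_table {
    my ($str) = @_;
    my %result;
    for my $src (@sources) {
        my $re = qr/$src/;
        $result{$src} = $str =~ $re ? 1 : 0;
    }
    return \%result;
}

for my $str ("foo", "foo\n") {
    (my $shown = $str) =~ s/\n/\\n/g;    # make the newline visible
    my $row = match_table($str);
    for my $src (@sources) {
        printf "%-7s =~ /%s/ : %s\n", qq{"$shown"}, $src,
            $row->{$src} ? 'match' : 'no match';
    }
}
```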
      Hmm, I just noticed that, thanks.

      I think .* and .{0,} at the beginning of a pattern should be treated as optional: whether /.*A/ or /.{0,}A/ matches should be the same as whether /A/ matches (only the matched text differs). That means .* and .{0,} are completely unnecessary in the above patterns.

      But, as you mentioned, \z appears to behave very differently after .* than after .{0,}.

      This looks like a Perl-specific problem; PHP (whose preg_* functions use PCRE, a similar Perl-compatible regex engine) handles it as expected:
      php -r '$str = "foo\n"; if (preg_match("/.*\z/", $str)) { print "match\n"; }'
      match
      It's probably a bug, and I'm waiting for someone to clear it up. :-)

      Regards,
      Xicheng
Re^3: Strange regex to test for newlines: /.*\z/
by ddn123456 (Pilgrim) on May 22, 2007 at 07:57 UTC
    Indeed. Quoting, and paraphrasing a bit, "Mastering Regular Expressions, 2nd Edition":
    A match mode can change the meaning of "$" to match before any embedded newline (or any Unicode line terminator as well). When supported, "\Z" usually matches what the "unmoded" "$" matches, which often means to match at the end of the string, or before a string-ending newline. To complement these, "\z" matches only at the end of the string, period, without regard to any newline.
    ..
    /s stands for Single Line Mode, which makes the dot match any character.
    ..
    /m stands for Multi Line Mode, which changes how ^ and $ are treated by the regex engine: ^ then matches at the beginning of any line in the string, not only at the beginning of the string, and $ matches at the end of any line, not only at the end of the string.
    ..
    Caret "^" matches at the beginning of the text being searched and, in an enhanced line-anchor match mode, after any newline.
    ..
    \A always matches only at the start of the text being searched, regardless of single- or multi-line match mode.
    With thanks to Jeffrey Friedl's Regex Holy Book! ;-)
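The anchor and mode rules in the quote can be checked directly in Perl. A minimal sketch, assuming a made-up two-line sample string; the comments give the values these rules predict:

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";

# Without /m, ^ matches only at the start of the string.
my @without_m = $text =~ /^(\w+)/g;          # ("first")

# With /m, ^ also matches after each embedded newline.
my @with_m = $text =~ /^(\w+)/mg;            # ("first", "second")

# \A ignores /m: it still matches only at the start of the text.
my @a_anchor = $text =~ /\A(\w+)/mg;         # ("first")

# Unmoded $ matches at the end or before a string-ending newline (like \Z)...
my $dollar = $text =~ /second$/ ? 1 : 0;     # 1
# ...but not before an embedded newline, unless /m is used.
my $dollar_embedded   = $text =~ /first$/  ? 1 : 0;   # 0
my $dollar_embedded_m = $text =~ /first$/m ? 1 : 0;   # 1

# /s lets the dot match "\n" as well.
my $dot_plain = $text =~ /first.second/  ? 1 : 0;     # 0
my $dot_s     = $text =~ /first.second/s ? 1 : 0;     # 1

print "^ without /m: @without_m\n";
print "^ with /m:    @with_m\n";
print "\\A with /m:   @a_anchor\n";
print "second\$: $dollar, first\$: $dollar_embedded, first\$ with /m: $dollar_embedded_m\n";
print "first.second: $dot_plain, with /s: $dot_s\n";
```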
