http://qs321.pair.com?node_id=616709


in reply to Re: Strange regex to test for newlines: /.*\z/

I don't quite agree with the statement about what '.*' should match.

As I understand it, '.' never matches a newline unless the /s modifier is used. That means '.+' and '.*' are just repeated matches of '.' and should still skip newlines.
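A minimal sketch (plain Perl, no modules) that checks this reading of '.' with and without /s; the variable names are only for illustration:

```perl
use strict;
use warnings;

my $text = "a\nb";

# Without /s, '.' skips the newline: only "a" and "b" are matched.
my @plain = $text =~ /./g;

# With /s, '.' also matches "\n": all three characters are matched.
my @dotall = $text =~ /./gs;

printf "without /s: %d matches\n", scalar @plain;     # 2
printf "with /s:    %d matches\n", scalar @dotall;    # 3

# '.*' is just repeated '.', so it stops at the newline unless /s is used.
my ($star_plain)  = $text =~ /^(.*)/;     # "a"
my ($star_dotall) = $text =~ /^(.*)/s;    # "a\nb"
print length($star_plain), " vs ", length($star_dotall), "\n";    # 1 vs 3
```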

Now, I understand $ and \z as follows: $ matches both at the end of the string and before a string-ending newline (to quote perldoc), while \z matches only at the very end, not before the newline.
print "foo matched\n"         if "foo\n"     =~  /^foo$/;
print "bar matched\n"         if "bar\n"     =~  /^bar$ \n/x;   # $ before end or newline
print "baz doesn't match\n"   if "baz\n"     !~  /^baz\z/;
print "foobar matched\n"      if "foobar\n"  =~  /^foobar\n\z/; # \z after newline

print "match foo\n"         if "foo\n" =~ /.*$/;     # .* skips the newline and $ matches before it
print "doesn't match bar\n" if "bar\n" !~ /.*\z/;    # .* skips the newline and \z is after it
print "match baz\n"         if "baz\n" =~ /.?\z/;    # but what the hell happens here?

for ( qr/(.?)\n\z/, qr/(.?)\z/ ) {
   "hello world\n" =~ $_;
   print "-$1-\n";
}

-d-
--
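To see *where* these two matches happen, one can print the match offsets from Perl's @- and @+ arrays. A small sketch (the @results array is just a hypothetical holder for the offsets):

```perl
use strict;
use warnings;

# Report where each pattern matched: start offset ($-[0]), end offset ($+[0])
# and the captured text, so an empty match at the very end becomes visible.
my $str = "hello world\n";    # 12 characters; offset 12 is the end of the string

my @results;
for my $re ( qr/(.?)\n\z/, qr/(.?)\z/ ) {
    if ( $str =~ $re ) {
        push @results, [ $-[0], $+[0], $1 ];
        printf "%-16s matched at %2d..%2d, \$1 = '%s'\n", $re, $-[0], $+[0], $1;
    }
    else {
        print "$re did not match\n";
    }
}
```

The first pattern matches "d\n" at offsets 10..12; the second one only succeeds with an empty match at offset 12, i.e. after the newline.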
It seems that '.?' skips the newline as expected and, in '.?\z', keeps searching past the newline, because it searches _until_ '\z'. It also seems that '.*' matches up to the newline but not between '\n' and '\z'. '.*' is greedy, '.?' is not. Maybe I misunderstand it.
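For what it's worth, '?' is itself a greedy quantifier in Perl (it takes one character if it can); the non-greedy form is '??'. A small sketch to check that, with illustrative variable names:

```perl
use strict;
use warnings;

my $s = "hello";

# '?' is greedy: it takes one character if it can.
my ($greedy) = $s =~ /^(.?)/;     # "h"

# '??' is the non-greedy (lazy) version: it prefers zero characters.
my ($lazy) = $s =~ /^(.??)/;      # ""

print "greedy: '$greedy', lazy: '$lazy'\n";

# Greediness only decides how much is taken at a given start position;
# the leftmost start position still wins, so the lazy form captures "d" here.
my ($lazy_end) = "hello world\n" =~ /(.??)\n\z/;    # "d"
print "lazy before newline: '$lazy_end'\n";
```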

Re^3: Strange regex to test for newlines: /.*\z/
by xicheng (Sexton) on May 22, 2007 at 18:17 UTC
    Without modifiers, $ and \Z work pretty much the same: both match at the end of the search string or before a string-ending newline. The difference between them shows up in multiline mode, i.e. when you add the /m modifier.

    \z means the real end of the string, even after a string-ending newline.
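A small sketch to make the three anchors comparable: it lists every offset where each anchor matches in one string, with and without /m. The helper anchor_positions is hypothetical, written just for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: collect every offset at which a zero-width pattern matches.
sub anchor_positions {
    my ( $str, $re ) = @_;
    my @pos;
    while ( $str =~ /$re/g ) {
        push @pos, pos($str);    # zero-width match, so pos() is the anchor's offset
    }
    return @pos;
}

my $str = "foo\nbar\n";    # "\n" at offsets 3 and 7, end of string at offset 8

my @cases = (
    [ '$'     => qr/$/ ],
    [ '$ /m'  => qr/$/m ],
    [ '\Z'    => qr/\Z/ ],
    [ '\Z /m' => qr/\Z/m ],
    [ '\z'    => qr/\z/ ],
    [ '\z /m' => qr/\z/m ],
);

for my $case (@cases) {
    my ( $name, $re ) = @$case;
    printf "%-6s matches at: %s\n", $name, join( ',', anchor_positions( $str, $re ) );
}
```

Here $ and \Z both give 7,8; only $ changes under /m (3,7,8); \z gives 8 either way.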

    If you use the /s modifier, the results differ even more, but that's mainly because '.' changes its behavior (it then also matches newlines), not because of the three end-of-string anchors.

    Check the following snippets (# ok marks the ones that print "match"):
    perl -e 'print "match\n" if "foo\n" =~ /.+$/'           # ok
    perl -e 'print "match\n" if "foo\n" =~ /.+\z/'
    perl -e 'print "match\n" if "foo\n" =~ /.+\Z/'          # ok
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.+\Z/'
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.+\z/'
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.+$/'
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.+$/m'      # ok
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.+\z/m'
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.+\Z/m'
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.+\Z/s'     # ok
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.+\z/s'     # ok
    perl -e 'print "match\n" if "foo\n\n\n" =~ /.+$/s'      # ok
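For convenience, the same twelve checks can be run from one script instead of separate one-liners; it should print "ok" on exactly the cases marked # ok above. A sketch (the @ok array is just an illustrative record of the results):

```perl
use strict;
use warnings;

# The same twelve checks as the one-liners, as [string, pattern] pairs.
my @cases = (
    [ "foo\n",     qr/.+$/ ],
    [ "foo\n",     qr/.+\z/ ],
    [ "foo\n",     qr/.+\Z/ ],
    [ "foo\n\n\n", qr/.+\Z/ ],
    [ "foo\n\n\n", qr/.+\z/ ],
    [ "foo\n\n\n", qr/.+$/ ],
    [ "foo\n\n\n", qr/.+$/m ],
    [ "foo\n\n\n", qr/.+\z/m ],
    [ "foo\n\n\n", qr/.+\Z/m ],
    [ "foo\n\n\n", qr/.+\Z/s ],
    [ "foo\n\n\n", qr/.+\z/s ],
    [ "foo\n\n\n", qr/.+$/s ],
);

my @ok;
for my $case (@cases) {
    my ( $str, $re ) = @$case;
    my $matched = $str =~ $re ? 1 : 0;
    push @ok, $matched;
    ( my $shown = $str ) =~ s/\n/\\n/g;    # make newlines visible in the output
    printf "%-12s =~ %-16s %s\n", qq{"$shown"}, $re, $matched ? 'ok' : '';
}
```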
    BTW, when comparing \z, \Z and $, it's probably better to avoid using the .* or .? quantifiers the way your examples do, since both can match zero characters, which makes it hard to tell what the anchor itself did.
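A short sketch of that pitfall, putting .? (which, like .*, can match nothing) in front of \Z and comparing it with .+; the variable names are only illustrative:

```perl
use strict;
use warnings;

my $str = "foo\n\n\n";    # 6 characters; the final "\n" is at offset 5

# '.+' must consume at least one non-newline character right before \Z, so this fails.
my $plus = $str =~ /.+\Z/ ? 'match' : 'no match';

# '.?' may consume nothing, so the pattern "succeeds" with an empty match
# where \Z holds (before the final newline, offset 5).
my ( $opt, $start, $len ) = ( 'no match', -1, -1 );
if ( $str =~ /.?\Z/ ) {
    ( $opt, $start, $len ) = ( 'match', $-[0], $+[0] - $-[0] );
}

print "/.+\\Z/: $plus\n";
print "/.?\\Z/: $opt at offset $start, length $len\n";
```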

    BTW, my previous statement about \Z contained an error, and I have updated that post.

    Regards,
    Xicheng