PerlMonks  

Re^3: Strange regex to test for newlines: /.*\z/

by shmem (Chancellor)
on May 21, 2007 at 14:26 UTC


in reply to Re^2: Strange regex to test for newlines: /.*\z/
in thread Strange regex to test for newlines: /.*\z/

Because under //m-style matching, the end-of-string position when matching "f\n" is set before the '\n' if the '\n' is trailing. The '\n' is skipped in the match, but the position after "f" isn't the end of the string:
perl -D512 -e '$_ = "f\n";/.*\z/'
Compiling REx `.*\z'
size 4 Got 36 bytes for offset annotations.
first at 2
rarest char at 0
   1: STAR(3)
   2:   REG_ANY(0)
   3: EOS(4)
   4: END(0)
floating ""$ at 0..2147483647 (checking floating) anchored(MBOL) implicit minlen 0
Offsets: [4]
        2[1] 1[1] 3[2] 5[0]
Omitting $` $& $' support.

EXECUTING...

Guessing start of match, REx ".*\z" against "f "...
Found floating substr ""$ at offset 1...
Position at offset 0 does not contradict /^/m...
Guessed: match at offset 0
Matching REx ".*\z" against "f "
  Setting an EVAL scope, savestack=3
   0 <> <f >            |  1:  STAR
                           REG_ANY can match 1 times out of 2147483647...
  Setting an EVAL scope, savestack=3
   1 <f> < >            |  3:    EOS
                              failed...
                            failed...
Guessing start of match, REx ".*\z" against " "...
Found floating substr ""$ at offset 0...
Position at offset 0 does not contradict /^/m...
Guessed: match at offset 0
  Setting an EVAL scope, savestack=3
   1 <f> < >            |  1:  STAR
                           REG_ANY can match 0 times out of 2147483647...
  Setting an EVAL scope, savestack=3
   1 <f> < >            |  3:    EOS
                              failed...
                            failed...
Match failed
Freeing REx: `".*\\z"'

The match isn't extended past the "\n", whereas here

perl -D512 -e '$_ = "f\n";/.*\z/s'
Compiling REx `.*\z'
size 4 Got 36 bytes for offset annotations.
first at 2
rarest char at 0
   1: STAR(3)
   2:   SANY(0)
   3: EOS(4)
   4: END(0)
floating ""$ at 0..2147483647 (checking floating) anchored(SBOL) implicit minlen 0
Offsets: [4]
        2[1] 1[1] 3[2] 5[0]
Omitting $` $& $' support.

EXECUTING...

Guessing start of match, REx ".*\z" against "f "...
Found floating substr ""$ at offset 1...
Guessed: match at offset 0
Matching REx ".*\z" against "f "
  Setting an EVAL scope, savestack=6
   0 <> <f >            |  1:  STAR
                           SANY can match 2 times out of 2147483647...
  Setting an EVAL scope, savestack=6
   2 <f > <>            |  3:    EOS
   2 <f > <>            |  4:    END
Match successful!
Freeing REx: `".*\\z"'

you can see that the '\z' (<> in the debug output) is found after the "\n":

  Setting an EVAL scope, savestack=6
   2 <f > <>            |  3:    EOS
   2 <f > <>            |  4:    END
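
To make the anchor behaviour concrete, here is a minimal sketch comparing \z, \Z and $ against "f\n", and '.' with and without /s. It is not part of the original post; the @checks table and its labels are only an illustration. It deliberately avoids the disputed /.*\z/ case and sticks to patterns whose results follow directly from perlre.

```perl
use strict;
use warnings;

my $str = "f\n";

# Each entry: [ label, 1 if the pattern matched $str, else 0 ].
# @checks and the labels are illustrative names, not from the thread.
my @checks = (
    [ 'f\z',    ($str =~ /f\z/)    ? 1 : 0 ],  # no match: "\n" still follows the "f"
    [ 'f\Z',    ($str =~ /f\Z/)    ? 1 : 0 ],  # matches: \Z allows one trailing "\n"
    [ 'f$',     ($str =~ /f$/)     ? 1 : 0 ],  # matches: $ behaves like \Z here
    [ 'f.\z',   ($str =~ /f.\z/)   ? 1 : 0 ],  # no match: '.' will not consume "\n"
    [ 'f.\z/s', ($str =~ /f.\z/s)  ? 1 : 0 ],  # matches: under /s '.' consumes "\n"
    [ 'f\n\z',  ($str =~ /f\n\z/)  ? 1 : 0 ],  # matches: the "\n" is consumed explicitly
);

printf "%-8s %s\n", $_->[0], ($_->[1] ? 'matches' : 'no match') for @checks;
```

Only \z insists on the true end of the string, which is why something has to consume the "\n" before it can succeed after the "f".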

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Replies are listed 'Best First'.
Re^4: Strange regex to test for newlines: /.*\z/
by moritz (Cardinal) on May 21, 2007 at 15:10 UTC
    Sorry, I still don't get it.

    Obviously /\z/ matches the string "f\n", so why should it fail to match if I prepend it with something that matches the empty string? This should be independent of where the end of the string is considered to be.

    And why does /.?\z/ match and /.*\z/ not?

    If we expand that scheme, why does /.?.?\z/ match, and /.*.?\z/ not?

    In all cases I'd expect .? and .* to be reduced to the empty string - why doesn't it happen?
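
For anyone who wants to try the patterns in question on their own perl, the following is a small sketch that runs each of them against "f\n" and reports where, if anywhere, it matched. The variable names (@patterns, %result) are made up for the example. No results are given for the .* cases, because they are exactly what is in dispute in this thread and may differ between perl versions.

```perl
use strict;
use warnings;

my $str = "f\n";

# The patterns asked about above, kept as strings so they can be printed.
my @patterns = ('\z', '.?\z', '.*\z', '.?.?\z', '.*.?\z');

my %result;    # pattern => human-readable outcome (illustrative helper)
for my $pat (@patterns) {
    if ($str =~ /$pat/) {
        # @- and @+ hold the start and end offsets of the last successful match
        $result{$pat} = sprintf 'matches at offset %d, length %d',
                                $-[0], $+[0] - $-[0];
    }
    else {
        $result{$pat} = 'does not match';
    }
    printf "%-10s %s\n", "/$pat/", $result{$pat};
}
```

A plain /\z/ matches the empty string at offset 2, the very end of "f\n"; the question is why a leading .* should change that.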

      Now I'm confused as well, my mental model doesn't seem to fit (at least not everywhere :-)

      Maybe demerphq could tell?

      --shmem


        avar confirms this bug still exists in blead, so I'll give it a look when I get a chance. Can't say more right now, as I injured my arm today and hacking one-handed is no fun. (Yes, I'm fine, or so the X-rays said.)

        ---
        $world=~s/war/peace/g

Re^4: Strange regex to test for newlines: /.*\z/
by tinita (Parson) on May 21, 2007 at 14:53 UTC
    I personally would be interested in why the following happens:
    "\n" =~ /\n.*\z/;    # matches
    "\n" =~ /.*\z/;      # doesn't match. i would expect it to match
    "\n" =~ /[^\n]*\z/;  # matches. like expected. but [^\n]* is like .*
    Whether or not /s is used shouldn't have anything to do with this, I think.
      "\n" =~ /\n.*\z/; # matches

      Obvious, I think. You match a "\n", then EOS (end of string).

      "\n" =~ /.*\z/; # doesn't match. i would expect it to match

      perl -D512 reports anchored(MBOL) (i.e. multiline beginning of line, see perldebguts) for that one; that anchoring doesn't happen with

      "\n" =~ /[^\n]*\z/; # matches. like expected. but [^\n]* is like .*

      but why?

      "\n" =~ /.?\z/;

      matches, as does

      "\n" =~ /.{0,}\z/;

      I can't form a mental model of why the previous one should match but the next one should not:

      "f\n" =~ /.?f\z/;

      Weird. Rather inconsistent, if not buggy.
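
As an aside, the -D512 switch only works on a perl built with -DDEBUGGING. A rough alternative, sketched here under the assumption that the re pragma is available (it ships with perl), is use re 'debug', which prints a similar compile and execution trace to STDERR. The traced_match helper is a hypothetical name for this example, and the exact trace format varies between perl versions.

```perl
use strict;
use warnings;

# Returns 1 if the pattern matched, 0 otherwise. As a side effect the
# regex engine writes its compile and execution trace to STDERR.
# traced_match is an illustrative helper, not something from the thread.
sub traced_match {
    my ($str) = @_;
    use re 'debug';    # lexically scoped: only regexes in this sub are traced
    return $str =~ /[^\n]*\z/ ? 1 : 0;
}

my $ok = traced_match("\n");
print $ok ? "matched\n" : "no match\n";
```

Swapping in /.*\z/ lets you compare the two traces and check whether the anchored(MBOL) note shows up on your perl.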

      --shmem

        "\n" =~ /\n.*\z/; # matches
        Obvious, I think.

        If this one is obvious, then the original one should be obvious too. Compare:

        "\n" =~ /\n.*\z/; # matches
        "\n" =~ /.*\z/;   # should match but doesn't

        The pattern gets shorter, but doesn't match any more...
