http://qs321.pair.com?node_id=62419


in reply to Perl is psychic?!

Excellent question!

Since everybody else seems to have missed your (subtle) point by quoting irrelevant documentation that you clearly understood in great detail, allow me to restate it. Perl is supposed to have an important optimization: if you never use $&, $`, or $' anywhere in your script, Perl is never supposed to calculate them. This is important because it makes matches against long strings an order of magnitude faster. Once you use any of them, they are calculated for every match from then on. Caveat programmer. (I never use them. I wish I could make any attempt to use them optionally fatal, just to smoke out the people who do, but I can't.)

With this optimization, there should be no way for the code in your post to work: at the time of the match, Perl is dealing with a script that has no $&, $`, or $' in it, so when it goes to display the answer, the necessary data should not exist yet. But you run it, and it does.

For the record, I ran your code under 5.004 and got the output that you describe. Under 5.005 I got no output at all, which is what you would expect. Under a slightly modified 5.6 I got a segmentation fault. (Not good, but in this case understandable.) A slight modification of your code to test $' and $` gave similar results. Looking at perldelta for 5.005, I see a number of changes to the RE engine, including the following:

Changes in Perl code using RE engine: More optimizations to s/longer/short/; study() was not working; /blah/ may be optimized to an analogue of index() if $& $` $' not seen; Unneeded copying of matched-against string removed; Only matched part of the string is copying if $` $' were not seen;
The last two items sound like the fix for this behaviour. I guess that the optimization wasn't really being done in 5.004, or that it was done less thoroughly than it was later.

For the record, I was seriously impressed with Ruby's optimization for this case. What they did is calculate $&, $', and $` lazily, as needed. You only pay on the matches where you use them, or in cases where you modify in place a string that you matched against before you go on to match again. If you don't use them in one place, you pay no price there, even if you use them elsewhere. I tried, but couldn't find a way to break it. I suspect that this approach (which is much cleaner) would be harder to do in Perl. Still, it was a nice surprise...
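
As an aside on paying only where the data is needed: in Perl the same three pieces can be recovered for a single match from the offset arrays @- and @+ (available since 5.6.0), without mentioning $`, $& or $' at all. A minimal sketch, assuming a 5.6.0 or later perl; match_parts is a hypothetical helper name used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the prematch, match and
# postmatch strings for a successful match, or an empty list on failure.
# It relies on @- and @+ (match start/end offsets) instead of $`, $& and $'.
sub match_parts {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    my ($start, $end) = ($-[0], $+[0]);        # offsets of the whole match
    return (
        substr($str, 0, $start),               # what $` would hold
        substr($str, $start, $end - $start),   # what $& would hold
        substr($str, $end),                    # what $' would hold
    );
}

my ($pre, $match, $post) = match_parts('string', qr/ri/);
print "pre=[$pre] match=[$match] post=[$post]\n";
# prints: pre=[st] match=[ri] post=[ng]
```

Only the matches that go through the helper pay for the extra substr calls; nothing here switches on the script-wide copying.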

UPDATE
This seems to be very, very specific to the code. I actually assumed I knew what should happen and wanted to check $` and $' as well, so I changed the code to

'string' =~ /ri/; print eval <STDIN>;
for my tests. As confirmed on several platforms in chatter, the behaviour switches between versions of Perl. But the original code snippet always seems to work, and I have not a clue how or why.
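
For anyone who wants to repeat this comparison across perl versions without typing at STDIN, here is a rough sketch of a non-interactive harness. It is an assumption-laden variation, not the code anyone in this thread ran: it passes the variable name on the command line instead of reading it from STDIN, runs each probe in a separate process so that one probe cannot switch on the copying for another, and uses the list form of a piped open, which is not available on every platform or perl version.

```perl
use strict;
use warnings;

# Child program: do the plain-text match first, then compile the match
# variable only afterwards, through a string eval of the first argument.
my $code = q{'string' =~ /ri/; my $v = eval shift; print defined $v ? "[$v]" : 'undef';};

my %result;
for my $var ('$`', '$&', q{$'}) {
    # $^X is the perl binary running this script; each probe gets a fresh one.
    open(my $fh, '-|', $^X, '-e', $code, $var)
        or die "cannot run $^X: $!";
    my $out = do { local $/; <$fh> };
    close $fh;    # a crash in the child is reported through $?
    $result{$var} = (defined $out && length $out)
        ? $out
        : "no output (exit status $?)";
}

print "perl $]\n";
print "$_ => $result{$_}\n" for sort keys %result;
```

Running the same script under several perl binaries gives one line per variable for each version, which makes the version-to-version switch easier to compare.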

Re: Re (tilly) 1: Perl is psychic?!
by pileswasp (Monk) on Mar 06, 2001 at 19:32 UTC
    Woo. This one's got me interested.

    I've tested this on perl 5.004_04 for sun-solaris, perls 5.004_05 and 5.6 for i686-linux (redhat) and even ActiveState's 5.6.0 for Win32 and _all_ of them show the same behaviour.
    What causes the difference between the two variations on this bit of code is whether or not the pattern is plain text (as it says above, "/blah/ may be optimized to an analogue of index()"). If there's no regex compilation, then $& causes a segmentation fault.

    Using
    use re 'debug';
    shows that the regex isn't re-evaluated when $& is entered on STDIN, but it does state explicitly "Omitting $` $& $' support". I must say I'm at a bit of a loss as to where the value does come from.

    If I were to go out on a limb a bit, I would guess that the penalty from using $& etc. in your code arises because perl then has to link them into plain text matches as well as compiled regexes. That is, $& etc. are always there for fully compiled regexes, but index() doesn't normally return the pre-match, match and post-match strings, so the "analogue of index()" requires a bit more work to produce them.

    Where's japhy? I get the feeling he'll know :o)

    There's a bunch of tests and re 'debug' output below if you're interested:
    use re 'debug'; 'foo' =~ m/.*/; print eval <STDIN>;
    This gives the following output:
    Compiling REx `.*'
    size 3 first at 2
       1: STAR(3)
       2:   REG_ANY(0)
       3: END(0)
    anchored(MBOL) implicit minlen 0
    Omitting $` $& $' support.
    
    EXECUTING...
    
    Matching REx `.*' against `foo'
      Setting an EVAL scope, savestack=3
       0 <> <foo>             |  1:  STAR
                               REG_ANY can match 3 times out of 32767...
      Setting an EVAL scope, savestack=3
       3 <foo> <>             |  3:    END
    Match successful!
    
    All of that appears before it waits for the input. It states explicitly that it is omitting $& etc. support, yet when you enter $& it still gives the expected answer:
    Freeing REx: `.*'
    foo
    
    If you use a plain text match (like tilly suggested with /ri/ against 'string'), you don't get this result at all, as perl doesn't handle the match in the same way. It "guesses" the result, presumably using a more index()-like way of making the match:
    use re 'debug'; 'foo' =~ m/o/; print eval <STDIN>;
    gives the output:
    $ perl reg
    Compiling REx `o'
    size 3 first at 1
    rarest char o at 0
       1: EXACT <o>(3)
       3: END(0) 
    anchored `o' at 0 (checking anchored isall) minlen 1
    Omitting $` $& $' support.
    
    EXECUTING...
    
    Guessing start of match, REx `o' against `foo'...
    Found anchored substr `o' at offset 1...
    Guessed: match at offset 1
    $&
    Segmentation fault (core dumped)
    
    $` and $' don't have quite such drastic effects; they simply print blank.
    The extra level of compilation that look(ahead|behind)s give the regex also allows $& to produce the required result:
    use re 'debug'; 'foo' =~ m/(?<=f)o(?=o)/; print eval <STDIN>;
    Giving:
    $ perl reg
    Compiling REx `(?<=f)o(?=o)'
    size 15 first at 1
    rarest char o at 0
       1: IFMATCH[-1](7)
       3:   EXACT <f>(5)
       5:   SUCCEED(0)
       6:   TAIL(7)
       7: EXACT <o>(9)
       9: IFMATCH[-0](15)
      11:   EXACT <o>(13)
      13:   SUCCEED(0)
      14:   TAIL(15)
      15: END(0)
    anchored `o' at 0 (checking anchored) minlen 1
    Omitting $` $& $' support.
    
    EXECUTING...
    
    Guessing start of match, REx `(?<=f)o(?=o)' against `foo'...
    Found anchored substr `o' at offset 1...
    Guessed: match at offset 1
    Matching REx `(?<=f)o(?=o)' against `oo'
      Setting an EVAL scope, savestack=3
       1 <f> <oo>             |  1:  IFMATCH[-1]
       0 <> <foo>             |  3:    EXACT <f>
       1 <f> <oo>             |  5:    SUCCEED
                                  could match...
       1 <f> <oo>             |  7:  EXACT <o>
       2 <fo> <o>             |  9:  IFMATCH[-0]
       2 <fo> <o>             | 11:    EXACT <o>
       3 <foo> <>             | 13:    SUCCEED
                                  could match...
       2 <fo> <o>             | 15:  END
    Match successful!
    $&
    Freeing REx: `(?<=f)o(?=o)'
    o
    
(boo) Re (tilly) 1: Perl is psychic?!
by boo_radley (Parson) on Mar 06, 2001 at 12:27 UTC
    I'm curious to know whether perl would attempt to re-execute the last regexp inside the eval block to get $&.
    Does that sound at all plausible? If so, would that mean that evaling $&, $' or $` would remove their associated penalties?