http://qs321.pair.com?node_id=267270

Cirollo has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use a code evaluation assertion in a regex inside of a subroutine, and I'm seeing this strange behavior where it only executes the code the first time that I call the subroutine. Here is the simplest example I could come up with:
sub regex {
    my $in = shift;
    my $ret = '';
    # Assign something to $ret in a code assertion
    $in =~ m/^(a)(?{$ret=1})/;
    return $ret;
}

while(<DATA>) {
    print regex($_) . ".";
}

__DATA__
a
b
a
abcd
bcda
The output that I get is "1...." when I would expect to get "1.1.1.1.1." As I understand it, the value 1 should get assigned to $ret every time through the regex, regardless of whether ^(a) matched. Can anyone explain this? I'm rather confused.

Replies are listed 'Best First'.
Re: Scoping issues with code evaluation assertions?
by sauoq (Abbot) on Jun 19, 2003 at 17:05 UTC

    It seems you have inadvertently created a closure.

    If you use local $ret = ''; in that sub, things should work.

    Note, however, that by "work" I don't mean that it will return "1.1.1.1.1." It will only return 1 when the ^(a) portion of the regex matches. That's because, if it doesn't match, the regex engine returns right away and never gets to your code assertion.
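The point about the engine never reaching the code assertion can be checked with a simple counter. This is only a sketch: `$count` is a package variable introduced for illustration (it is not in the original post), chosen so that the closure question does not come into play.

```perl
use strict;
use warnings;

# Package (global) variable: every run of the code block sees the same $count.
our $count = 0;

for my $s (qw(a b a abcd bcda)) {
    # The code block is only reached once ^(a) has matched.
    $s =~ m/^(a)(?{ $count++ })/;
}

print "code block ran $count time(s)\n";   # code block ran 3 time(s)
```

Only "a", "a" and "abcd" start with an "a", so the block runs three times out of five.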

    -sauoq
    "My two cents aren't worth a dime.";
    
      That was my first thought, but it didn't make much sense to me. Maybe my understanding of scoping isn't as good as I believed, but I thought you had to declare a variable outside the scope of the sub in order to create a closure.

      Could it be that the one-time compilation of the regex somehow causes the closure?

      Also, saying local $ret = ''; breaks under strict:

      Variable "$ret" is not imported at /home/apirkle/bin/retest.pl line 8.
      Global symbol "$ret" requires explicit package name at /home/apirkle/bin/retest.pl line 6.
      Global symbol "$ret" requires explicit package name at /home/apirkle/bin/retest.pl line 8.

        You'd need to have declared $ret as a global somewhere for local to localize it. Remember that local() declares nothing.

        use vars '$ret';
        sub regex {
            local $ret = '';

        What you didn't realize was that the (?{ ... }) is a closure. When it first ran, it captured the original $ret, and it kept writing to that same instance, which wasn't the one being returned in later calls.
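The capture described here can be mimicked with an ordinary closure. This is a minimal sketch by analogy, assuming the code block behaves like an anonymous sub that is built only once. `$writer` and `demo` are hypothetical names, not from the original post.

```perl
use strict;
use warnings;

my $writer;   # built once, standing in for the one-time compiled code block

sub demo {
    my $ret = '';
    # Only the first call creates the closure, so it captures the first $ret.
    $writer ||= sub { $ret = 1 };
    # Later calls still write to that first instance, not to the current $ret.
    $writer->();
    return $ret;
}

print join('.', map { demo() } 1 .. 3), ".\n";   # 1...
```

The first call returns 1. Every later call gets a fresh $ret that the closure never touches, which is the same "1...." pattern as in the question.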

        Could it be that the one-time compilation of the regex somehow causes the closure?

        That's exactly what it is. Your regex() sub isn't the closure, the code in the regular expression code assertion is.

        Also, saying local $ret = ''; breaks under strict:

        Well of course it does. :-) (I should have mentioned that though.) Use our $ret = ''; or declare $ret with use vars qw( $ret ); instead. And keep in mind that whichever way you choose to declare it, it's still a global variable.
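Putting those pieces together, a sketch of the package-variable version might look like the following. It assumes `our` for the declaration (`use vars qw( $ret );` would serve the same purpose) and replaces the `__DATA__` section with a plain list to keep the example short.

```perl
use strict;
use warnings;

# Declare $ret as a package variable; local() alone declares nothing.
our $ret;

sub regex {
    my $in = shift;
    # Give the global a temporary value for the duration of this call.
    local $ret = '';
    $in =~ m/^(a)(?{ $ret = 1 })/;
    return $ret;
}

print join('.', map { regex($_) } qw(a b a abcd bcda)), ".\n";   # 1..1.1..
```

Because the code block and the sub both refer to the same global, it no longer matters which instance of a lexical the block captured.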

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: Scoping issues with code evaluation assertions?
by pernod (Chaplain) on Jun 20, 2003 at 09:48 UTC

    As mentioned in other posts in this thread, embedded code constructs create closures. Jeffrey Friedl's 'Mastering Regular Expressions' talks about this on page 338 in 'A Warning About Embedded Code and my Variables'.

    As diotalevi points out, only the instance of the variable seen at compile time is ever used. I tried to force recompilation of the regex by using string interpolation, but this was rejected by the compiler with the message:

    Eval-group not allowed at runtime, use re 'eval' in regex m/^(a)(?{$placeholder = 1})/ at embed.pl line 12, <DATA> line 1.

    I tried to use the re 'eval' pragma too, but then the pragma complained about modifying constant items in variable assignments.
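For comparison, here is a hedged sketch of an interpolated code block that `use re 'eval'` does accept. It sidesteps the lexical problem by writing to a fully qualified package variable. `$main::hit` and `$block` are names made up for this example.

```perl
use strict;
use warnings;
use re 'eval';   # permit code blocks that arrive via interpolation

our $hit = 0;

# The code block lives in a string and is interpolated into the pattern at run time.
my $block = '(?{ $main::hit++ })';

for my $s (qw(a b a abcd bcda)) {
    $s =~ m/^(a)$block/;
}

print "$hit\n";   # 3
```

Bear in mind that `re 'eval'` lets any interpolated string run code, so it should only be used with patterns you control.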

    A solution to your problem could be to use the aliasing properties of @_ (as described in perlsub) instead of explicit return values. This is just another global variable trick, but here goes:

    #! /usr/bin/perl
    use strict;
    use warnings;

    sub regex {
        my ( $in ) = @_;
        $in =~ m/^(a)(?{$_[ 1 ] = 1})/;
    }

    while(<DATA>) {
        my $placeholder;
        &regex( $_, $placeholder );
        print $placeholder . "." if $placeholder;
    }

    __DATA__
    a
    b
    a
    abcd
    bcda

    This prints 1.1.1., so at least it does what you asked for.
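The @_ aliasing itself can be seen without any regex code block. A small sketch, with `set_flag` and `@seen` as hypothetical names:

```perl
use strict;
use warnings;

# Elements of @_ are aliases to the caller's arguments, so assigning to
# $_[1] changes the caller's variable directly.
sub set_flag {
    my ($in) = @_;
    $_[1] = 1 if $in =~ /^a/;
    return;
}

my @seen;
for my $line (qw(a b a abcd bcda)) {
    my $flag;
    set_flag($line, $flag);
    push @seen, $line if $flag;
}

print "@seen\n";   # a a abcd
```

Copying the arguments with `my (...) = @_;` breaks the alias, which is why the sub has to write to `$_[1]` itself.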

    pernod
    --
    Mischief. Mayhem. Soap.
Re: Scoping issues with code evaluation assertions?
by aquarium (Curate) on Jun 19, 2003 at 22:49 UTC
    From my understanding, variables in code sections of regexes only have scope within the regex and disappear thereafter. The Camel book mentions a way to pass the values of code regex variables out of the regex, but I can't remember how off the top of my head. I think "ret" should be declared as local inside the regex; check the Camel book. Beats the hell out of me why you would get a "1" result, though. Good luck.