PerlMonks  

why is $1 cleared at end of an inline sub?

by perl-diddler (Chaplain)
on Sep 16, 2021 at 12:26 UTC ( [id://11136826]=perlquestion )

perl-diddler has asked for the wisdom of the Perl Monks concerning the following question:

I have a quick question whose answer seems obvious, but I want to check, and maybe hope I'm wrong.

#!/usr/bin/perl
use strict;
use warnings;
use P;

my $intxt;
$intxt = << 'TXT' ;
<package type="rpm">
  <name>7kaa-music</name>
  <url>http://7kfans.com/</url>
TXT
;

sub REindex($$;$) {   #like 'index', except substr is RE
  my ($str,$ss)=(shift, shift);
  my $p = @_ ? shift:0;
  $str =~ m{^.{$p,$p}($ss+)} ? length $1 : -1;
}

my @lines=split "\n", $intxt;
my $ln;
my $lineno=0;

sub getln() {
  return $lineno<@lines ? $lines[$lineno++] : undef;
}

my $ttag;
sub getnxt_tagln();
local * getnxt_tagln;
*getnxt_tagln = sub () {
  do {
    $_=getln();
    defined $_ or return undef;
  } until m{^\s*<(/?\w+)};
  $ttag=$1;
};

my $tag;
NXTPKG:
while (getnxt_tagln()) {      # why '$1' null?
  $ln = $_;
  $tag = $1;
  Pe "_=%s, ttag=%s, tag=%s", $_, $ttag, $tag;
}
# vim: ts=2 sw=2 ai number

My question concerns the comment after the NXTPKG line: why is '$1' null (∄)?

When I run this:

_=<package type="rpm">, ttag=package, tag=∄;
_=  <name>7kaa-music</name>, ttag=name, tag=∄;
_=  <url>http://7kfans.com/</url>, ttag=url, tag=∄;

tag is null/undef when I get out of my inline sub. I can get around it by assigning $1 to $ttag, but I don't have any other regexes that should be clearing '$1'. It seems a bit weird to have the end of a local sub clear '$1', yet that seems to be what is happening. Why? What was the logic behind doing that?

tnx!

Replies are listed 'Best First'.
Re: why is $1 cleared at end of an inline sub?
by haukex (Archbishop) on Sep 16, 2021 at 12:38 UTC

    As documented in perlvar under Variables related to regular expressions, and lower down for $1:

    These variables are read-only and dynamically-scoped.

    The dynamic scoping (see also Temporary Values via local()) makes sense because it means code like this still works:

    sub do_something_else {
        my $foo = "quzbaz";
        $foo =~ /([aeiou]+)/ and print ">$1\n";  # prints ">u"
    }
    my $bar = "foobar";
    if ( $bar =~ /([aeiou]+)/ ) {
        do_something_else();
        print "$1\n";  # still prints "oo", not "u"
    }

    As a general rule, regular expression variables that you want to keep for later use should be copied into other variables ASAP, and only if the match was successful.
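A minimal sketch of that rule, assuming a hypothetical helper named tag_of (it does not appear in the thread) and the same tag pattern as in the question. The match runs in list context, so the capture is copied in the same statement as the match and only when the match succeeds:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the tag name that
# starts a line, or nothing if the line does not start with a tag.
sub tag_of {
    my ($line) = @_;
    # A match in list context returns its captures, so the copy happens
    # in the same statement as the match and only when it succeeded.
    my ($tag) = $line =~ m{^\s*<(/?\w+)}
        or return;
    return $tag;
}

my $found   = tag_of('  <name>7kaa-music</name>');   # "name"
my $missing = tag_of('no tag here');                  # undef
print defined $found ? "found $found\n" : "nothing\n";
```

Because the caller only ever sees the returned copy, it does not matter that $1 is restored when the sub exits.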

    Update: Better link instead of local

      Both of these answers ignored that the sub was an 'anonymous'/'inline' sub that would have access to surrounding local variables in the same scope, including regex vars.

      I find it surprising that '$1' is affected in this way by an anonymous sub.

      I wouldn't find it surprising that a normal sub would auto-save context to not completely disrupt callers (even though '$_' needs to be explicitly saved with local).

      To re-ask: why is an inline sub, which I thought was designed to have access to local vars (in the same context), restoring '$1'? If it were accessing or changing '$_1', it would access the copy belonging to the sub it was in. I had supposed that '$1' would stay constant until another regex, and that an inline/anon sub wouldn't treat '$1' differently from '$_1'.

      I was really more wondering what the rationale might be for treating them differently in an anon/inline sub.

      In the same way, I find it strange that '$1' and '$2' are cleared coming out of a 'do' block -- I would have thought only another regex would change them.

      my $s="abcdefg";
      $_=$s;
      my @res=do { m{abc(de)(fg)}; };
      P "nres=%s 1=%s, 2=%s", 0+@res, $1, $2;
      '
      nres=2 1=∄, 2=∄
      
        Both of these answers ignored that the sub was an 'anonymous'/'inline' sub

        I "ignored" it because it makes no difference, a sub is a sub no matter whether it has an entry in the symbol table or not. (Update: There are small differences, e.g. how a sub call is parsed depending on when the compiler sees the definition, but that's not relevant to this thread.)

        ... that would have access to surrounding local variables in the same scope, including regex vars.

        Sorry, but that's not how dynamic scoping works. It might help to forget about lexical scope entirely for a moment, and to think of it in terms of the call stack: it's like local stores the current value of the variable onto a stack, and exiting the currently executing scope (sub, do, etc.) restores the saved value. This happens during runtime, hence the "dynamic". Also note that dynamic scoping only works for package variables, not lexicals (my).
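The call-stack picture can be sketched with an ordinary package variable and an explicit local. The names $level, show and with_local are made up for illustration; the regex variables get the equivalent save and restore implicitly:

```perl
use strict;
use warnings;

our $level = "outer";          # package variable, so local() applies

sub show { return $level }     # sees whatever value is current at call time

sub with_local {
    local $level = "inner";    # saves "outer"; restored when this sub exits
    return show();             # the callee sees "inner" via the call stack
}

my $during = with_local();     # "inner"
my $after  = show();           # "outer" again
print "during=$during after=$after\n";
```

Note that show() is not lexically inside with_local(), yet it sees the temporary value. What counts is who called whom at run time, not where the code was written.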

        I had supposed that the '$1' would stay constant until another regex... I would have thought only another regex would change them.

        Yes, the implicit dynamic scoping can be a little surprising like that, but once you get the hang of dynamic scoping, it should make sense. I showed with my example above why it makes sense to do it that way for regex variables.

        "normal subs" are just named "anonymous subs", there is not much more difference.

        consider

        DB<7> *beyonce = sub { print "say my name, say my name" }
        DB<8> beyonce()
        say my name, say my name
        DB<9>

        this also works the other way round: you can take the sub-ref of a named sub and then destroy the name in the package's stash:

        DB<21> sub kelly { print "say my name, say my name" }
        DB<22> $anosub = \&kelly
        DB<23> delete $main::{kelly}
        DB<24> $anosub->()
        say my name, say my name
        DB<25> kelly()
        Undefined subroutine &main::kelly called at (eval 34)[c:/Strawberry/perl/lib/perl5db.pl:738] line 2.

        So where do you want to draw the line???

        side-note

        there are, though, block constructs in Perl which can be confused with anonymous subs.

        Maybe that's your misunderstanding, if you talk about "inlined subs" °?

        for instance, map blocks are not anonymous subs as far as return is concerned: a return inside one leaves the enclosing sub

        DB<19> sub tst { map { return $_ } 42..1e6 ; return "never executed" }
        DB<20> p tst()
        42
        DB<21>

        But those map-like constructs in List::Util are implemented with ano-subs and won't allow returning from outer subs!

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        °) what does that even mean?

        The linked documentation explains it. Dynamic scope propagates into inner blocks, but not back out of them.
        my $s = 'abcdefg';
        $s =~ m{abc(de)(fg)};
        my $output = do { sprintf "1=%s, 2=%s", $1, $2 };
        print $output; # 1=de, 2=fg
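The opposite direction can be sketched the same way. In this hedged example (the variable names and the 'xyz' string are made up for illustration) the match happens inside the do block. Its captures come out only as the block's return value, and $1 afterwards refers to the earlier match in the enclosing scope:

```perl
use strict;
use warnings;

my $s = 'abcdefg';

'xyz' =~ /(y)/;                          # outer match: $1 is "y"

# Inner match in list context: the block returns the captures,
# but its $1 and $2 are scoped to the block.
my @res = do { $s =~ m{abc(de)(fg)} };

my $one_after = $1;                      # back to the outer match
print "res=@res one_after=$one_after\n";
```

So if the captures are needed after the block, take them from what the block returns rather than from $1 and $2.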

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        maybe haukex understood your problem and you need to get acquainted with dynamic-scoping ... which is the only possible way to have limited control over global variables.

        NB: our package-vars and special vars like $1 are global. They are accessible everywhere at run-time and prone to "sabotage".

        Static aka lexical scoping is a totally different beast for my vars at compile-time.

        Try to debug a global variable which suddenly changes after you called a sub from a foreign module you just upgraded.

        And special vars are not protected by namespaces, they are all in main:: !

        That's why the regex variables are automatically localized in subs.

        Dynamic scoping was already a given in Perl 4, which had no such thing as my or lexical scoping.
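This automatic protection covers the regex variables. A global like $_ still has to be saved explicitly with local, as noted earlier in the thread. A small sketch of the contrast, using a made-up sub named clobber:

```perl
use strict;
use warnings;

# Hypothetical sub that touches both a plain global and the regex variables.
sub clobber {
    $_ = "changed";        # plain assignment to the global $_, no local
    "zzz" =~ /(z+)/;       # sets $1 only for the duration of this sub
    return;
}

$_ = "original";
"abc" =~ /(b)/;            # $1 is "b" in the caller
clobber();

my ($topic, $capture) = ($_, $1);
print "topic=$topic capture=$capture\n";   # topic=changed capture=b
```

The caller's $1 survives the call, while $_ does not. Writing local $_ at the top of clobber would protect $_ in the same way.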

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: why is $1 cleared at end of an inline sub?
by LanX (Saint) on Sep 16, 2021 at 12:42 UTC
    TL;DR all

    next time please condense it to the relevant part!

    > Seems a bit weird to have the end of a local sub clear '$1', yet that seems to be what is happening

    yes, easily shown in a SSCCE

    DB<3> sub bla { "XXX"=~/(X*)/; print "inside $1" }
    DB<4> bla; print "outside $1"
    inside XXXoutside
    DB<5>

    > What was the logic of forcing/doing that?

    I'd say it's about localizing the inner sub to protect all caller levels from effects at a distance, consider

    DB<5> "YYY"=~/(Y*)/; bla; print "old $1"
    inside XXXold YYY
    DB<6>

    otherwise nobody could rely on $1 etc anymore after calling a random sub.

    Using a dedicated closure var holding the copied content of $1 is the way to go in your use case.
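A minimal sketch of that approach, assuming made-up input lines and the hypothetical names $next_tagln and @seen. The tag pattern is the one from the question:

```perl
use strict;
use warnings;

# Hypothetical input, shaped like the data in the question.
my @lines  = ('<package type="rpm">', '  <name>7kaa-music</name>', 'plain text');
my $lineno = 0;
my $tag;                         # lexical shared with the closure below

my $next_tagln = sub {
    while ($lineno < @lines) {
        my $line = $lines[$lineno++];
        if ($line =~ m{^\s*<(/?\w+)}) {
            $tag = $1;           # copy while $1 is still in scope
            return $line;
        }
    }
    return;                      # no more tag lines
};

my @seen;
while (defined(my $line = $next_tagln->())) {
    push @seen, $tag;            # $1 is restored here, but $tag survives
}
print "@seen\n";                 # package name
```

The closure variable is an ordinary lexical, so it is not affected by the restore that happens when the sub returns.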

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
