Lexical %+ %- and more?

by blazar (Canon)
on Oct 22, 2008 at 10:35 UTC

This is basically a repost of the last part of a reply of mine which apparently went unnoticed and in any case got no answer; admittedly it had drifted slightly off-topic with respect to the main thread.

We have had named captures only for a short while, but I have already asked for them not to be read-only. What people tell me, basically, is that implementation details force them to be read-only. Now I go one step further, so this may well be taken as sci-fi, but I think it exposes an interesting idea nevertheless: from 5.10 onward we have a lexical $_, so I wonder whether we could have lexical %+ and %- such that:

  • their entries would not be reset across matches until the end of the lexical scope they live in;
  • (they would be modifiable - I still insist on that!)

Thus one may have the following example (which explains the whole thing better than many abstract descriptions...) working as naively expected:

    {
        my %+;
        doit if $x ~~ / (?<x1> \w+)\s+(?<x2> \w+) /x
            and $y ~~ / (?<y1> \w+)\s+(?<y2> \w+) /x
            and $+{x1} . $+{y2} eq $+{y1} . $+{x2};
    }

Assuming e.g.:

    $x = 'fo ar!';
    $y = '?foob obar ar';
at the end of the scope, if I printed Data::Dumper's Dumper \%+ I would get

    $VAR1 = {
              'y1' => 'foob',
              'x2' => 'ar',
              'y2' => 'obar',
              'x1' => 'fo'
            };
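
For comparison, the effect asked for above can be approximated in today's Perl by copying %+ into an ordinary lexical hash after each successful match. This is only a sketch of a workaround, not the proposed feature: the name %cap is a hypothetical stand-in for the lexical %+, and the example uses plain =~ instead of ~~.

```perl
use strict;
use warnings;

my $x = 'fo ar!';
my $y = '?foob obar ar';

# %cap is a hypothetical stand-in for the proposed lexical %+:
# an ordinary lexical hash that accumulates captures across matches.
my %cap;

my $ok =
       $x =~ / (?<x1> \w+) \s+ (?<x2> \w+) /x
    && do { %cap = (%cap, %+); 1 }      # save x1, x2 before the next match
    && $y =~ / (?<y1> \w+) \s+ (?<y2> \w+) /x
    && do { %cap = (%cap, %+); 1 }      # save y1, y2
    && $cap{x1} . $cap{y2} eq $cap{y1} . $cap{x2};

print "doit\n" if $ok;
print "$_ => $cap{$_}\n" for sort keys %cap;
```

Because %cap is a plain hash, its entries survive later matches and can be modified freely, which covers both bullet points at the cost of the explicit copies.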

Please don't point out that, for the example above, there are tons of other WTDI: obviously there are; we're talking about Perl, after all! I just think we could have one more, and with a very clear syntax too. Also, the idea sprang up in the context of that other thread dealing with %+ and %-, but there may be other special variables that would allow a lexical incarnation with modified semantics.

--
If you can't understand the incipit, then please check the IPB Campaign.

Replies are listed 'Best First'.
Re: Lexical %+ %- and more?
by TimToady (Parson) on Oct 22, 2008 at 17:56 UTC
    The original design of @+ and @- was a complete botch, and 5.10 extends that botch to the use of hashes. Perl 5 should move toward the Perl 6 model of a single lexical variable containing all the information from the last match, and then any variables like $1 are just aliases into that structure. Parallel global arrays and hashes are madness, even if I could keep straight which one is the beginning and which one is the end, which I can't. And parallel hashes force you to do the hash lookup twice. Madness...
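
    As a reminder of what the parallel variables hold in current Perl 5, here is a small sketch: @- carries the start offsets and @+ the end offsets of each group, while %+ and %- expose named captures. The sample string and the group names first and second are made up for illustration.

```perl
use strict;
use warnings;

my $s = 'hello world';
$s =~ / (?<first> \w+) \s+ (?<second> \w+) /x or die "no match";

# @- holds the start offsets, @+ the end offsets (index 0 is the whole match).
my $whole  = substr $s, $-[0], $+[0] - $-[0];
my $second = substr $s, $-[2], $+[2] - $-[2];

# %+ gives one value per name; %- gives an array ref of all groups with that name.
my $via_plus  = $+{second};
my $via_minus = $-{second}[0];

print "$whole | $second | $via_plus | $via_minus\n";
```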
Re: Lexical %+ %- and more?
by JavaFan (Canon) on Oct 22, 2008 at 11:41 UTC
        my %+;
        my %-;
        'foo' =~ /(?<w>\w+)/;
        'bar' =~ /(?<w>\w+)/;
        use YAML;
        print Dump \%+;
        print Dump \%-;
        __END__
    What should that print? What if the last match was '--' =~ /(?<\w>\w+)/? What if %+ is lexical, but %- isn't?
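
    For reference, this is a sketch of what the existing global %+ and %- contain after the same two matches. It drops the my declarations, since to the best of my knowledge current Perl refuses to declare these punctuation variables with my; the snapshot hashes %plus and %minus are names introduced here for illustration.

```perl
use strict;
use warnings;

'foo' =~ /(?<w>\w+)/ or die "no match";
'bar' =~ /(?<w>\w+)/ or die "no match";

# Snapshot the global hashes immediately; a later match would replace them.
my %plus  = %+;
my %minus = map { $_ => [ @{ $-{$_} } ] } keys %-;

print "w => $plus{w}\n";
print "w => [@{ $minus{w} }]\n";
```

    With the globals, only the second match is visible: the first match's value for w is gone rather than accumulated.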

    And if lexical %- and %+ works as you want, should this work as well?

    'foo' =~ /(?<w>\w+)/ && 'foofoo' =~ /\g{w}\g{w}/;
    But that begs the question, what about:
    'oo' =~ /(?<w>\w+)/ && 'oo' =~ /\g{w}\g{w}/;
      What should that print?

      I personally believe:

          ---
          w: bar
          ---
          w:
            - bar
            - bar

      The same "variable" is used, and thus it is natural for it to be clobbered: if I didn't want that, I would have used a different name, especially since "now" it is so easy, whereas it wouldn't be an option if we only had numbered captures.

      What if the last match was '--' =~ /(?<\w>\w+)/?

      I beg your pardon, but... I don't see the difference! Maybe I'm just tired...

      What if %+ is lexical, but %- isn't?

      Well, they should behave independently, although of course this would be very inconsistent if one needs both. (But I bet some hacker would find a cool way to exploit it for something weird and insane! ;)

      And if lexical %- and %+ works as you want, should this work as well?
      'foo' =~ /(?<w>\w+)/ && 'foofoo' =~ /\g{w}\g{w}/;

      I don't see any reason why it shouldn't.
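
      In current Perl, a \g{w} backreference has to refer to a group named w in the same pattern, so the cross-match form quoted above does not compile. A rough equivalent, sketched below under that assumption, copies the capture into a lexical and interpolates it with \Q...\E. The helper name doubled is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains the capture named 'w'
# from $first, repeated twice in a row.
sub doubled {
    my ($first, $text) = @_;
    $first =~ /(?<w>\w+)/ or return 0;
    my $w = $+{w};                      # copy before the next match resets %+
    return $text =~ /(?:\Q$w\E){2}/ ? 1 : 0;
}

print doubled('foo', 'foofoo') ? "match\n" : "no match\n";
print doubled('oo',  'oo')     ? "match\n" : "no match\n";

# Compiling \g{w} with no group named w in the same pattern is an error today.
my $pattern  = '\g{w}\g{w}';
my $compiled = eval { qr/$pattern/ };
print "cross-match \\g{w} compiles: ", (defined $compiled ? "yes" : "no"), "\n";
```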

      But that begs the question, what about:
      'oo' =~ /(?<w>\w+)/ && 'oo' =~ /\g{w}\g{w}/;

      Well, this should plainly fail. I think you're asking me what should become of %+ and %- after this, right? Well: no named captures are attempted in the second match, so they should stay like:

      %+ = ( w => 'oo');
      %- = ( w => ['oo']);

      But if it were

      'oo' =~ /(?<w>\w+)/ && 'oo' =~ /(?<w>\g{w}\g{w})/; # Which I *think* is possible!

      then they would become

      %+ = ( w => undef); # or not existing at all? I'm half hearted...
      %- = ( w => []);
      --
      If you can't understand the incipit, then please check the IPB Campaign.

Node Type: perlmeditation [id://718691]
Approved by Corion
Front-paged by Corion