
Re^3: How to know that a regexp matched, and get its capture groups?

by LanX (Saint)
on Jan 10, 2023 at 12:10 UTC ( [id://11149481]=note )


in reply to Re^2: How to know that a regexp matched, and get its capture groups?
in thread How to know that a regexp matched, and get its capture groups?

Update: I don't recommend the code further down anymore. Rather use:

    if ( my (@caps) = ($line =~ $re) ) {
        no warnings 'uninitialized';
        @caps = () if $caps[0] ne $1;   # reset pseudo captures
        $cb->(@caps);
        last;
    }

/Update

This should be backward compatible

    my (@matches) = ($line =~ $re);
    if (defined $&) {
        $cb->(@matches);
        last;
    }

# tests...

    use v5.12;
    use warnings;

    for my $str ("AB","") {
        say "****** str=<$str>";
        for my $re ( qr/../, qr/(.)(.)/, q/XY/, q/(X)Y/, q// ) {
            say "--- re=<$re>";
            my @captures = $str =~ $re;
            if ( defined $& ) {
                say "matched"
            } else {
                say "no match"
            }
            if (defined $1) {
                say "with captures <@captures>";
            } else {
                say "no captures";
            }
        }
    }

Output:

    ****** str=<AB>
    --- re=<(?^u:..)>
    matched
    no captures
    --- re=<(?^u:(.)(.))>
    matched
    with captures <A B>
    --- re=<XY>
    no match
    no captures
    --- re=<(X)Y>
    no match
    no captures
    --- re=<>
    matched
    no captures
    ****** str=<>
    --- re=<(?^u:..)>
    no match
    no captures
    --- re=<(?^u:(.)(.))>
    no match
    no captures
    --- re=<XY>
    no match
    no captures
    --- re=<(X)Y>
    no match
    no captures
    --- re=<>
    matched
    no captures

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Wikisyntax for the Monastery

Re^4: How to know that a regexp matched, and get its capture groups?
by choroba (Cardinal) on Jan 10, 2023 at 12:17 UTC
    But...

    > See "Performance issues" above for the serious performance implications of using this variable (even once) in your code.

    perlvar

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      OK, I thought this performance issue had been resolved in Perl versions released within the last 10-15 years.

      Anyway

      ${^MATCH}
          This is similar to $& ($MATCH) except that it does not incur the
          performance penalty associated with that variable.
          This variable was added in Perl v5.10.0.

      though I don't understand the /p comment

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery

        If you want ${^MATCH} to work in pre-5.20, you need to use /p, which imposes the performance penalty only on the marked regex. On 5.20+, /p is a noop and ${^MATCH} works always, plus all the performance penalties are gone (not tested by me).
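To make the /p remark concrete, here is a minimal sketch of reading ${^MATCH} and its siblings ${^PREMATCH} and ${^POSTMATCH}. The sample string and the variable names $pre, $match and $post are made up for illustration; the snippet assumes Perl 5.10 or later, where these variables exist.

```perl
use strict;
use warnings;

my $line = "foo=42";   # hypothetical sample input

my ($pre, $match, $post);

# /p asks perl to keep the matched text available via ${^MATCH} and friends.
# Per the reply above, the flag is a no-op on 5.20+, so adding it is harmless.
if ( $line =~ /\d+/p ) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "matched <$match> after <$pre>\n";
}
```

Keeping /p on the regex should let the same code run on older perls as well, at the cost described above for pre-5.20 versions.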

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^4: How to know that a regexp matched, and get its capture groups?
by LanX (Saint) on Jan 10, 2023 at 14:42 UTC
    > my (@matches) = ($line =~ $re);
    > if (defined $&) {
    >     $cb->(@matches);
    >     last;
    > }

    nope, the real problem is that @matches = (1) if it matched without capture groups in $re
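The pseudo capture mentioned here can be seen in isolation with a small sketch. The array names are made up for illustration; the behaviour shown is the documented list-context return value of the match operator.

```perl
use strict;
use warnings;

my $line = "AB";   # hypothetical sample input

my @no_groups   = ( $line =~ /../ );       # pattern without capture groups
my @with_groups = ( $line =~ /(.)(.)/ );   # pattern with two capture groups
my @failed      = ( $line =~ /XY/ );       # pattern that does not match

# A successful match without groups yields the one-element list (1),
# which is indistinguishable from a single group that captured "1".
print scalar(@no_groups),   " element(s): @no_groups\n";
print scalar(@with_groups), " element(s): @with_groups\n";
print scalar(@failed),      " element(s)\n";
```

So the list assignment alone tells you whether the regex matched, but not whether the returned values are real captures.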

    new attempt:

        if ( my (@caps) = ($line =~ $re) ) {
            @caps = () if $caps[0] ne $1;   # reset pseudo captures
            $cb->(@caps);
            last;
        }

    Fully backwards compatible and no performance penalty.

    OK?

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      > if ( my (@caps) = ($line =~ $re) ) {
      >     @caps = () if $caps[0] ne $1;   # reset pseudo captures
      >     $cb->(@caps);
      >     last;
      > }

      In
          @caps = () if $caps[0] ne $1;
      $1 will be undefined if there is no capture group 1 or it is something like (...)? that fails to match (but an overall match can still occur). Warnings will be generated.

      Although it's only compatible back to version 5.6, an alternative would be

      c:\@Work\Perl\monks>perl
      use strict;
      use warnings;

      use Data::Dump qw(pp);

      my $cb = sub { print "matched, captured ", pp @_; };

      for my $str ("AB","") {
          for my $re (qr/../, qr/(.)(.)/, qr/(X)?(.)/, qr/XY/, qr/(X)Y/, qr//) {
              print "str <$str> re $re ";
              if ( my (@caps) = ($str =~ $re) ) {
                  # @caps = () if $caps[0] ne $1;   # reset pseudo capture
                  $#caps = $#- - 1;   # reset pseudo capture
                  $cb->(@caps);
              }
              else {
                  print 'NO match';
              }
              print "\n";
          }
      }
      ^Z
      str <AB> re (?-xism:..) matched, captured ()
      str <AB> re (?-xism:(.)(.)) matched, captured ("A", "B")
      str <AB> re (?-xism:(X)?(.)) matched, captured (undef, "A")
      str <AB> re (?-xism:XY) NO match
      str <AB> re (?-xism:(X)Y) NO match
      str <AB> re (?-xism:) matched, captured ()
      str <> re (?-xism:..) NO match
      str <> re (?-xism:(.)(.)) NO match
      str <> re (?-xism:(X)?(.)) NO match
      str <> re (?-xism:XY) NO match
      str <> re (?-xism:(X)Y) NO match
      str <> re (?-xism:) matched, captured ()

      Update: But see haukex's reply about the preference for $#+ versus $#- (@+ versus @-) regex special variables.


      Give a man a fish:  <%-{-{-{-<

        $#caps = $#- - 1;

        As I linked to in my update here, there is a subtle but important distinction between $#- and $#+. You can see this in action if you add e.g. qr/A(X)?/ as one of the test regexes: with $#+ - 1, the resulting @caps will be (undef) instead of being empty, thus correctly reflecting the number of capture groups present in the regex. Yet another solution might be $cb->($#+ ? @caps : ()).
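The difference between the two variables can be checked with a short sketch along these lines. The @report array and the local variable names are made up for illustration; the two patterns are the qr/A(X)?/ case from this reply and the group-less qr/../ from the earlier tests.

```perl
use strict;
use warnings;

my $str = "AB";   # hypothetical sample input
my @report;       # one [ $#-, $#+, capture count ] row per pattern

for my $re ( qr/A(X)?/, qr/../ ) {
    if ( my @caps = ( $str =~ $re ) ) {
        my $last_matched_group = $#-;   # last group that took part in the match
        my $group_count        = $#+;   # number of groups in the pattern

        # Trim to the number of groups in the pattern; this drops the
        # pseudo capture (1) when the pattern has no groups at all.
        $#caps = $group_count - 1;

        push @report, [ $last_matched_group, $group_count, scalar @caps ];
        printf "\$#- = %d, \$#+ = %d, %d capture(s)\n", @{ $report[-1] };
    }
}
```

For qr/A(X)?/ the group exists but does not participate, so $#- and $#+ disagree, and trimming by $#+ keeps the single undef capture.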

        > $1 will be undefined if there is no capture group 1

        I was aware of that and my results are exactly like yours.

        In the following I just need to silence the warnings about undef values.

            use v5.12;
            use warnings;

            use Data::Dump qw(pp);

            my $cb = sub { say "matched, captured ", pp @_; };

            for my $str ("AB","") {
                for my $re (qr/../, qr/(.)(.)/, qr/(X)?(.)/, qr/XY/, qr/(X)Y/, qr//) {
                    say "--- str <$str> re $re ";
                    if ( my (@caps_0) = ($str =~ $re) ) {
                        my @caps = @caps_0;
                        $#caps = $#- - 1;   # reset pseudo capture
                        $cb->(@caps);

                        @caps = @caps_0;
                        no warnings 'uninitialized';
                        @caps = () if $caps[0] ne $1;   # reset pseudo capture
                        $cb->(@caps);
                    }
                    else {
                        say 'NO match';
                    }
                    say "\n";
                }
            }

        Output:

            --- str <AB> re (?^u:..)
            matched, captured ()
            matched, captured ()

            --- str <AB> re (?^u:(.)(.))
            matched, captured ("A", "B")
            matched, captured ("A", "B")

            --- str <AB> re (?^u:(X)?(.))
            matched, captured (undef, "A")
            matched, captured (undef, "A")

            --- str <AB> re (?^u:XY)
            NO match

            --- str <AB> re (?^u:(X)Y)
            NO match

            --- str <AB> re (?^u:)
            matched, captured ()
            matched, captured ()

            --- str <> re (?^u:..)
            NO match

            --- str <> re (?^u:(.)(.))
            NO match

            --- str <> re (?^u:(X)?(.))
            NO match

            --- str <> re (?^u:XY)
            NO match

            --- str <> re (?^u:(X)Y)
            NO match

            --- str <> re (?^u:)
            matched, captured ()
            matched, captured ()

        Cheers Rolf
        (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
        Wikisyntax for the Monastery
