Re^4: How to know that a regexp matched, and get its capture groups?

by LanX (Saint)
on Jan 10, 2023 at 14:42 UTC


in reply to Re^3: How to know that a regexp matched, and get its capture groups?
in thread How to know that a regexp matched, and get its capture groups?

> my (@matches) = ($line =~ $re);
> if (defined $&) {
>     $cb->(@matches);
>     last;
> }

Nope, the real problem is that @matches is (1) when the match succeeds and $re has no capture groups.
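A minimal sketch of that pitfall, assuming nothing beyond core Perl (the variable names @no_groups and @with_groups are illustrative, not from the thread): in list context a successful match without capture groups returns the single "pseudo capture" 1, which is hard to tell apart from a real capture of the string "1".

```perl
use strict;
use warnings;

my $line = 'AB';

# No capture groups: a successful list-context match returns (1)
my @no_groups = ( $line =~ /../ );

# Two capture groups: the match returns the captured substrings
my @with_groups = ( $line =~ /(.)(.)/ );

print scalar(@no_groups),   " element(s): @no_groups\n";
print scalar(@with_groups), " element(s): @with_groups\n";
```

The first line should report one element with the value 1, the second two elements, A and B.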

new attempt:

if ( my (@caps) = ($line =~ $re) ) {
    @caps = () if $caps[0] ne $1;   # reset pseudo captures
    $cb->(@caps);
    last;
}

Fully backwards compatible, and no performance penalty.

OK?

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Wikisyntax for the Monastery

Re^5: How to know that a regexp matched, and get its capture groups?
by AnomalousMonk (Archbishop) on Jan 11, 2023 at 06:22 UTC
    if ( my (@caps) = ($line =~ $re) ) {
        @caps = () if $caps[0] ne $1;   # reset pseudo captures
        $cb->(@caps);
        last;
    }

    In
        @caps = () if $caps[0] ne $1;
    $1 will be undefined if there is no capture group 1, or if group 1 is something like (...)? that does not take part in an otherwise successful match. In either case the ne comparison generates "uninitialized" warnings.
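A small sketch of the warning in question, assuming a pattern without any capture group (the @warnings array and the __WARN__ handler are only there for illustration and are not part of the thread's code):

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $line = 'AB';
if ( my (@caps) = ( $line =~ /../ ) ) {    # matches, but there is no group 1
    @caps = () if $caps[0] ne $1;          # $1 is undef here, so ne warns
    print "captures left: ", scalar(@caps), "\n";
}

print "warnings seen: ", scalar(@warnings), "\n";
print $warnings[0] if @warnings;
```

The pseudo capture is still reset correctly, but only at the cost of an "uninitialized" warning from the comparison.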

    Although it's only compatible back to version 5.6, an alternative would be

    c:\@Work\Perl\monks>perl
    use strict;
    use warnings;

    use Data::Dump qw(pp);

    my $cb = sub { print "matched, captured ", pp @_; };

    for my $str ("AB","") {
        for my $re (qr/../, qr/(.)(.)/, qr/(X)?(.)/, qr/XY/, qr/(X)Y/, qr//) {
            print "str <$str> re $re ";
            if ( my (@caps) = ($str =~ $re) ) {
                # @caps = () if $caps[0] ne $1;  # reset pseudo capture
                $#caps = $#- - 1;  # reset pseudo capture
                $cb->(@caps);
            }
            else {
                print 'NO match';
            }
            print "\n";
        }
    }
    ^Z
    str <AB> re (?-xism:..) matched, captured ()
    str <AB> re (?-xism:(.)(.)) matched, captured ("A", "B")
    str <AB> re (?-xism:(X)?(.)) matched, captured (undef, "A")
    str <AB> re (?-xism:XY) NO match
    str <AB> re (?-xism:(X)Y) NO match
    str <AB> re (?-xism:) matched, captured ()
    str <> re (?-xism:..) NO match
    str <> re (?-xism:(.)(.)) NO match
    str <> re (?-xism:(X)?(.)) NO match
    str <> re (?-xism:XY) NO match
    str <> re (?-xism:(X)Y) NO match
    str <> re (?-xism:) matched, captured ()

    Update: But see haukex's reply about the preference for $#+ versus $#- (@+ versus @-) regex special variables.


    Give a man a fish:  <%-{-{-{-<

      $#caps = $#- - 1;

      As I linked to in my update here, there is a subtle but important distinction between $#- and $#+: $#- is the index of the last group that actually matched, while $#+ is the number of capture groups in the regex. You can see this in action if you add e.g. qr/A(X)?/ as one of the test regexes: with $#+ - 1 the resulting @caps will be (undef) instead of empty, correctly reflecting the number of capture groups present in the regex. Yet another solution might be $cb->($#+ ? @caps : ()).
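A hedged sketch of that difference; trimmed_captures is a hypothetical helper written for this illustration, not code from the thread. It trims the list returned by the match using either $#- or $#+, so the two can be compared side by side on the same regexes.

```perl
use strict;
use warnings;

# Hypothetical helper: match $str against $re and return an array ref of
# captures, trimmed with $#+ (if $use_plus is true) or $#- (otherwise).
# Returns undef when the match fails.
sub trimmed_captures {
    my ($str, $re, $use_plus) = @_;
    my @caps = ( $str =~ $re );
    return undef unless @caps;    # a successful match never returns an empty list
    $#caps = ( $use_plus ? $#+ : $#- ) - 1;    # drop the pseudo capture, if any
    return \@caps;
}

for my $re ( qr/../, qr/(.)(.)/, qr/A(X)?/ ) {
    my $minus = trimmed_captures( 'AB', $re, 0 );
    my $plus  = trimmed_captures( 'AB', $re, 1 );
    printf "%-14s \$#-: %d element(s)   \$#+: %d element(s)\n",
        $re, scalar @$minus, scalar @$plus;
}
```

The two variants should agree for the first two regexes and differ only for qr/A(X)?/, where $#- drops the unmatched optional group and $#+ keeps it as undef.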

      > $1 will be undefined if there is no capture group 1

      I was aware of that and my results are exactly like yours.

      In the following I just need to silence the warnings about undef values.

      use v5.12;
      use warnings;

      use Data::Dump qw(pp);

      my $cb = sub { say "matched, captured ", pp @_; };

      for my $str ("AB","") {
          for my $re (qr/../, qr/(.)(.)/, qr/(X)?(.)/, qr/XY/, qr/(X)Y/, qr//) {
              say "--- str <$str> re $re ";
              if ( my (@caps_0) = ($str =~ $re) ) {

                  my @caps = @caps_0;
                  $#caps = $#- - 1;  # reset pseudo capture
                  $cb->(@caps);

                  @caps = @caps_0;
                  no warnings 'uninitialized';
                  @caps = () if $caps[0] ne $1;  # reset pseudo capture
                  $cb->(@caps);
              }
              else {
                  say 'NO match';
              }
              say "\n";
          }
      }

      --- str <AB> re (?^u:..)
      matched, captured ()
      matched, captured ()
      --- str <AB> re (?^u:(.)(.))
      matched, captured ("A", "B")
      matched, captured ("A", "B")
      --- str <AB> re (?^u:(X)?(.))
      matched, captured (undef, "A")
      matched, captured (undef, "A")
      --- str <AB> re (?^u:XY)
      NO match
      --- str <AB> re (?^u:(X)Y)
      NO match
      --- str <AB> re (?^u:)
      matched, captured ()
      matched, captured ()
      --- str <> re (?^u:..)
      NO match
      --- str <> re (?^u:(.)(.))
      NO match
      --- str <> re (?^u:(X)?(.))
      NO match
      --- str <> re (?^u:XY)
      NO match
      --- str <> re (?^u:(X)Y)
      NO match
      --- str <> re (?^u:)
      matched, captured ()
      matched, captured ()

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery
