in reply to Re^2: How to know that a regexp matched, and get its capture groups? in thread How to know that a regexp matched, and get its capture groups?
Update: I no longer recommend the code further below. Use this instead:
if ( my (@caps) = ($line =~ $re) ) {
    no warnings 'uninitialized';
    @caps = () if $caps[0] ne $1;    # reset pseudo captures
    $cb->(@caps);
    last;
}
End of update. The original post follows.
This should be backward compatible:
my (@matches) = ($line =~ $re);
if (defined $&) {
    $cb->(@matches);
    last;
}
# tests...
use v5.12;
use warnings;
for my $str ("AB","") {
    say "****** str=<$str>";
    for my $re ( qr/../, qr/(.)(.)/, q/XY/, q/(X)Y/, q// ) {
        say "--- re=<$re>";
        my @captures = $str =~ $re;
        if ( defined $& ) {
            say "matched"
        } else {
            say "no match"
        }
        if (defined $1) {
            say "with captures <@captures>";
        } else {
            say "no captures";
        }
    }
}
****** str=<AB>
--- re=<(?^u:..)>
matched
no captures
--- re=<(?^u:(.)(.))>
matched
with captures <A B>
--- re=<XY>
no match
no captures
--- re=<(X)Y>
no match
no captures
--- re=<>
matched
no captures
****** str=<>
--- re=<(?^u:..)>
no match
no captures
--- re=<(?^u:(.)(.))>
no match
no captures
--- re=<XY>
no match
no captures
--- re=<(X)Y>
no match
no captures
--- re=<>
matched
no captures
Re^4: How to know that a regexp matched, and get its capture groups?
by choroba (Cardinal) on Jan 10, 2023 at 12:17 UTC
|
But... perlvar says this about $&:
> See "Performance issues" above for the serious performance implications of using this variable (even once) in your code.
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
|
OK, I thought this performance issue had been resolved in Perl versions released during the last 10-15 years.
Anyway, perlvar also documents an alternative:
${^MATCH}
This is similar to $& ($MATCH) except that it does not incur the performance penalty associated with that variable.
This variable was added in Perl v5.10.0.
I don't understand the comment about /p, though.
|
If you want ${^MATCH} to work in pre-5.20, you need to use /p, which imposes the performance penalty only on the marked regex. On 5.20+, /p is a noop and ${^MATCH} works always, plus all the performance penalties are gone (not tested by me).
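For illustration, a minimal sketch of ${^MATCH} together with /p. The sample string, the pattern and the variable names are made up for this example and are not taken from the thread. It assumes Perl 5.10 or later; on 5.20+ the /p should be unnecessary but harmless, as described above.

```perl
use strict;
use warnings;

my $line = 'key=42';    # hypothetical sample input

my $matched;
if ( $line =~ /[a-z]+=\d+/p ) {
    # ${^MATCH} holds the text matched by the last successful match,
    # like $&, but only for regexes marked with /p on pre-5.20 perls.
    $matched = ${^MATCH};
    print "matched <$matched>\n";
}
```

This prints the matched substring without ever mentioning $& in the program.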
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^4: How to know that a regexp matched, and get its capture groups?
by LanX (Saint) on Jan 10, 2023 at 14:42 UTC
|
>
my (@matches) = ($line =~ $re);
if (defined $&) {
$cb->(@matches);
last;
}
Nope, the real problem is that @matches is (1) if the match succeeded and $re has no capture groups.
New attempt:
if ( my (@caps) = ($line =~ $re) ) {
    @caps = () if $caps[0] ne $1;    # reset pseudo captures
    $cb->(@caps);
    last;
}
Fully backward compatible and no performance penalty.
OK?
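The pseudo capture is easy to see in isolation. A small sketch (the variable names and sample strings are only for illustration): in list context, a successful match without capture groups returns the single value 1, which cannot be told apart by value from a group that really captured the string "1".

```perl
use strict;
use warnings;

my @no_groups = ( 'AB' =~ /../ );        # matched, no groups
my @groups    = ( 'AB' =~ /(.)(.)/ );    # matched, two groups
my @no_match  = ( 'AB' =~ /XY/ );        # failed
my @real_one  = ( '1'  =~ /(\d)/ );      # matched, one group capturing "1"

print "no groups: (@no_groups)\n";       # pseudo capture 1
print "groups:    (@groups)\n";
print "no match:  (@no_match)\n";        # empty list
print "real one:  (@real_one)\n";        # looks the same as the pseudo capture
```

That ambiguity is why the snippet above compares $caps[0] with $1 instead of simply testing for the value 1.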
|
if ( my (@caps) = ($line =~ $re) ) {
@caps = () if $caps[0] ne $1; # reset pseudo captures
$cb->(@caps);
last;
}
In
    @caps = () if $caps[0] ne $1;
$1 will be undefined if there is no capture group 1, or if group 1 is something like (...)? that does not take part in the match (an overall match can still occur). In both cases warnings will be generated.
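A short sketch of that situation, assuming warnings are enabled. The warning handler and the variable names are introduced here purely for illustration; they are not part of the code under discussion.

```perl
use strict;
use warnings;

my @warnings;
# Collect warnings instead of printing them, so they can be inspected.
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

if ( my (@caps) = ( 'AB' =~ /(X)?(.)/ ) ) {
    # Group 1 did not take part in the match, so both $caps[0] and $1
    # are undef here and the string comparison warns.
    my $reset = $caps[0] ne $1;
    printf "reset=%s, warnings=%s\n",
        ( $reset    ? 'yes' : 'no' ),
        ( @warnings ? 'yes' : 'no' );
}
```

The comparison still yields the intended result (no reset), but only at the cost of "uninitialized" warnings unless they are switched off locally.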
Although it's only compatible back to version 5.6, an alternative would be
c:\@Work\Perl\monks>perl
use strict;
use warnings;
use Data::Dump qw(pp);
my $cb = sub { print "matched, captured ", pp @_; };
for my $str ("AB","") {
    for my $re (qr/../, qr/(.)(.)/, qr/(X)?(.)/, qr/XY/, qr/(X)Y/, qr//) {
        print "str <$str> re $re ";
        if ( my (@caps) = ($str =~ $re) ) {
            # @caps = () if $caps[0] ne $1;    # reset pseudo capture
            $#caps = $#- - 1;    # reset pseudo capture
            $cb->(@caps);
        }
        else {
            print 'NO match';
        }
        print "\n";
    }
}
^Z
str <AB> re (?-xism:..) matched, captured ()
str <AB> re (?-xism:(.)(.)) matched, captured ("A", "B")
str <AB> re (?-xism:(X)?(.)) matched, captured (undef, "A")
str <AB> re (?-xism:XY) NO match
str <AB> re (?-xism:(X)Y) NO match
str <AB> re (?-xism:) matched, captured ()
str <> re (?-xism:..) NO match
str <> re (?-xism:(.)(.)) NO match
str <> re (?-xism:(X)?(.)) NO match
str <> re (?-xism:XY) NO match
str <> re (?-xism:(X)Y) NO match
str <> re (?-xism:) matched, captured ()
Update: But see haukex's reply about the preference for $#+ versus $#- (@+ versus @-) regex special variables.
Give a man a fish: <%-{-{-{-<
|
$#caps = $#- - 1;
As I linked to in my update here, there is a subtle but important distinction between $#- and $#+. You can see this in action if you add, e.g., qr/A(X)?/ as one of the test regexes: with $#+ - 1, the resulting @caps will be (undef) instead of empty, thus correctly reflecting the number of capture groups present in the regex. Yet another solution might be $cb->($#+ ? @caps : ()).
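To make the difference concrete, here is a hedged sketch built around the $#+ ? @caps : () idea. The helper name captures_or_undef and the hash keys are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns undef if the match fails, otherwise a hash ref
# with one capture slot per group in the regex (possibly none).
sub captures_or_undef {
    my ( $str, $re ) = @_;
    my @caps = ( $str =~ $re ) or return;
    return +{
        caps         => [ $#+ ? @caps : () ],    # drop the pseudo capture
        groups       => $#+,    # number of groups in the regex
        last_matched => $#-,    # last group that actually matched
    };
}

for my $test (
    [ 'no groups',      qr/../ ],
    [ 'two groups',     qr/(.)(.)/ ],
    [ 'optional group', qr/A(X)?/ ],
) {
    my ( $label, $re ) = @$test;
    my $r = captures_or_undef( 'AB', $re ) or next;
    printf "%-14s groups=%d last_matched=%d captures=%d\n",
        $label, $r->{groups}, $r->{last_matched}, scalar @{ $r->{caps} };
}
```

For qr/A(X)?/ the two variables disagree (one group in the regex, but no group matched), so sizing @caps from $#- would discard the undef slot for the optional group.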
|
use v5.12;
use warnings;
use Data::Dump qw(pp);
my $cb = sub { say "matched, captured ", pp @_; };
for my $str ("AB","") {
    for my $re (qr/../, qr/(.)(.)/, qr/(X)?(.)/, qr/XY/, qr/(X)Y/, qr//) {
        say "--- str <$str> re $re ";
        if ( my (@caps_0) = ($str =~ $re) ) {
            my @caps = @caps_0;
            $#caps = $#- - 1;    # reset pseudo capture
            $cb->(@caps);
            @caps = @caps_0;
            no warnings 'uninitialized';
            @caps = () if $caps[0] ne $1;    # reset pseudo capture
            $cb->(@caps);
        } else {
            say 'NO match';
        }
        say "\n";
    }
}
--- str <AB> re (?^u:..)
matched, captured ()
matched, captured ()
--- str <AB> re (?^u:(.)(.))
matched, captured ("A", "B")
matched, captured ("A", "B")
--- str <AB> re (?^u:(X)?(.))
matched, captured (undef, "A")
matched, captured (undef, "A")
--- str <AB> re (?^u:XY)
NO match
--- str <AB> re (?^u:(X)Y)
NO match
--- str <AB> re (?^u:)
matched, captured ()
matched, captured ()
--- str <> re (?^u:..)
NO match
--- str <> re (?^u:(.)(.))
NO match
--- str <> re (?^u:(X)?(.))
NO match
--- str <> re (?^u:XY)
NO match
--- str <> re (?^u:(X)Y)
NO match
--- str <> re (?^u:)
matched, captured ()
matched, captured ()