Extracting the number of repetitions from a regex

pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Extracting the number of repetitions from a regex by BrowserUk (Patriarch) on Dec 18, 2008 at 16:59 UTC
Lexicals don't work (right) in regex code blocks. If you switch to using globals, you'll get the results you are after: my @strings = qw( aaabbbb ab abb aabb aaabb aabbb ); for my $string ( @strings ) { our $a_counter = 0; our $b_counter = 0; print "In $string there were $a_counter 'a's and $b_counter 'b's. +\n" if $string =~ /(a(?{$a_counter ++;}))+(b(?{$b_counter ++;}))+/; } __END__ C:\test>junk In aaabbbb there were 3 'a's and 4 'b's. In ab there were 1 'a's and 1 'b's. In abb there were 1 'a's and 2 'b's. In aabb there were 2 'a's and 2 'b's. In aaabb there were 3 'a's and 2 'b's. In aabbb there were 2 'a's and 3 'b's. [download] There is a warning about this in one of the regex pods. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^2: Extracting the number of repetitions from a regex by pat_mc (Pilgrim) on Dec 18, 2008 at 20:40 UTC
Cool stuff, BrowserUk! This really does the trick. Since I am not really clear on the difference between 'Lexicals' and 'Globals' - could you please briefly expand on how the change from declaring the variables with `my` to `our` affects this scenario? I expected the scope of a variable definition to be that of the block is was defined in ... hence my confusion as to why the use of globals would make any difference here. Thanks again - Pat	[reply] [d/l] [select]
Re^3: Extracting the number of repetitions from a regex by ikegami (Patriarch) on Dec 18, 2008 at 21:03 UTC
could you please briefly expand on how the change from declaring the variables with my to our affects this scenario? Briefly? Code blocks in regexp patterns capture lexical variabless when they are compiled, just like anonymous subs. Package variables aren't captured. In case that didn't do the trick, the longer answer follows. Code blocks in regexps are anonymous subs. `sub f { my ($x) = @_; '' =~ /(?{ print "$x\n" })/; } f(4); # 4 f(5); # 4!!` [download] effectively does `sub f { my ($x) = @_; $block \|\|= sub { print "$x\n" }; $block->(); } f(4); # 4 f(5); # 4!!` [download] The $x from the first pass is captured when the sub is compiled. It's a very powerful feature which allows the simplification of many problems. For example, `BEGIN { package Prefixer; sub new { my ($class, $prefix) = @_; return bless({ prefix => $prefix }, $class); } sub prefix { my ($self) = @_; return join '', $self->{prefix}, @_; } } my $a_prefixer = Prefixer->new('a'); my $b_prefixer = Prefixer->new('b'); print("$_\n") for $a_prefixer->prefix('1'), # a1 $a_prefixer->prefix('2'), # a2 $b_prefixer->prefix('3'), # b3 $b_prefixer->prefix('4'); # b4` [download] can be simplified to `sub make_prefixer { my ($prefix) = @_; return sub { return join '', $prefix, @_ }; } my $a_prefixer = make_prefixer('a'); my $b_prefixer = make_prefixer('b'); print("$_\n") for $a_prefixer->('1'), # a1 $a_prefixer->('2'), # a2 $b_prefixer->('3'), # b3 $b_prefixer->('4'); # b4` [download] However, subs only capture lexical variables, not package variables. By using package variables, the problem goes away. `sub f { local our ($x) = @_; $block \|\|= sub { print "$x\n" }; $block->(); } f(4); # 4 f(5); # 5` [download] `sub f { local our ($x) = @_; '' =~ /(?{ print "$x\n" })/; } f(4); # 4 f(5); # 5` [download]	[reply] [d/l] [select]
Re^4: Extracting the number of repetitions from a regex by pat_mc (Pilgrim) on Dec 19, 2008 at 10:56 UTC
Re: Extracting the number of repetitions from a regex by Roy Johnson (Monsignor) on Dec 18, 2008 at 17:24 UTC
It's a scoping issue. Apparently, the regex uses the variables that existed when it was first compiled, while you have declared a new set of variables on every pass. Or something like that. Anyway, this works: `my @strings = qw( aaabbbb ab abb aabb aaabb aabbb ); { my ($a_counter, $b_counter); for my $string ( @strings ) { $a_counter = 0; $b_counter = 0; print "In $string there were $a_counter 'a's and $b_counter 'b's. +\n" if ( $string =~ /(a(?{$a_counter ++;}))+(b(?{$b_counter ++;}))+/ +); } }` [download] Caution: Contents may have been coded under pressure.	[reply] [d/l]
Re^2: Extracting the number of repetitions from a regex by pat_mc (Pilgrim) on Dec 18, 2008 at 20:30 UTC
Roy - Thanks for shedding some light onto this issue. I am grateful for your functional modification of the code which does indeed produce the desired output. As for the reasoning why the original code behaves the way it does, however, I am not sure I follow your explanation. Since the regex is missing the `//o` modifier I would have expected it to get re-compiled on every iteration of the `for`-loop (please correct me if I am wrong here!). If it doesn't - why not? On the other hand, your reasoning is supported by the output of the following code modification: `my @strings = qw( aaabbbb ab abb aabb aaabb aabbb ); my $a_counter; my $b_counter; for my $string ( @strings ) { print "In $string there were $a_counter 'a's and $b_counter 'b's.\ +n" if ( $string =~ /(a(?{$a_counter ++; } ))+(b(?{$b_counter ++;}))+ +/ ); }` [download] which produces `In aaabbbb there were 3 'a's and 4 'b's. In ab there were 4 'a's and 5 'b's. In abb there were 5 'a's and 7 'b's. In aabb there were 7 'a's and 9 'b's. In aaabb there were 10 'a's and 11 'b's. In aabbb there were 12 'a's and 14 'b's.` [download] So: Can anybody please explain conclusively what is happening here in the compilation of the regex?	[reply] [d/l] [select]
Re: Extracting the number of repetitions from a regex by mr_mischief (Monsignor) on Dec 18, 2008 at 16:52 UTC
I'm not sure at first glance how to fix your version, but `tr///` counts the characters it replaces and returns that number. `my @strings = qw( aaabbbb ab abb aabb aaabb aabbb ); for my $string ( @strings ) { my $a_counter = $string =~ tr/a/a/; my $b_counter = $string =~ tr/b/b/; print "In $string there were $a_counter 'a's and $b_counter 'b's. +\n"; }` [download]	[reply] [d/l] [select]
Re: Extracting the number of repetitions from a regex by ikegami (Patriarch) on Dec 18, 2008 at 21:36 UTC
Something you haven't considered is backtracking. Your code, cleaned up: `my @strings = qw( acaabbbb ); for ( @strings ) { local our $a_counter = 0; local our $b_counter = 0; / (?: a (?{ $a_counter++ }) )+ (?: b (?{ $b_counter++ }) )+ /x; print("a: $a_counter\n"); # a: 3 print("b: $b_counter\n"); # b: 4 }` [download] If you want 2,4 instead of 3,4, you need to take backtracking into account. `my @strings = qw( acaabbbb ); for ( @strings ) { local our $a_counter = 0; local our $b_counter = 0; / (?{ # Initialize $^R { a => 0, b => 0 } }) (?: a (?{ my %h = %{$^R}; $h{a}++; +{ %h } }) )+ (?: b (?{ my %h = %{$^R}; $h{b}++; +{ %h } }) )+ (?{ # Return results my %h = %{$^R}; $a_counter = $h{a}; $b_counter = $h{b}; }) /x; print("a: $a_counter\n"); # a: 2 print("b: $b_counter\n"); # b: 4 }` [download]	[reply] [d/l] [select]
Re^2: Extracting the number of repetitions from a regex by pat_mc (Pilgrim) on Dec 19, 2008 at 11:06 UTC
I suppose this is also the core of an answer to my follow-up post Integrating match counts into regex matching ... thanks again for the profound insights, ikegami!	[reply]
Re: Extracting the number of repetitions by Anonymous Monk on Dec 18, 2008 at 16:18 UTC
Add this to the end and you'll know why `warn "suprise $a_counter"; warn "suprise $b_counter";` [download]	[reply] [d/l]
Re^2: Extracting the number of repetitions by pat_mc (Pilgrim) on Dec 18, 2008 at 16:37 UTC
Sorry, Anonymous_Monk - I don't get it. Can you please switch your output to `verbose` mode for the Perl-underprivileged? It looks as if the counts in the evaluation of the regex work fine but somehow the result does not get transferred to the external variables ... not sure what this tells me, though. Could you please expand? Thanks - Pat	[reply] [d/l]
Re^3: Extracting the number of repetitions by Anonymous Monk on Dec 18, 2008 at 16:40 UTC
Sorry, I was wrong :( Try `use re 'debug';` for ideas	[reply] [d/l]
Re^4: Extracting the number of repetitions by pat_mc (Pilgrim) on Dec 18, 2008 at 20:53 UTC


laziness, impatience, and hubris
	PerlMonks