Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Extracting the number of repetitions from a regex

by pat_mc (Pilgrim)
on Dec 18, 2008 at 16:14 UTC ( [id://731297]=perlquestion: print w/replies, xml ) Need Help??

pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks -

I am trying to extract from a regular expression the number of times certain strings ('a' and 'b' in the example below) occur in a given matcher string.
my @strings = qw( aaabbbb ab abb aabb aaabb aabbb ); for my $string ( @strings ) { my $a_counter = 0; my $b_counter = 0; print "In $string there were $a_counter 'a's and $b_counter 'b's. +\n" if ( $string =~ /(a(?{$a_counter ++;}))+(b(?{$b_counter ++;}))+/ +); }
To my surprise this only works on the first iteration. Can you please let me know why? Could this have to do with the match position not getting reset? I am at a loss ...

Thanks in advance for your help and best regards -

Pat

Replies are listed 'Best First'.
Re: Extracting the number of repetitions from a regex
by BrowserUk (Patriarch) on Dec 18, 2008 at 16:59 UTC

    Lexicals don't work (right) in regex code blocks. If you switch to using globals, you'll get the results you are after:

    my @strings = qw( aaabbbb ab abb aabb aaabb aabbb ); for my $string ( @strings ) { our $a_counter = 0; our $b_counter = 0; print "In $string there were $a_counter 'a's and $b_counter 'b's. +\n" if $string =~ /(a(?{$a_counter ++;}))+(b(?{$b_counter ++;}))+/; } __END__ C:\test>junk In aaabbbb there were 3 'a's and 4 'b's. In ab there were 1 'a's and 1 'b's. In abb there were 1 'a's and 2 'b's. In aabb there were 2 'a's and 2 'b's. In aaabb there were 3 'a's and 2 'b's. In aabbb there were 2 'a's and 3 'b's.

    There is a warning about this in one of the regex pods.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Cool stuff, BrowserUk!

      This really does the trick. Since I am not really clear on the difference between 'Lexicals' and 'Globals' - could you please briefly expand on how the change from declaring the variables with my to our affects this scenario? I expected the scope of a variable definition to be that of the block is was defined in ... hence my confusion as to why the use of globals would make any difference here.

      Thanks again - Pat

        could you please briefly expand on how the change from declaring the variables with my to our affects this scenario?

        Briefly? Code blocks in regexp patterns capture lexical variabless when they are compiled, just like anonymous subs. Package variables aren't captured. In case that didn't do the trick, the longer answer follows.

        Code blocks in regexps are anonymous subs.

        sub f { my ($x) = @_; '' =~ /(?{ print "$x\n" })/; } f(4); # 4 f(5); # 4!!
        effectively does
        sub f { my ($x) = @_; $block ||= sub { print "$x\n" }; $block->(); } f(4); # 4 f(5); # 4!!

        The $x from the first pass is captured when the sub is compiled. It's a very powerful feature which allows the simplification of many problems. For example,

        BEGIN { package Prefixer; sub new { my ($class, $prefix) = @_; return bless({ prefix => $prefix }, $class); } sub prefix { my ($self) = @_; return join '', $self->{prefix}, @_; } } my $a_prefixer = Prefixer->new('a'); my $b_prefixer = Prefixer->new('b'); print("$_\n") for $a_prefixer->prefix('1'), # a1 $a_prefixer->prefix('2'), # a2 $b_prefixer->prefix('3'), # b3 $b_prefixer->prefix('4'); # b4

        can be simplified to

        sub make_prefixer { my ($prefix) = @_; return sub { return join '', $prefix, @_ }; } my $a_prefixer = make_prefixer('a'); my $b_prefixer = make_prefixer('b'); print("$_\n") for $a_prefixer->('1'), # a1 $a_prefixer->('2'), # a2 $b_prefixer->('3'), # b3 $b_prefixer->('4'); # b4

        However, subs only capture lexical variables, not package variables. By using package variables, the problem goes away.

        sub f { local our ($x) = @_; $block ||= sub { print "$x\n" }; $block->(); } f(4); # 4 f(5); # 5
        sub f { local our ($x) = @_; '' =~ /(?{ print "$x\n" })/; } f(4); # 4 f(5); # 5
Re: Extracting the number of repetitions from a regex
by Roy Johnson (Monsignor) on Dec 18, 2008 at 17:24 UTC
    It's a scoping issue. Apparently, the regex uses the variables that existed when it was first compiled, while you have declared a new set of variables on every pass. Or something like that. Anyway, this works:
    my @strings = qw( aaabbbb ab abb aabb aaabb aabbb ); { my ($a_counter, $b_counter); for my $string ( @strings ) { $a_counter = 0; $b_counter = 0; print "In $string there were $a_counter 'a's and $b_counter 'b's. +\n" if ( $string =~ /(a(?{$a_counter ++;}))+(b(?{$b_counter ++;}))+/ +); } }

    Caution: Contents may have been coded under pressure.
      Roy -

      Thanks for shedding some light onto this issue. I am grateful for your functional modification of the code which does indeed produce the desired output.

      As for the reasoning why the original code behaves the way it does, however, I am not sure I follow your explanation. Since the regex is missing the //o modifier I would have expected it to get re-compiled on every iteration of the for-loop (please correct me if I am wrong here!). If it doesn't - why not?

      On the other hand, your reasoning is supported by the output of the following code modification:
      my @strings = qw( aaabbbb ab abb aabb aaabb aabbb ); my $a_counter; my $b_counter; for my $string ( @strings ) { print "In $string there were $a_counter 'a's and $b_counter 'b's.\ +n" if ( $string =~ /(a(?{$a_counter ++; } ))+(b(?{$b_counter ++;}))+ +/ ); }
      which produces
      In aaabbbb there were 3 'a's and 4 'b's. In ab there were 4 'a's and 5 'b's. In abb there were 5 'a's and 7 'b's. In aabb there were 7 'a's and 9 'b's. In aaabb there were 10 'a's and 11 'b's. In aabbb there were 12 'a's and 14 'b's.
      So: Can anybody please explain conclusively what is happening here in the compilation of the regex?
Re: Extracting the number of repetitions from a regex
by mr_mischief (Monsignor) on Dec 18, 2008 at 16:52 UTC
    I'm not sure at first glance how to fix your version, but tr/// counts the characters it replaces and returns that number.

    my @strings = qw( aaabbbb ab abb aabb aaabb aabbb ); for my $string ( @strings ) { my $a_counter = $string =~ tr/a/a/; my $b_counter = $string =~ tr/b/b/; print "In $string there were $a_counter 'a's and $b_counter 'b's. +\n"; }
Re: Extracting the number of repetitions from a regex
by ikegami (Patriarch) on Dec 18, 2008 at 21:36 UTC
    Something you haven't considered is backtracking. Your code, cleaned up:
    my @strings = qw( acaabbbb ); for ( @strings ) { local our $a_counter = 0; local our $b_counter = 0; / (?: a (?{ $a_counter++ }) )+ (?: b (?{ $b_counter++ }) )+ /x; print("a: $a_counter\n"); # a: 3 print("b: $b_counter\n"); # b: 4 }

    If you want 2,4 instead of 3,4, you need to take backtracking into account.

    my @strings = qw( acaabbbb ); for ( @strings ) { local our $a_counter = 0; local our $b_counter = 0; / (?{ # Initialize $^R { a => 0, b => 0 } }) (?: a (?{ my %h = %{$^R}; $h{a}++; +{ %h } }) )+ (?: b (?{ my %h = %{$^R}; $h{b}++; +{ %h } }) )+ (?{ # Return results my %h = %{$^R}; $a_counter = $h{a}; $b_counter = $h{b}; }) /x; print("a: $a_counter\n"); # a: 2 print("b: $b_counter\n"); # b: 4 }
Re: Extracting the number of repetitions
by Anonymous Monk on Dec 18, 2008 at 16:18 UTC
    Add this to the end and you'll know why
    warn "suprise $a_counter"; warn "suprise $b_counter";
      Sorry, Anonymous_Monk -

      I don't get it. Can you please switch your output to verbose mode for the Perl-underprivileged?

      It looks as if the counts in the evaluation of the regex work fine but somehow the result does not get transferred to the external variables ... not sure what this tells me, though.

      Could you please expand?

      Thanks - Pat
        Sorry, I was wrong :( Try use re 'debug'; for ideas

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://731297]
Approved by kyle
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (5)
As of 2024-04-25 14:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found