How can I access the number of repetitions in a regex?

by pat_mc (Pilgrim)
on Mar 11, 2008 at 12:31 UTC

pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Hi, All -

I am trying to access the number of repetitions matched by a quantifier in a regex. For example, I would like the following to work:
    while( <> ) {
        print "$string was encountered $repitition times.\n" if /$string ++/g;
    }

My question is: Is there an elegant way (e.g. a special variable in Perl) to assign this value to $repitition? Or do I really need to include code in my regex that counts up with each match?

Also: Can someone please let me know what the syntax for the <code> tags in this forum is? When I put my code into those tags, <code>text to print as code<\code>, as suggested by the formatting help, it never works. For the same reason I cannot get indentation to display either. Sorry!

Thanks in advance for your help!

Cheers -

Pat

Replies are listed 'Best First'.
Re: How can I access the number of repetitions in a regex?
by GrandFather (Saint) on Mar 11, 2008 at 13:13 UTC

    Consider:

    my $str = "12345";
    my $count = () = $str =~ /\d/g;
    print $count;

    Prints:

    5

    The () = puts the regex into list context, then the scalar assignment, as is usual, assigns the count of the items in the list.


    Perl is environmentally friendly - it saves trees

      Would you mind explaining how this works? I understand that the empty () imposes list context on the right, making the match return the number of matches.

      Then you are left with the assignment of a scalar to an empty list, which is in turn assigned to $count. I guess I'm just not sure how the "5" propagates over the empty list in the middle to land in the $count variable.

      Sorry if I'm being dense.

        I understand that the empty () imposes list context on the right...
        correct
        ...making the match return the number of matches.
        Not quite. The match returns a list of matches. A list assignment in scalar context returns the number of elements in the list, regardless of the number of those elements that actually get assigned to anything (in this case, zero). So the "5" doesn't propagate. It's the use of the assignment in scalar context that creates the "5".
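        To make that concrete, here is a minimal sketch (the variable names are only illustrative). It shows that the count comes from the list assignment itself, no matter how many variables sit on the left-hand side:

```perl
use strict;
use warnings;

my $str = "12345";

# Empty list on the left: nothing is stored, but the list assignment,
# evaluated in scalar context, still yields the number of elements
# produced by the right-hand side.
my $count = () = $str =~ /\d/g;

# One variable on the left: only the first match is stored, yet the
# scalar value of the list assignment is still the full count.
my $count_again = ( my ($first) = $str =~ /\d/g );

print "count=$count first=$first count_again=$count_again\n";
# prints: count=5 first=1 count_again=5
```

        So nothing "propagates" through the empty list; the 5 is the value of the assignment operator itself.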
      Thanks, GrandFather - I got that.
      Nice solution ... though subject to the same limitations as the one by olus above. What if I want to evaluate multiple quantifiers in the same regex?

        There is such a possibility, but it uses 'highly experimental' :) code:

        $_ = 'this is some text(55). this is another text. there are 2 texts.';
        @re = /text(?{$cnt1++})|\d+(?{$cnt2++})/g;
        print "'text' occurred $cnt1 times\n";
        print "'\\d+' occurred $cnt2 times\n";

        But there are some problems: what if one of your regexps is part of another one?
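        Another thing to watch with (?{ }) counters is backtracking: a plain $cnt++ is not undone when the engine gives a repetition back. perlre describes an idiom using local for this. The following is only a sketch of that idiom applied to two quantifiers in one regex; count_runs and the variable names are made up for illustration, and the pattern (runs of 'a' followed by runs of 'b') is just an assumed example:

```perl
use strict;
use warnings;

# Package variables, because `local` inside (?{ }) needs them. Localized
# increments are rolled back automatically when the engine backtracks.
our ( $a_cnt, $b_cnt, $a_res, $b_res );

# Hypothetical helper: returns how often each quantified group repeated.
sub count_runs {
    my ($text) = @_;
    if ( $text =~ /
            (?{ $a_cnt = 0; $b_cnt = 0 })            # reset at each attempt
            (?: a (?{ local $a_cnt = $a_cnt + 1 }) )+
            (?: b (?{ local $b_cnt = $b_cnt + 1 }) )+
            (?{ $a_res = $a_cnt; $b_res = $b_cnt })  # copy out before unwinding
        /x )
    {
        return ( $a_res, $b_res );
    }
    return;    # no match
}

my ( $as, $bs ) = count_runs("xxaaabbbbbyy");
print "a+ matched $as times, b+ matched $bs times\n";
# prints: a+ matched 3 times, b+ matched 5 times
```

        The final code block copies the counters because the localized values are restored once the match is over.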

        quantifiers are ?, *, +, and {}, so you're not talking about quantifiers
      The () = puts the regex into list context, then the scalar assignment, as is usual, assigns the count of the items in the list.
      An array returns its weight, a list returns its last... a list in scalar context returns its last element, not the count of elements.

        a list in scalar context returns its last element

        There's a somewhat detailed discussion of what's wrong with this idea starting about here. Here's one problem with it:

        my @x = ( 6, 7, 8 );
        my $last_element = (4, 5, @x)[-1];
        my $as_scalar = (4, 5, @x);
        print "last element: $last_element\n";
        print "as scalar: $as_scalar\n";
        __END__
        last element: 8
        as scalar: 3

        One might say that the scalar context is applied to the last element of the list expression (not the list itself). Note that scalar sub_that_returns_list() will apply the scalar context to everything in the sub's return list, but scalar ( 'literal', 'list', 'expression' ) puts 'literal' and 'list' in void context.

        This isn't a "list in scalar context", it's a list assignment in scalar context, which has a very well-defined behavior: it returns the number of elements in the list on the right side of the assignment operator.
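        A small sketch contrasting the two cases side by side (variable names are illustrative only). With the same operands, the comma operator in scalar context and a list assignment in scalar context give different numbers:

```perl
use strict;
use warnings;
no warnings 'void';    # the comma operator below discards 4 and 5 on purpose

my @x = ( 6, 7, 8 );

# Comma operator in scalar context: 4 and 5 are evaluated and thrown
# away, then @x is evaluated in scalar context, giving its length.
my $via_comma = ( 4, 5, @x );

# List assignment in scalar context: the right side is flattened in list
# context to (4, 5, 6, 7, 8), and the assignment yields the element count.
my $via_assignment = () = ( 4, 5, @x );

print "comma operator:  $via_comma\n";         # 3
print "list assignment: $via_assignment\n";    # 5
```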
Re: How can I access the number of repetitions in a regex?
by olus (Curate) on Mar 11, 2008 at 12:50 UTC

    I don't know of a special variable for that, but you can get the number of matches the following way:

    my @matches;
    ....
    while (<>) {
        my @matches = $_ =~ m/$string/g;
        my $repetition = @matches;
        print "$string was encountered $repetition times.\n" if ($repetition > 0);
    }

    As for your second question, you are not closing the code tag correctly. You are using '\' where it should be '/'. So the closing tag should read </code>

    Update: added the g modifier to the regexp. Thanks to wfsp and Fletch for alerting me that it was missing; its absence was a typo. I tested it with the code shown below, but when adapting the test code to the block shown in the OP, somehow the 'g' was skipped.

    use strict;
    use warnings;

    my $text = "match other match not useful match sample word";
    my $string = "match";
    my @matches = $text =~ m/$string/g;
    my $count = @matches;
    print $count;

      Here is an example that may work for the OP. It uses a similar approach as that found in perlretut (as suggested by the good monk, pancho) on page 22 of the tutorial. It allows multiple matches of various capturing subexpressions and keeps track of the number of matches for each of those subexpressions.

      I just tested it for the admittedly simple case shown and it appears to do what is sought.

      For the input:

      $text = "match other match not useful match not same match\n"

      the output is:

      frequency of regex capture (\b\w+\b) is 8
      frequency of regex capture (\s) is 8
      frequency of regex capture (\w+$) is 1

      The user has to specify the various capturing regex subexpressions and associate an element of the array @word with each of them. This array records each time it is matched by using the in-line regex code subpattern, (?{ }).

      I think this does what the OP was looking for.

      I'm not sure how robust it is (i.e., how flexible it is, for example, with nested capturing subexpressions, alternating subexpressions, etc.).

      I am always challenged, especially, by alternating subexpressions.

      ack Albuquerque, NM
      Hi, olus -

      Thanks for your quick answers on both accounts.

      Ad 1) I understand the solution you propose. It is certainly an option to collect all matches in a list and return the size of the list. The downside of this approach is that in more complex regular expressions I need a separate list and regex for every quantifier I want to access.
      while ( <> ) {
          print "$string_1 matched ", scalar @list_1, " times." if @list_1 = $_ =~ /regex_1/g;
          print "$string_2 matched ", scalar @list_2, " times." if @list_2 = $_ =~ /regex_2/g;
      }


      Clearly, this could get messy ... are there any alternatives to this?

      Ad 2) Good stuff ... the code tags now work! How do I indent?

      Thanks again!

      Cheers -

      Pat
        Erm, just use that big wide key at the bottom of your keyboard? (Burma Shave)

        Update: And to clarify what I think you're saying your problem is: you've got a regex with multiple captures of varying length (say, /(a+)(b+)/) and you want to know how many repetitions each captured subexpression matched (i.e. how many "a"s and how many "b"s) for an arbitrary regex.

        Which is actually a kind of neat question (and I'm drawing a blank on an "elegant" solution off the cuff; I initially was going to comment about the operator-which-shall-not-be-named too (=()=) but then read the above post and saw you had possibly several subexpressions to count).
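        For the simple case where each quantified piece repeats a fixed-length unit, one possible sketch is to capture each run and divide its length by the length of the unit. The helper name repetitions is made up here, and this does not cover quantified subpatterns of varying length:

```perl
use strict;
use warnings;

# Hypothetical helper: how many times a fixed-length unit repeats
# inside the text that a quantified group captured.
sub repetitions {
    my ( $captured, $unit ) = @_;
    return length($captured) / length($unit);
}

my $text = "xxaaabbbbbyy";
if ( $text =~ /(a+)(b+)/ ) {
    printf "a repeated %d times, b repeated %d times\n",
        repetitions( $1, 'a' ), repetitions( $2, 'b' );
}
# prints: a repeated 3 times, b repeated 5 times

# A multi-character unit: capture the whole run, quantify a
# non-capturing group inside it.
my $line = "foo abababab bar";
if ( $line =~ /((?:ab)+)/ ) {
    print "ab repeated ", repetitions( $1, 'ab' ), " times\n";
}
# prints: ab repeated 4 times
```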

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

        We call it programming
        while (<>) {
            for my $pat ( qr/\d/, qr/string/ ) {
                my $count = () = /$pat/g;
                print "$pat matched $count times\n";
            }
        }

        The first solution that occurred to me is shown in the following code. Note that I considered splitting the input text on spaces, and that may not be a solution for you depending on your actual input and the patterns you are looking for.

        use strict;
        use warnings;
        use Data::Dumper;

        my $text = "match other match not useful match sample word";
        my $string1 = "match";
        my $string2 = "not";
        my %repetitions;
        map {$repetitions{$_}++;} grep /$string1|$string2/, split / /, $text;
        print Dumper(\%repetitions);

        ### Outputs:
        $VAR1 = {
                  'match' => 3,
                  'not' => 1
                };

        Abstract the matching logic out into a subroutine:

        use warnings;
        use strict;

        my @strings = qw/ 1 2 3 4 5 6 7 8 90 /;

        while ( my $line = <DATA> ) {
            print $line;
            for (@strings) {
                my $count = matches( $line, $_ );
                print "$_ matched $count times.\n" if $count;
            }
            print "\n";
        }

        sub matches {
            return () = $_[0] =~ /\Q$_[1]\E/g;
        }

        __DATA__
        9087126348716340789126348907164
        l3klj09934u098u5tio2354uj908rye
        qoiriopuj3u45098479183248r95r77
        [q9u4r0983u490ru340u54ioeuf9p8h
        23qioh89174y9843y7r9843r87e8714
        [9490838945r8974r9834093409tr34
Re: How can I access the number of repetitions in a regex?
by casiano (Pilgrim) on Mar 11, 2008 at 16:44 UTC
    Yet another solution:
    $ cat count.pl
    use strict;
    use List::Util qw(sum);

    my @a;
    my $string = 'ab';
    my $rep = sum(map { @a = /$string/g; 0+@a } <>);
    print "$rep\n";
    Cheers Casiano
Re: How can I access the number of repetitions in a regex?
by Anonymous Monk on Mar 11, 2008 at 13:05 UTC
      Hi, Anonymous Monk -

      What exactly does your code do? It looks as if it was trying to match for a pattern "-number" in $string ... how would that be of any help?
      Please explain.

      Thanks and regards -

      Pat
Re: How can I access the number of repetitions in a regex?
by Pancho (Pilgrim) on Mar 11, 2008 at 13:12 UTC

    Also, see the perlretut tutorial on regular expressions and search for this
    This example counts character frequencies in a line:

Node Type: perlquestion [id://673467]
Approved by Corion
Front-paged by Fletch