How can I access the number of repititions in a regex?

pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Re: How can I access the number of repititions in a regex? by GrandFather (Saint) on Mar 11, 2008 at 13:13 UTC
Consider: `my $str = "12345"; my $count = () = $str =~ /\d/g; print $count;` Prints: `5` The `() =` puts the regex into list context, then the scalar assignment, as is usual, assigns the count of the items in the list. Perl is environmentally friendly - it saves trees
Re^2: How can I access the number of repititions in a regex? by amarquis (Curate) on Mar 11, 2008 at 18:42 UTC
Would you mind explaining how this works? I understand that the empty () imposes list context on the right, making the match return the number of matches. Then you are left with the assignment of a scalar to an empty list, which is in turn assigned to $count. I guess I'm just not sure how the "5" propagates over the empty list in the middle to land in the $count variable. Sorry if I'm being dense.
Re^3: How can I access the number of repititions in a regex? by Errto (Vicar) on Mar 11, 2008 at 20:36 UTC
"I understand that the empty () imposes list context on the right..." Correct. "...making the match return the number of matches." Not quite. The match returns a list of matches. A list assignment in scalar context returns the number of elements on the right-hand side of the assignment, regardless of how many of those elements actually get assigned to anything (in this case, zero). So the "5" doesn't propagate. It's the use of the assignment in scalar context that creates the "5".
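To see the rule in action, here is a minimal sketch (the names `$rhs_count`, `$first`, and `$second` are illustrative and not from the thread). It suggests that the number comes from the list assignment itself, whether or not the left side has anywhere to store the elements:

```perl
use strict;
use warnings;

my $str = "12345";

# List assignment evaluated in scalar context yields the number of
# elements on the right-hand side, even though nothing is stored.
my $count = () = $str =~ /\d/g;

# Same rule with a non-empty left side: only two variables are filled,
# but the right-hand side still produced five elements.
my $rhs_count = ( my ( $first, $second ) = $str =~ /\d/g );

print "count=$count rhs_count=$rhs_count first=$first second=$second\n";
```

Both counts come out as 5, while `$first` and `$second` hold only the first two digits.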
Re^4: How can I access the number of repititions in a regex? by amarquis (Curate) on Mar 12, 2008 at 03:19 UTC
Re^3: How can I access the number of repititions in a regex? by GrandFather (Saint) on Mar 12, 2008 at 02:14 UTC
See Perl Idioms Explained - `my $count = () = /.../g`. Perl is environmentally friendly - it saves trees
Re^2: How can I access the number of repititions in a regex? by pat_mc (Pilgrim) on Mar 11, 2008 at 13:45 UTC
Thanks, GrandFather - I got that. Nice solution ... though subject to the same limitations as the one by olus above. What if I want to evaluate multiple quantifiers in the same regex?
Re^3: How can I access the number of repititions in a regex? by grizzley (Chaplain) on Mar 11, 2008 at 14:39 UTC
There is such a possibility, but it uses 'highly experimental' :) code: `$_='this is some text(55). this is another text. there are 2 texts.'; @re=/text(?{$cnt1++})|\d+(?{$cnt2++})/g; print "'text' occurred $cnt1 times\n"; print "'\\d+' occurred $cnt2 times\n";` But there are some problems: what if one of your regexps is part of another one?
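For comparison, a sketch that avoids the experimental `(?{ })` construct: put each alternative in its own capture group and check which group is defined after every match. The hash `%count` and its keys are illustrative names, and this is only one possible approach under the assumption that the alternatives do not overlap:

```perl
use strict;
use warnings;

my $text = 'this is some text(55). this is another text. there are 2 texts.';

# One counter per alternative; the hash keys are illustrative labels.
my %count = ( text => 0, digits => 0 );

# In scalar context, /g resumes where the previous match ended, so each
# loop iteration sees exactly one match of one alternative.
while ( $text =~ /(text)|(\d+)/g ) {
    $count{text}++   if defined $1;
    $count{digits}++ if defined $2;
}

print "'text' occurred $count{text} times\n";
print "'\\d+' occurred $count{digits} times\n";
```

The caveat raised above still applies: if one pattern can match inside another, only the alternative that wins at a given position is counted.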
Re^4: How can I access the number of repititions in a regex? by pat_mc (Pilgrim) on Dec 18, 2008 at 15:15 UTC
Re^3: How can I access the number of repititions in a regex? by Anonymous Monk on Mar 11, 2008 at 14:09 UTC
Quantifiers are `?`, `*`, `+`, and `{}`, so you're not talking about quantifiers.
Re^4: How can I access the number of repititions in a regex? by pat_mc (Pilgrim) on Mar 11, 2008 at 20:44 UTC
Re^5: How can I access the number of repititions in a regex? by driver8 (Scribe) on Mar 14, 2008 at 01:13 UTC
array and list return different things in scalar context by metaperl (Curate) on Mar 11, 2008 at 14:44 UTC
"The `() =` puts the regex into list context, then the scalar assignment, as is usual, assigns the count of the items in the list." An array returns its weight, a list returns its last... a list in scalar context returns its last element, not the count of elements.
Re: array and list return different things in scalar context by kyle (Abbot) on Mar 11, 2008 at 21:00 UTC
"a list in scalar context returns its last element" There's a somewhat detailed discussion of what's wrong with this idea starting about here. Here's one problem with it: `my @x = ( 6, 7, 8 ); my $last_element = (4, 5, @x)[-1]; my $as_scalar = (4, 5, @x); print "last element: $last_element\n"; print "as scalar: $as_scalar\n"; __END__ last element: 8 as scalar: 3` One might say that the scalar context is applied to the last element of the list expression (not the list itself). Note that `scalar sub_that_returns_list()` will apply the scalar context to everything in the sub's return list, but `scalar ( 'literal', 'list', 'expression' )` puts `'literal'` and `'list'` in void context.
Re: array and list return different things in scalar context by Errto (Vicar) on Mar 11, 2008 at 20:37 UTC
This isn't a "list in scalar context", it's a list assignment in scalar context, which has a very well-defined behavior: it returns the number of elements in the list on the right side of the assignment operator.
Re: How can I access the number of repititions in a regex? by olus (Curate) on Mar 11, 2008 at 12:50 UTC
I don't know of a special variable for that, but you can get the number of matches the following way: `my @matches; .... while(<>) { my @matches = $_ =~ m/$string/g; my $repetition = @matches; print "$string was encountered $repetition times.\n" if($repetition > 0); }` As for your second question, you are not closing the code tag correctly. You are using '\' where it should be '/'. So the closing tag should read </code> Update: added the `g` modifier to the regexp. Thanks wfsp and Fletch for alerting me it was missing. Its absence was a typo. I tested it with the code shown below, but when adapting the test code to the block shown in the OP somehow the 'g' was skipped. `use strict; use warnings; my $text = "match other match not useful match sample word"; my $string = "match"; my @matches = $text =~ m/$string/g; my $count = @matches; print $count;`
Re^2: How can I access the number of repititions in a regex? by ack (Deacon) on Mar 11, 2008 at 17:50 UTC
Here is an example that may work for the OP. It uses an approach similar to that found in perlretut (as suggested by the good monk, Pancho) on page 22 of the tutorial. It allows multiple matches of various capturing subexpressions and keeps track of the number of matches for each of those subexpressions. I just tested it for the admittedly simple case shown and it appears to do what is sought. For the input: `$text = "match other match not useful match not same match\n"` the output is: `frequency of regex capture (\b\w+\b) is 8 frequency of regex capture (\s) is 8 frequency of regex capture (\w+$) is 1` The user has to specify the various capturing regex subexpressions and associate an element of the array @word with each of them. This array records each time it is matched by using the in-line regex code subpattern, `(?{ })`. I think this does what the OP was looking for. I'm not sure how robust it is (i.e., how flexible it is, for example, with nested capturing subexpressions, alternating subexpressions, etc.). I am always challenged, especially, by alternating subexpressions. ack Albuquerque, NM
Re^2: How can I access the number of repititions in a regex? by pat_mc (Pilgrim) on Mar 11, 2008 at 13:06 UTC
Hi, olus - Thanks for your quick answers on both counts. Ad 1) I understand the solution you propose. It is certainly an option to include all matches into a list and return the size of the list. The downside of this approach is that in more complex regular expressions I need separate lists and regexes for every quantifier I want to access. `while ( <> ) { print "$string_1 matched ", scalar @list_1, " times." if @list_1 = $_ =~ /regex_1/g; print "$string_2 matched ", scalar @list_2, " times." if @list_2 = $_ =~ /regex_2/g; }` Clearly, this could get messy ... are there any alternatives to this? Ad 2) Good stuff ... the code tags now work! How do I indent? Thanks again! Cheers - Pat
Re^3: How can I access the number of repititions in a regex? by Fletch (Bishop) on Mar 11, 2008 at 13:16 UTC
`Erm, just use that big wide key at the bottom of your keyboard? (Burma Shave)` Update: And to clarify what I think you're saying your problem is: you've got a regex with multiple captures of varying length (say, `/(a+)(b+)/`) and you want to know how many repetitions each captured subexpression matched (i.e. how many "a"s and how many "b"s) for an arbitrary regex. Which is actually a kind of neat question (and I'm drawing a blank on an "elegant" solution off the cuff; I initially was going to comment about the operator-which-shall-not-be-named too (`=()=`) but then read the above post and saw you had possibly several subexpressions to count). The cake is a lie. The cake is a lie. The cake is a lie.
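For the simple `/(a+)(b+)/` case described here, one possible sketch (not a general answer for arbitrary regexes) is to derive the repetition count from each capture: its length for a single-character atom, or a second counting match for a longer atom. The sample strings and the names `$a_reps`, `$b_reps`, and `$ab_reps` are illustrative assumptions:

```perl
use strict;
use warnings;

my ( $a_reps, $b_reps, $ab_reps ) = ( 0, 0, 0 );

# Single-character atoms: the length of each capture is its repetition count.
my $text = "xxaaabbyy";
if ( $text =~ /(a+)(b+)/ ) {
    ( $a_reps, $b_reps ) = ( length($1), length($2) );
}

# Multi-character atom: capture the whole run, then count the atom inside it.
my $run_text = "foo-abababab-bar";
if ( $run_text =~ /((?:ab)+)/ ) {
    my $run = $1;                     # copy $1 before matching again
    $ab_reps = () = $run =~ /ab/g;    # countof idiom on the captured run
}

print "a x $a_reps, b x $b_reps, ab x $ab_reps\n";
```

This only works when each quantified piece has its own capture group, so it does not generalize to a regex you cannot modify.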
Re^4: How can I access the number of repititions in a regex? by ww (Archbishop) on Mar 11, 2008 at 14:23 UTC
Re^4: How can I access the number of repititions in a regex? by pat_mc (Pilgrim) on Mar 11, 2008 at 13:35 UTC
Re^3: How can I access the number of repititions in a regex? by Anonymous Monk on Mar 11, 2008 at 14:14 UTC
We call it programming: `while(<>){ for my $pat ( qr/\d/, qr/string/ ){ my $count = () = /$pat/g; print "$pat matched $count times\n"; } }`
Re^3: How can I access the number of repititions in a regex? by olus (Curate) on Mar 11, 2008 at 14:42 UTC
The first solution that occurred to me is shown in the following code. Note that I considered splitting the input text on spaces, and that may not be a solution for you depending on your actual input and the patterns you are looking for. `use strict; use warnings; use Data::Dumper; my $text = "match other match not useful match sample word"; my $string1 = "match"; my $string2 = "not"; my %repetitions; map {$repetitions{$_}++;} grep /$string1|$string2/, split / /, $text; print Dumper(\%repetitions); ### Outputs: $VAR1 = { 'match' => 3, 'not' => 1 };`
Re^3: How can I access the number of repititions in a regex? by thundergnat (Deacon) on Mar 11, 2008 at 20:12 UTC
Abstract the matching logic out into a subroutine: `use warnings; use strict; my @strings = qw/ 1 2 3 4 5 6 7 8 90 /; while ( my $line = <DATA> ) { print $line; for (@strings) { my $count = matches( $line, $_ ); print "$_ matched $count times.\n" if $count; } print "\n"; } sub matches { return () = $_[0] =~ /\Q$_[1]\E/g; } __DATA__ 9087126348716340789126348907164 l3klj09934u098u5tio2354uj908rye qoiriopuj3u45098479183248r95r77 [q9u4r0983u490ru340u54ioeuf9p8h 23qioh89174y9843y7r9843r87e8714 [9490838945r8974r9834093409tr34`
Re: How can I access the number of repititions in a regex? by casiano (Pilgrim) on Mar 11, 2008 at 16:44 UTC
Yet another solution (count.pl): `use strict; use List::Util qw(sum); my @a; my $string = 'ab'; my $rep = sum(map { @a = /$string/g; 0+@a } <>); print "$rep\n";` Cheers Casiano
Re: How can I access the number of repititions in a regex? by Anonymous Monk on Mar 11, 2008 at 13:05 UTC
How can I count the number of occurrences of a substring within a string? `$count = () = $string =~ /-\d+/g;`
Re^2: How can I access the number of repititions in a regex? by pat_mc (Pilgrim) on Mar 11, 2008 at 13:12 UTC
Hi, Anonymous Monk - What exactly does your code do? It looks as if it were trying to match the pattern "-number" in $string ... how would that be of any help? Please explain. Thanks and regards - Pat
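As a worked illustration of what that one-liner does: it counts the substrings that look like negative integers (a minus sign followed by digits). The question title and snippet appear to match the perlfaq4 entry of the same name; the sample string below is an assumption chosen for illustration:

```perl
use strict;
use warnings;

# Sample input (illustrative): four of these numbers are negative.
my $string = "-9 55 48 -2 23 -76 4 14 -44";

# The match in list context returns every "-digits" substring;
# the list assignment in scalar context turns that into a count.
my $count = () = $string =~ /-\d+/g;

print "There are $count negative numbers in the string\n";
```

Swapping `/-\d+/` for any other pattern gives the number of non-overlapping matches of that pattern instead.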
Re: How can I access the number of repititions in a regex? by Pancho (Pilgrim) on Mar 11, 2008 at 13:12 UTC
Also, see the perlretut tutorial on regular expressions and search for this: "This example counts character frequencies in a line:"
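The tutorial's code is not reproduced in this thread. As a rough sketch of the idea only (not necessarily the code perlretut uses), character frequencies can be tallied by capturing one character per match and using it as a hash key. The names `$line` and `%freq` and the sample word are illustrative:

```perl
use strict;
use warnings;

my $line = "mississippi";
my %freq;

# Each /g iteration captures the next character and bumps its counter.
$freq{$1}++ while $line =~ /(.)/g;

for my $char ( sort keys %freq ) {
    print "frequency of '$char' is $freq{$char}\n";
}
```

The same hash-per-key pattern extends to counting several different subpatterns in one pass, which is close to what the original question asks for.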

