Re: How can I access the number of repititions in a regex?
by GrandFather (Saint) on Mar 11, 2008 at 13:13 UTC
|
my $str = "12345";
my $count = () = $str =~ /\d/g;
print $count;
Prints:
5
The () = puts the regex into list context, then the scalar assignment, as is usual, assigns the count of the items in the list.
Perl is environmentally friendly - it saves trees
|
Would you mind explaining how this works? I understand that the empty () imposes list context on the right, making the match return the number of matches.
Then you are left with the assignment of a scalar to an empty list, which is in turn assigned to $count. I guess I'm just not sure how the "5" propagates over the empty list in the middle to land in the $count variable.
Sorry if I'm being dense.
|
I understand that the empty () imposes list context on the right...
correct
...making the match return the number of matches.
Not quite. The match returns a list of matches. A list assignment in scalar context returns the number of elements in the list, regardless of the number of those elements that actually get assigned to anything (in this case, zero). So the "5" doesn't propagate. It's the use of the assignment in scalar context that creates the "5".
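For anyone who wants to see that behaviour in isolation, here is a minimal sketch. The helper name count_matches is made up for illustration, and it assumes the pattern has no capture groups (with captures, a list-context match returns the captured substrings rather than the whole matches, so the count changes).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count matches of a pattern.
sub count_matches {
    my ($str, $re) = @_;
    # "() = ..." is a list assignment; evaluated in scalar context it
    # yields the number of elements on its right-hand side.
    my $count = () = $str =~ /$re/g;
    return $count;
}

# The result reflects the right-hand side, however few elements the
# left-hand side actually accepts (two here, zero with "()").
my $n = ( my ($first, $second) = (7, 8, 9) );

print count_matches("12345", qr/\d/), "\n";   # prints 5
print "$n\n";                                  # prints 3
```

The second case is the one that shows nothing "propagates" through the empty list: the number comes from the assignment operator itself.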
|
Thanks, GrandFather - I got that.
Nice solution ... though subject to the same limitations as the one by olus above. What if I want to evaluate multiple quantifiers in the same regex?
|
$_ = 'this is some text(55). this is another text. there are 2 texts.';
# start both counters at zero so they print cleanly even with no matches
our ($cnt1, $cnt2) = (0, 0);
@re = /text(?{ $cnt1++ })|\d+(?{ $cnt2++ })/g;
print "'text' occurred $cnt1 times\n";
print "'\\d+' occurred $cnt2 times\n";
But there are some problems: what if one of your regexps is part of another one?
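One possible way to get the same tallies without embedded code blocks is to walk the matches in a loop and check which named group took part. This is only a sketch: the helper name count_alternatives and the group names text and number are invented for illustration, and named captures need Perl 5.10 or later. It does not solve the overlap problem either; when one alternative is part of another, only the leftmost alternative that matches at a given position is counted.

```perl
use strict;
use warnings;

# Hypothetical helper: tally which alternative matched, in one pass.
sub count_alternatives {
    my ($str) = @_;
    my %count = ( text => 0, number => 0 );
    while ( $str =~ /(?<text>text)|(?<number>\d+)/g ) {
        # Exactly one of the two named groups is defined per match.
        my ($which) = grep { defined $+{$_} } qw(text number);
        $count{$which}++;
    }
    return %count;
}

my %c = count_alternatives(
    'this is some text(55). this is another text. there are 2 texts.'
);
print "'text' occurred $c{text} times\n";     # 3
print "'\\d+' occurred $c{number} times\n";   # 2
```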
|
Quantifiers are ?, *, +, and {}, so you're not talking about quantifiers.
|
The () = puts the regex into list context, then the scalar assignment, as is usual, assigns the count of the items in the list.
An array returns its weight, a list returns its last... a list in scalar context returns its last element, not the count of elements.
|
my @x = ( 6, 7, 8 );
my $last_element = (4, 5, @x)[-1];
my $as_scalar = (4, 5, @x);
print "last element: $last_element\n";
print "as scalar: $as_scalar\n";
__END__
last element: 8
as scalar: 3
One might say that the scalar context is applied to the last element of the list expression (not the list itself). Note that scalar sub_that_returns_list() will apply the scalar context to everything in the sub's return list, but scalar ( 'literal', 'list', 'expression' ) puts 'literal' and 'list' in void context.
|
This isn't a "list in scalar context", it's a list assignment in scalar context, which has a very well-defined behavior: it returns the number of elements in the list on the right side of the assignment operator.
Re: How can I access the number of repititions in a regex?
by olus (Curate) on Mar 11, 2008 at 12:50 UTC
|
my @matches;
....
while (<>) {
    # reuse the outer @matches rather than declaring a second one
    @matches = $_ =~ m/$string/g;
    my $repetition = @matches;
    print "$string was encountered $repetition times.\n" if $repetition > 0;
}
As for your second question, you are not closing the code tag correctly. You are using '\' where it should be '/'. So the closing tag should read </code>
Update: added the g modifier to the regexp. Thanks wfsp and Fletch for alerting me that it was missing; leaving it out was a typo. I tested it with the code shown below, but when adapting the test code to the block shown in the OP, the 'g' was somehow skipped.
use strict;
use warnings;
my $text = "match other match not useful match sample word";
my $string = "match";
my @matches = $text =~ m/$string/g;
my $count = @matches;
print $count;
|
Here is an example that may work for the OP. It uses an approach similar to the one found in perlretut (as suggested by the good monk, Pancho) on page 22 of the tutorial. It allows multiple matches of various capturing subexpressions and keeps track of the number of matches for each of those subexpressions. I just tested it for the admittedly simple case shown and it appears to do what is sought. For the input: $text = "match other match not useful match not same match\n" the output is:
frequency of regex capture (\b\w+\b) is 8
frequency of regex capture (\s) is 8
frequency of regex capture (\w+$) is 1
The user has to specify the various capturing regex subexpressions and associate an element of the array @word with each of them. This array records each match by using the in-line regex code subpattern, (?{ }). I think this does what the OP was looking for. I'm not sure how robust it is (i.e., how flexible it is, for example, with nested capturing subexpressions, with alternating subexpressions, etc.). I am always challenged, especially, by alternating subexpressions.
|
Hi, olus -
Thanks for your quick answers on both accounts.
Ad 1) I understand the solution you propose. It is certainly an option to include all matches into a list and return the size of the list. The downside of this approach is that in more complex regular expressions I need separate lists and regexes for every quantifier I want to access.
while ( <> )
{
    print "$string_1 matched ", scalar @list_1, " times.\n" if @list_1 = $_ =~ /regex_1/g;
    print "$string_2 matched ", scalar @list_2, " times.\n" if @list_2 = $_ =~ /regex_2/g;
}
Clearly, this could get messy ... are there any alternatives to this?
Ad 2) Good stuff ... the code tags now work! How do I indent?
Thanks again!
Cheers -
Pat
|
Erm,
just use
that big wide key
at the bottom
of your
keyboard?
(Burma Shave)
Update: And to clarify what I think you're saying your problem is: you've got a regex with multiple captures of varying length (say, /(a+)(b+)/) and you want to know how many repetitions each captured subexpression matched (i.e. how many "a"s and how many "b"s) for an arbitrary regex.
Which is actually a kind of neat question (and I'm drawing a blank on an "elegant" solution off the cuff; I initially was going to comment about the operator-which-shall-not-be-named too (=()=) but then read the above post and saw you had possibly several subexpressions to count).
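For the simple /(a+)(b+)/ case, one not especially elegant option is to capture each repeated run and then count the unit pattern inside each capture. This is a sketch only: the helper name repetitions is invented, and it assumes you can state the repeated unit of each group separately, which will not hold for an arbitrary regex.

```perl
use strict;
use warnings;

# Hypothetical helper: how many times did each quantified unit repeat?
sub repetitions {
    my ($str) = @_;
    return unless $str =~ /(a+)(b+)/;
    my @captures = ($1, $2);          # the runs that actually matched
    my @units    = (qr/a/, qr/b/);    # the unit repeated inside each run
    # Count unit matches inside each capture with the countof idiom.
    return map {
        my $n = () = $captures[$_] =~ /$units[$_]/g;
        $n;
    } 0 .. $#units;
}

my ($as, $bs) = repetitions("xaaabbz");
print "a repeated $as times, b repeated $bs times\n";   # 3 and 2
```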
The cake is a lie.
The cake is a lie.
The cake is a lie.
|
while (<>) {
    for my $pat ( qr/\d/, qr/string/ ) {
        my $count = () = /$pat/g;
        print "$pat matched $count times\n";
    }
}
|
use strict;
use warnings;
use Data::Dumper;
my $text = "match other match not useful match sample word";
my $string1 = "match";
my $string2 = "not";
my %repetitions;
map {$repetitions{$_}++;} grep /$string1|$string2/, split / /, $text;
print Dumper(\%repetitions);
### Outputs:
$VAR1 = {
'match' => 3,
'not' => 1
};
|
use warnings;
use strict;
my @strings = qw/ 1 2 3 4 5 6 7 8 90 /;
while ( my $line = <DATA> ) {
    print $line;
    for (@strings) {
        my $count = matches( $line, $_ );
        print "$_ matched $count times.\n" if $count;
    }
    print "\n";
}
sub matches {
    return () = $_[0] =~ /\Q$_[1]\E/g;
}
__DATA__
9087126348716340789126348907164
l3klj09934u098u5tio2354uj908rye
qoiriopuj3u45098479183248r95r77
[q9u4r0983u490ru340u54ioeuf9p8h
23qioh89174y9843y7r9843r87e8714
[9490838945r8974r9834093409tr34
Re: How can I access the number of repititions in a regex?
by casiano (Pilgrim) on Mar 11, 2008 at 16:44 UTC
|
$ cat count.pl
use strict;
use List::Util qw(sum);

my @a;
my $string = 'ab';
my $rep = sum(map { @a = /$string/g; 0+@a } <>);
print "$rep\n";
Cheers
Casiano
Re: How can I access the number of repititions in a regex?
by Anonymous Monk on Mar 11, 2008 at 13:05 UTC
|
$count = () = $string =~ /-\d+/g;
|
Hi, Anonymous Monk -
What exactly does your code do? It looks as if it was trying to match for a pattern "-number" in $string ... how would that be of any help?
Please explain.
Thanks and regards -
Pat
Re: How can I access the number of repititions in a regex?
by Pancho (Pilgrim) on Mar 11, 2008 at 13:12 UTC
|
Also, see the perlretut tutorial on regular expressions and search for this phrase:
This example counts character frequencies in a line:
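The example itself did not survive in this post, so what follows is a sketch in the spirit of that part of the tutorial, not a quotation of it. The helper name char_frequencies and the sample string are assumptions made for illustration. It relies on the /e modifier, which evaluates the replacement side of s/// as code.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each character occurs in a line.
sub char_frequencies {
    my ($line) = @_;
    my %chars;
    # With /e the replacement is code: bump the counter, then return $1
    # so every character is replaced by itself and the text is unchanged.
    $line =~ s/(.)/$chars{$1}++; $1/eg;
    return %chars;
}

my %freq = char_frequencies("Bill the cat");
# Most frequent first; ties broken by character order for stable output.
for my $ch ( sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq ) {
    print "frequency of '$ch' is $freq{$ch}\n";
}
```

The same counting could be done with a plain loop over split //, $line; the substitution form is shown only because it keeps the counting inside the regex, which is what this thread is about.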