Your regex /(.{2,}).*\1/g will always try to capture the largest thing it can in $1. In your example string, every "b" character is followed by a "c". So every position where the string could match /b.*b/, it could also match /bc.*bc/. Since the "bc" version is longer, that's the one that will be tried first by the regex engine, and will return with success. It will never return success with $1 eq "b", even though a "b" character repeats itself in the string.
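A quick way to see this, using the string from the script later in this thread (a minimal sketch; the variable names are illustrative). No match is possible at offset 0, so the first match starts at offset 1 and captures "abc". "ab" and "bc" also occur more than once, but they are never reported there, because the longer candidate for $1 is tried first and succeeds.

```perl
use strict;
use warnings;

my $str = 'aabcdabcabcecdecd';    # the string used in this thread

# (.{2,}) is greedy: at a given start position the engine tries the
# longest candidate for $1 first and stops at the first one that works.
my ($first, $start);
if ($str =~ /(.{2,}).*\1/) {
    ($first, $start) = ($1, $-[0]);    # $-[0] is the offset where the match began
    print "captured '$first' starting at offset $start\n";
}
```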

Now that you point it out, this seems obvious... Anyway, I now wonder whether at this point the best thing would be to generate all substrings, e.g. with two nested maps and a uniq-like technique, and possibly filter out those with a count of 1 if one is not interested in them. My attempt at filtering during the generation phase by means of a regex may be fixable somehow, but I can't see an easy way...
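A minimal sketch of that generate-everything idea. The sub name repeated_substrings is hypothetical, and MIN mirrors the constant used in the script further down. Here a hash tally plays the role of the uniq-like step. Every (offset, length) pair counts as one occurrence, so overlapping occurrences are all counted. A regex count such as ()= /.../g counts only non-overlapping ones.

```perl
use strict;
use warnings;

use constant MIN => 2;    # shortest substring length considered

# Generate every substring of length >= MIN with two nested maps and
# tally them in a hash. Each (offset, length) pair is one occurrence.
sub repeated_substrings {
    my ($str) = @_;
    my $l = length $str;
    my %count;
    $count{$_}++ for map {
        my $off = $_;
        map { substr $str, $off, $_ } MIN .. $l - $off;
    } 0 .. $l - 1;

    # Drop substrings that occur only once.
    delete @count{ grep { $count{$_} == 1 } keys %count };
    return \%count;
}

my $rep = repeated_substrings('aabcdabcabcecdecd');
print "$_ => $rep->{$_}\n" for sort keys %$rep;
```

For 'aabcdabcabcecdecd' this prints ab => 3, abc => 3, bc => 3, cd => 3, ec => 2 and ecd => 2. The two counting rules only diverge on self-overlapping repeats: in 'aaaa', "aa" is counted 3 times here but matched only twice by /aa/g.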

Update: it's also worth noting that m//g does not mean "try to match every possible way this match could succeed". Instead it means "find a match, then resume searching where that match ended". So in the above, when it matches on "bc", it will not continue backtracking to pick up the match with "b". Instead, it will be satisfied that it found a match starting at that position, set pos to the end of that match, and move on.
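A minimal sketch of both halves of that point, using the same pattern and the string from the script later in the thread (variable names are illustrative). After one scalar-context m//g match, pos sits at the end of the match (offset 11), not one character past its start. In list context, m//g returns only "abc" and "ecd", because each new search resumes where the previous match stopped.

```perl
use strict;
use warnings;

my $str = 'aabcdabcabcecdecd';

# Scalar-context //g on a copy: after a successful match, pos() is the
# offset just past the end of that match.
my $copy = $str;
$copy =~ /(.{2,}).*\1/g;
my $end = pos $copy;    # the match spans offsets 1..10 ("abcdabcabc")

# List-context //g: one capture per successful match, each search
# resuming where the previous match ended, so matches never overlap.
my @caps = $str =~ /(.{2,}).*\1/g;

print "pos after first match: $end\n";
print "captures: @caps\n";
```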

But in fact this is the reason why I explicitly set pos. Perl 6, instead, provides an adverb that does this out of the box (:overlap, i.e. matching with overlaps), which is very good.
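A minimal sketch of the explicit-pos approach in Perl 5. It assumes the same pattern and string and is not necessarily the code from the earlier post. Resetting pos to one character after the start of each match gives at most one match per starting position, and overlapping matches are allowed. It still reports only the longest repeat at each offset, for example "abc" but never "ab" at offset 1, which is the limitation pointed out above.

```perl
use strict;
use warnings;

my $str = 'aabcdabcabcecdecd';
my @found;    # [start offset, captured repeat] pairs

# After each match, move pos() back to one character past the match's
# start, so the next search may overlap the previous match.
while ($str =~ /(.{2,}).*\1/g) {
    push @found, [ $-[0], $1 ];
    pos($str) = $-[0] + 1;
}

print "$_->[0]: $_->[1]\n" for @found;
```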

Update: the following, for example, finally works correctly:

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
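use constant MIN => 2;    # shortest substring length to consider

my $str='aabcdabcabcecdecd';

# Return a hashref mapping each substring of length >= MIN that occurs
# more than once in the argument to its number of non-overlapping
# occurrences.
sub count {
    local $_=shift;
    my $l=length;
    my %count;

    for my $off (0..$l-1) {
        for my $len (MIN .. $l-$off) {
            my $s=substr $_, $off, $len;
            # \Q...\E makes $s match literally; ()= yields the match count.
            $count{ $s } ||= ()= /\Q$s\E/g;
        }
    }

    # Drop substrings that occur only once.
    $count{$_} == 1 and
      delete $count{$_} for keys %count;
    \%count;
}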

print Dumper count $str;

__END__
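Interpolating a substring straight into a pattern, as in /$s/g, treats it as a regex rather than as literal text. So any metacharacters in the input change what gets counted, and an unbalanced "(" would make the pattern fail to compile. Quoting with \Q...\E avoids this. A minimal illustration with a hypothetical input string:

```perl
use strict;
use warnings;

my $text = 'a.baxb';
my $sub  = 'a.b';    # a substring that happens to contain a regex metacharacter

# Interpolated as-is, '.' matches any character, so "axb" is counted too.
my $raw    = () = $text =~ /$sub/g;
# \Q...\E (quotemeta) makes the substring match literally.
my $quoted = () = $text =~ /\Q$sub\E/g;

print "unquoted: $raw, quoted: $quoted\n";
```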

In reply to Re^2: how to count the number of repeats in a string (really!) by blazar
in thread how to count the number of repeats in a string (really!) by blazar

	PerlMonks