After benchmarking, you're the clear winner:
I personally believe that I'm not! ;)
Anyway, I extended the benchmark to include oha's solution (in my version) and ikegami's. For definiteness I made sure to compare subs that do exactly the same thing: accept a string and return a hashref of the counts, and that the counts are those of strings of length 2 or more repeated 2 or more times. I could not include lodin's solution because I haven't the slightest idea of how to modify it so that only strings of length 2 or more are taken into account. (Short of deleting thos of length 1.) Admittedly, it would be interesting to see how the benchmark goes with different data sets...
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use constant MIN => 2;
use Benchmark qw/:all :hireswallclock/;
my $str='aabcdabcabcecdecd';
sub blazar {
local $_=shift;
my $l=length;
my %count;
for my $off (0..$l-1) {
for my $len (MIN .. $l-$off) {
my $s=substr $_, $off, $len;
$count{ $s } ||= ()= /$s/g;
}
$count{$_} == 1 and
delete $count{$_} for keys %count;
}
\%count;
}
sub oha {
my $s=shift;
my %saw;
while($s =~ /(..+)(?=.*?\1)/g) {
for my $x (0..length $1) {
@saw{ map {substr $1, $x, $_} $x+2..length $1 } = ();
}
}
$saw{$_} =()= $s =~ /\Q$_/g for keys %saw;
\%saw;
}
{
my %count;
sub count1 {
my( $string) = @_;
my $length = length( $string );
if ($length < MIN) {
$count{$_} == 1 and delete $count{$_} for keys %count;
return \%count;
}
for my $l (MIN..$length) {
my $s = substr( $string, 0, $l );
$count{ $s } += 1;
}
count1( substr( $string, 1 ) );
}
sub kramba {
count1 shift;
for (keys %count) {
delete $count{$_} unless
$count{$_} >= 2 and length($_) >= MIN;
}
\%count;
}
}
sub ikegami {
my $str = shift;
local our %counts;
$str =~ /
(.{2,}) # or (.+)
(?(?{ !$counts{$1} })(?=.*\1))
(?{ ++$counts{$1} })
(?!)
/x;
\%counts;
}
cmpthese 10_000 => {
blazar => sub { blazar $str },
oha => sub { oha $str },
kramba => sub { kramba $str },
ikegami => sub { ikegami $str },
};
__END__
Results:
Rate blazar kramba ikegami oha
blazar 405/s -- -77% -89% -89%
kramba 1788/s 341% -- -52% -52%
ikegami 3743/s 823% 109% -- -0%
oha 3743/s 823% 109% 0% --
Update: I included lodin's solution as per his direction as follows:
sub lodin {
my $str = shift;
local our %count;
$str =~ /
(.{2,})
(?(?{ $count{$1} })
(?!)
)
.*
\1
(?{ ($count{$1} ||= 1)++ })
(?!)
/x;
\%count;
}
I also corrected the kramba subroutine as per lodin's remark:
sub kramba {
my %count;
my $count;
$count = sub {
my( $string) = @_;
my $length = length( $string );
if ($length < MIN) {
$count{$_} == 1 and delete $count{$_} for keys %count;
return \%count;
}
for my $l (MIN..$length) {
my $s = substr( $string, 0, $l );
$count{ $s } += 1;
}
$count->( substr( $string, 1 ) );
};
$count->(shift);
for (keys %count) {
delete $count{$_} unless
$count{$_} >= 2 and length($_) >= MIN;
}
\%count;
}
The new results are:
Rate blazar kramba oha ikegami lodin
blazar 577/s -- -72% -90% -91% -91%
kramba 2045/s 254% -- -66% -67% -69%
oha 6039/s 946% 195% -- -2% -9%
ikegami 6154/s 966% 201% 2% -- -8%
lodin 6667/s 1055% 226% 10% 8% --
Update: lodin draws my attention by /msg on the fact that the closure leaks and he directs me to his own Sub::Recursive both for an explanation and a fix. He also notices that he didn't include the manual fix in the docs. Anyway, one can use local our $count; in this particular instance. I'm doing a new benchmark which will be posted in a separate node taking also this into account, although I don't think it will make that much of a difference.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.