Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
After benchmarking, you're the clear winner:

I personally believe that I'm not! ;)

Anyway, I extended the benchmark to include oha's solution (in my version) and ikegami's. For definiteness I made sure to compare subs that do exactly the same thing: accept a string and return a hashref of the counts, and that the counts are those of strings of length 2 or more repeated 2 or more times. I could not include lodin's solution because I haven't the slightest idea of how to modify it so that only strings of length 2 or more are taken into account. (Short of deleting thos of length 1.) Admittedly, it would be interesting to see how the benchmark goes with different data sets...

#!/usr/bin/perl use strict; use warnings; use Data::Dumper; use constant MIN => 2; use Benchmark qw/:all :hireswallclock/; my $str='aabcdabcabcecdecd'; sub blazar { local $_=shift; my $l=length; my %count; for my $off (0..$l-1) { for my $len (MIN .. $l-$off) { my $s=substr $_, $off, $len; $count{ $s } ||= ()= /$s/g; } $count{$_} == 1 and delete $count{$_} for keys %count; } \%count; } sub oha { my $s=shift; my %saw; while($s =~ /(..+)(?=.*?\1)/g) { for my $x (0..length $1) { @saw{ map {substr $1, $x, $_} $x+2..length $1 } = (); } } $saw{$_} =()= $s =~ /\Q$_/g for keys %saw; \%saw; } { my %count; sub count1 { my( $string) = @_; my $length = length( $string ); if ($length < MIN) { $count{$_} == 1 and delete $count{$_} for keys %count; return \%count; } for my $l (MIN..$length) { my $s = substr( $string, 0, $l ); $count{ $s } += 1; } count1( substr( $string, 1 ) ); } sub kramba { count1 shift; for (keys %count) { delete $count{$_} unless $count{$_} >= 2 and length($_) >= MIN; } \%count; } } sub ikegami { my $str = shift; local our %counts; $str =~ / (.{2,}) # or (.+) (?(?{ !$counts{$1} })(?=.*\1)) (?{ ++$counts{$1} }) (?!) /x; \%counts; } cmpthese 10_000 => { blazar => sub { blazar $str }, oha => sub { oha $str }, kramba => sub { kramba $str }, ikegami => sub { ikegami $str }, }; __END__

Results:

Rate blazar kramba ikegami oha blazar 405/s -- -77% -89% -89% kramba 1788/s 341% -- -52% -52% ikegami 3743/s 823% 109% -- -0% oha 3743/s 823% 109% 0% --

Update: I included lodin's solution as per his direction as follows:

sub lodin { my $str = shift; local our %count; $str =~ / (.{2,}) (?(?{ $count{$1} }) (?!) ) .* \1 (?{ ($count{$1} ||= 1)++ }) (?!) /x; \%count; }

I also corrected the kramba subroutine as per lodin's remark:

sub kramba { my %count; my $count; $count = sub { my( $string) = @_; my $length = length( $string ); if ($length < MIN) { $count{$_} == 1 and delete $count{$_} for keys %count; return \%count; } for my $l (MIN..$length) { my $s = substr( $string, 0, $l ); $count{ $s } += 1; } $count->( substr( $string, 1 ) ); }; $count->(shift); for (keys %count) { delete $count{$_} unless $count{$_} >= 2 and length($_) >= MIN; } \%count; }

The new results are:

Rate blazar kramba oha ikegami lodin blazar 577/s -- -72% -90% -91% -91% kramba 2045/s 254% -- -66% -67% -69% oha 6039/s 946% 195% -- -2% -9% ikegami 6154/s 966% 201% 2% -- -8% lodin 6667/s 1055% 226% 10% 8% --

Update: lodin draws my attention by /msg on the fact that the closure leaks and he directs me to his own Sub::Recursive both for an explanation and a fix. He also notices that he didn't include the manual fix in the docs. Anyway, one can use local our $count; in this particular instance. I'm doing a new benchmark which will be posted in a separate node taking also this into account, although I don't think it will make that much of a difference.


In reply to Re^4: how to count the number of repeats in a string (really!) by blazar
in thread how to count the number of repeats in a string (really!) by blazar

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2024-03-29 12:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found