Re: how to count the number of repeats in a string (really!)

Replies are listed 'Best First'.
Re^2: how to count the number of repeats in a string (really!) by blazar (Canon) on Nov 14, 2007 at 15:33 UTC
Your regex `/(.{2,}).\1/g` will always try to capture the largest* thing it can in $1. In your example string, every "b" character is followed by a "c". So every position where the string could match `/b.b/`, it could also match `/bc.bc/`. Since the "bc" version is longer, that's the one that will be tried first by the regex engine, and will return with success. It will never return success with `$1 eq "b"`, even though a "b" character repeats itself in the string. I personally believe that this obvious... now that you point it out... Anyway I now wonder if at this point the best thing could be to generate all substrings e.g. with two nested maps and a uniq-like technique and possibly filter out those that have a count of 1 if one is not interested in them. My approach at a filtering in the generation phase by means of a regex may be fixable somehow but I can't see an easy way... Update: it's also worth noting that m//g does not mean "try to match every possible way this match could succeed". Instead it means, "try to find one match starting at each position in the string" .. So in the above, when it matches on "bc", it will not continue backtracking to pick up the match with "b". Instead, it will be satisfied that it found a match starting at that position, increment pos, and move on. But in fact this is the reason why I explicitly set pos. Perl 6 provides an adverb to do so in the first place instead -matching with superimpositions-, which is very good. Update: the following, for example, finally works really correctly. Read more... (685 Bytes)	[reply] [d/l] [select]
Re^3: how to count the number of repeats in a string (really!) by Krambambuli (Curate) on Nov 14, 2007 at 17:56 UTC
I've tried to smoothen the differences between your non-recursive but using regexp solution and mine, which is recursive but doesn't use any RX. After benchmarking, you're the clear winner: `Benchmark: timing 2000 iterations of Krambambuli, blazar... Krambambuli: 4 wallclock secs ( 3.28 usr + 0.01 sys = 3.29 CPU) @ 6 +07.90/s (n=2000) blazar: 1 wallclock secs ( 0.80 usr + 0.00 sys = 0.80 CPU) @ 25 +00.00/s (n=2000)` [download] Congrats! :) Here's the code I've used for the benchmarking. Read more... (1170 Bytes) Update: Actually, it's in fact the other way round, my code is faster - I've just named the benchmarked subs wrongly. Duh! Sorry. Krambambuli --- enjoying Mark Jason Dominus' Higher-Order Perl	[reply] [d/l] [select]
Re^4: how to count the number of repeats in a string (really!) by blazar (Canon) on Nov 14, 2007 at 22:08 UTC
After benchmarking, you're the clear winner: I personally believe that I'm not! ;) Anyway, I extended the benchmark to include oha's solution (in my version) and ikegami's. For definiteness I made sure to compare subs that do exactly the same thing: accept a string and return a hashref of the counts, and that the counts are those of strings of length 2 or more repeated 2 or more times. I could not include lodin's solution because I haven't the slightest idea of how to modify it so that only strings of length 2 or more are taken into account. (Short of deleting thos of length 1.) Admittedly, it would be interesting to see how the benchmark goes with different data sets... Read more... (2 kB) Results: `Rate blazar kramba ikegami oha blazar 405/s -- -77% -89% -89% kramba 1788/s 341% -- -52% -52% ikegami 3743/s 823% 109% -- -0% oha 3743/s 823% 109% 0% --` [download] Update: I included lodin's solution as per his direction as follows: Read more... (427 Bytes) I also corrected the `kramba` subroutine as per lodin's remark: Read more... (821 Bytes) The new results are: `Rate blazar kramba oha ikegami lodin blazar 577/s -- -72% -90% -91% -91% kramba 2045/s 254% -- -66% -67% -69% oha 6039/s 946% 195% -- -2% -9% ikegami 6154/s 966% 201% 2% -- -8% lodin 6667/s 1055% 226% 10% 8% --` [download] Update: lodin draws my attention by /msg on the fact that the closure leaks and he directs me to his own Sub::Recursive both for an explanation and a fix. He also notices that he didn't include the manual fix in the docs. Anyway, one can use `local our $count;` in this particular instance. I'm doing a new benchmark which will be posted in a separate node taking also this into account, although I don't think it will make that much of a difference.	[reply] [d/l] [select]
Re^5: how to count the number of repeats in a string (really!) by lodin (Hermit) on Nov 14, 2007 at 23:41 UTC
Re^5: how to count the number of repeats in a string (really!) by lodin (Hermit) on Nov 15, 2007 at 00:00 UTC
Re^5: how to count the number of repeats in a string (really!) by oha (Friar) on Nov 15, 2007 at 11:22 UTC
Re^5: how to count the number of repeats in a string (really!) by Krambambuli (Curate) on Nov 15, 2007 at 18:35 UTC
Re^6: how to count the number of repeats in a string (really!) by lodin (Hermit) on Nov 16, 2007 at 01:38 UTC
Re^6: how to count the number of repeats in a string (really!) by ikegami (Patriarch) on Nov 16, 2007 at 02:37 UTC
Re^6: how to count the number of repeats in a string (really!) by blazar (Canon) on Nov 16, 2007 at 23:31 UTC
Some notes below your chosen depth have not been shown here


Syntactic Confectionery Delight
	PerlMonks