PerlMonks  

Re: substrings that consist of repeating characters

by salva (Canon)
on Sep 28, 2020 at 20:50 UTC


in reply to substrings that consist of repeating characters

A simpler variation of your code:
use strict;
use warnings;

my $string = "AAAAAAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTTTTTTTTTTTTTTTTTATTGGGGACTTT";
my $len = 0;
my $best = "";
while ($string =~ /((.)\2{$len,})/g) {
    $len = length $1;
    $best = $1;
}
print "best: $best\n";
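To see why the loop needs no length comparison, here is a minimal traced sketch. The helper name `run_matches` is hypothetical, and the sample string is the short one used later in this thread. Because `\2{$len,}` demands at least `$len` further copies of the first character, every accepted match is strictly longer than the previous one, so the last match is the longest run.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every match the loop accepts, in order.
sub run_matches {
    my ($string) = @_;
    my $len = 0;
    my @seen;
    while ($string =~ /((.)\2{$len,})/g) {
        # The match is one character plus at least $len copies of it,
        # so it has at least $len + 1 characters.
        $len = length $1;
        push @seen, $1;
    }
    return @seen;
}

my @matches = run_matches('AABBBBCCC');
print join(' -> ', @matches), "\n";   # the last element is the longest run
```

For this input the accepted matches are AA and then BBBB; the CCC run is never accepted because it is shorter than the current bound.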

Re^2: substrings that consist of repeating characters
by GrandFather (Saint) on Sep 28, 2020 at 22:17 UTC

    At risk of upsetting likbez:

    use strict;
    use warnings;

    my $string = "AAAATTTAGTTCTTAAGGCTGACATCACGTCAGCGTTACCCCCCAAGATTGGGGACTTT";
    my $len = 0;
    my $best = '';
    $best = $1, $len = length $1 while $string =~ /((.)\2{$len,})/g;
    print "best: $best ($len)\n";

    Prints:

    best: CCCCCC (6)
    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re^2: substrings that consist of repeating characters
by salva (Canon) on Sep 29, 2020 at 09:26 UTC
    Note, though, that the regular expression in my comment above is rather inefficient: it looks for the longest match starting at every character, instead of skipping the rest of a run of identical characters once the match fails at the character that starts the run (the regular expression in the OP is much better in that regard).

    We can use (*SKIP) to avoid that:

    my $len = 0;
    my $best = "";
    while ($string =~ /((.)(?:(*SKIP)\2){$len,})/g) {
        $len = length $1;
        $best = $1;
    }
    print "best: $best\n";

    But that is still not completely efficient: the regexp is recompiled at every loop iteration because of $len, so maybe the following simpler code could be faster:

    my $best = "";
    while ($string =~ /((.)\2+)/g) {
        $best = $1 if length $1 > length $best;
    }
    print "best: $best\n";

    Or maybe this more convoluted variation:

    my $best = "";
    $best = $1 while $string =~ /((.)\2*)(*SKIP)(?(?{length $^N <= length $best})(*FAIL))/g;
    print "best: $best\n";
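Whether the recompilation caused by `$len` actually matters can be measured rather than guessed. The following is a rough sketch using the core Benchmark module. The sub names, the hand-written scan used as a non-regex baseline, and the choice to repeat the shorter test string 100 times are all assumptions made for illustration; the two `(*SKIP)` variants are left out.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Shorter test string from this thread, repeated to give the
# benchmark something to chew on (the factor 100 is arbitrary).
my $string = "AAAATTTAGTTCTTAAGGCTGACATCACGTCAGCGTTACCCCCCAAGATTGGGGACTTT" x 100;

# Lower bound raised after every match; the pattern changes with $len.
sub by_len_bound {
    my ($s) = @_;
    my ($len, $best) = (0, "");
    while ($s =~ /((.)\2{$len,})/g) { $len = length $1; $best = $1 }
    return $best;
}

# Constant pattern; lengths compared in Perl code.
sub by_simple {
    my ($s) = @_;
    my $best = "";
    while ($s =~ /((.)\2+)/g) { $best = $1 if length $1 > length $best }
    return $best;
}

# Non-regex baseline: walks the string once, one run at a time.
sub by_scan {
    my ($s) = @_;
    my ($best_start, $best_len) = (0, 0);
    my $n = length $s;
    my $start = 0;
    while ($start < $n) {
        my $end = $start + 1;
        $end++ while $end < $n && substr($s, $end, 1) eq substr($s, $start, 1);
        ($best_start, $best_len) = ($start, $end - $start)
            if $end - $start > $best_len;
        $start = $end;   # skip the whole run
    }
    return substr($s, $best_start, $best_len);
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    len_bound => sub { by_len_bound($string) },
    simple    => sub { by_simple($string) },
    scan      => sub { by_scan($string) },
});
```

Results will depend on the perl version and on the input, so treat any single run as indicative only. Note also that `by_simple` returns an empty string when no character repeats, whereas the other two return a single character.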

      Does that work?

      Win8 Strawberry 5.30.3.1 (64) Tue 09/29/2020 13:32:10
      C:\@Work\Perl\monks
      >perl
      use strict;
      use warnings;

      my $string = 'AABBBBCCC';
      my $len = 0;
      my $best = "";
      while ($string =~ /((.)(?:(*SKIP)\2){$len,})/g) {
          $len = length $1;
          $best = $1
      }
      print "best: '$best' \n"
      ^Z
      best: ''


      Give a man a fish:  <%-{-{-{-<

        Oops, no, it doesn't, but I think it should!

        Is that a bug in perl?
