PerlMonks  

Re: substrings that consist of repeating characters

by LanX (Saint)
on Sep 27, 2020 at 20:06 UTC ( [id://11122270]=note )


in reply to substrings that consist of repeating characters

The simplest way to do it, demonstrated in the debugger:

  DB<39> $_ = "AAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTATTGGGGACTTT";
  DB<40> push @substr, $1 while /((\w)\2+)/g
  DB<41> @sorted = sort { length($b) <=> length($a) } @substr
  DB<42> x @sorted
0  'CCCCCC'
1  'GGGG'
2  'AAA'
3  'TTT'
4  'TTT'
5  'TTT'
6  'TT'
7  'TT'
8  'AA'
9  'GG'
10  'GG'
11  'TT'
12  'AA'
13  'TT'
14  'TT'
  DB<43>
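For anyone who prefers a script to the debugger, here is a minimal sketch of the same idea. The sub name repeated_runs is my own invention for illustration; the regex and the sort are the ones from the session above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of two or more identical
# word characters found in $string, longest run first.
sub repeated_runs {
    my ($string) = @_;
    my @runs;
    # (\w) captures one character as group 2, \2+ demands at least one
    # more copy of it, and the outer group 1 captures the whole run.
    push @runs, $1 while $string =~ /((\w)\2+)/g;
    my @sorted = sort { length($b) <=> length($a) } @runs;
    return @sorted;
}

my $dna = "AAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTATTGGGGACTTT";
print join("\n", repeated_runs($dna)), "\n";
```

Single characters never match because \2+ needs at least one repetition, so only runs of length 2 or more are collected.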

Storing the length in @substr for a Schwartzian transform might be faster, but I wouldn't bet on this.

IMHO length is only doing a simple lookup of the precalculated length stored in Perl's internal data structure for strings, so it should be pretty fast.
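In case the Schwartzian transform is unfamiliar, this is roughly what the two variants look like side by side. It is only a sketch with a small made-up input list, not a benchmark, and I make no claim about which one wins.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the @substr from the session.
my @substr = qw(AAA TT CCCCCC GGGG TTT AA);

# Plain sort: length() is evaluated for both operands on every comparison.
my @plain = sort { length($b) <=> length($a) } @substr;

# Schwartzian transform: pair each string with its length once,
# sort on the cached length, then throw the pairs away again.
my @st = map  { $_->[0] }
         sort { $b->[1] <=> $a->[1] }
         map  { [ $_, length($_) ] }
         @substr;

print "@st\n";
```

If you want numbers instead of guesses, the core Benchmark module's cmpthese can time both variants on your own data.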

HTH! :)

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Update: you could also sort and print in one line:

  DB<43> print join "\n", sort { length($b)<=>length($a) } @substr
CCCCCC
GGGG
AAA
TTT
TTT
TTT
TT
TT
AA
GG
GG
TT
AA
TT
TT
  DB<44>

Re^2: substrings that consist of repeating characters
by BillKSmith (Monsignor) on Sep 28, 2020 at 22:38 UTC
    A slight simplification can be gained by using the 'nsort_by' function from List::UtilsBy (or its XS equivalent). You can also use the special variable '$,' rather than 'join' to control the print.
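    use strict;
    use warnings;
    use List::UtilsBy::XS qw(nsort_by);

    my $string = "AAATTTAGTTCTTAAGGCTGACATCGGTTTACGTCAGCGTTACCCCCCAAGTTATTGGGGACTTT";
    my @matches;
    # $& holds the whole text of the last successful match
    push @matches, $& while ($string =~ m/([AGCT])\1+/g);
    local $, = "\n";                    # output field separator, printed between list items
    print nsort_by {length} @matches;   # numeric sort, ascending (shortest first)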
    Bill
