PerlMonks  

extract only uppercases from string

by losher (Initiate)
on Aug 23, 2012 at 10:12 UTC ( [id://989259]=perlquestion )

losher has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a string: G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA

I'd like to obtain two strings: GGTGGAGCTG and CCCTGCTACA
(they are the strings separated by the lowercase letters, with all characters besides ACTG ignored).
Any idea how to use a regex for it?
Thanks in advance

Replies are listed 'Best First'.
Re: extract only uppercases from string
by BrowserUk (Patriarch) on Aug 23, 2012 at 11:11 UTC

    First remove non-ACGTacgt, then split on acgt:

    $s = 'G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA';
    $s =~ tr[ACGTacgt][]cd;                    # delete everything except ACGTacgt
    print "$_\n" for split '[acgt]+', $s;      # split on the lowercase run

    GGTGGAGCTG
    CCCTGCTACA


Re: extract only uppercases from string
by grondilu (Friar) on Aug 23, 2012 at 10:41 UTC

    I'm not sure it's easy to do it in just one regex.

    But you can just use two steps:

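    use feature 'say';    # needed for say (Perl 5.10+)
    my $seq = "G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA";
    $seq =~ s/[^ACTGactg]//g;           # drop everything that is not a base letter
    my @result = $seq =~ /[ACTG]+/g;    # collect the uppercase runs
    say for @result;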

    You can do it in one line if your version of perl is recent enough (5.14 or later) to support the /r substitution modifier:

    perl -wE 'say for "G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA" =~ s/[^ACTGactg]//gr =~ /[ACTG]+/g;'
      A little modification of your idea works for me!
      Thanks a lot!
Re: extract only uppercases from string
by johngg (Canon) on Aug 23, 2012 at 11:11 UTC
    $ perl -E '
    > $str = q{G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA};
    > say for
    >     map { tr{ACGT}{}cd; $_ }
    >     split m{[a-z]+}, $str;'
    GGTGGAGCTG
    CCCTGCTACA
    $

    Cheers,

    JohnGG

Re: extract only uppercases from string
by fisher (Priest) on Aug 23, 2012 at 10:18 UTC
    Well, what did you try?
      I.e., for the left string I have tried a trick like:
      if ($string =~ m/([ACTG]+).*[actg]/) {
      print $1
      }
      and several other variations but all I get is an uppercase string till the first character other than ACTG.
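The behaviour described above can be reproduced with a short sketch. The helper name `first_attempt` is hypothetical and only wraps the pattern from the post. The capture group `[ACTG]+` has to be one contiguous run, so it stops at the first character outside ACTG, and the following `.*` swallows the rest without capturing it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns what the attempted
# pattern captures in $1, or undef if the pattern does not match.
sub first_attempt {
    my ($string) = @_;
    return $string =~ m/([ACTG]+).*[actg]/ ? $1 : undef;
}

my $string = 'G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA';

# The leading "G" is followed by "_", which ends the captured run.
print first_attempt($string), "\n";    # prints "G"
```

A single capture group cannot skip over the unwanted characters, which is why the replies either strip them first or accumulate several matches.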
        #!/usr/bin/perl --
        use strict;
        use warnings;
        use Data::Dump;
        $_ = 'G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA';
        my @matches = "";
        while( m{
            ([ACTG]+)    # $1 is upper
            |
            ([actg]+)    # $2 is lower
        }gxsm ){
            if( $1 ){ $matches[-1] .= $1; }
            if( $2 ){ push @matches, ""; }
        }
        dd \@matches;
        __END__
        ["GGTGGAGCTG", "CCCTGCTACA"]
Re: extract only uppercases from string
by aitap (Curate) on Aug 23, 2012 at 10:31 UTC
Re: extract only uppercases from string
by cheekuperl (Monk) on Aug 23, 2012 at 10:36 UTC
    Replace all except A,C,T,G in the string with nothing.
    Check perlre for substitution.
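A minimal sketch of that advice, assuming the string is first split on the lowercase run (deleting everything except uppercase A, C, T, G from the whole string at once would glue both halves together). The helper name `uppercase_segments` is hypothetical, and the sketch avoids the /r modifier so it should also run on older perls.

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of lowercase a/c/g/t first, then delete
# everything that is not an uppercase A, C, G or T from each piece.
sub uppercase_segments {
    my ($string) = @_;
    my @segments;
    for my $piece ( split /[acgt]+/, $string ) {
        ( my $clean = $piece ) =~ s/[^ACGT]//g;    # substitute on a copy
        push @segments, $clean if length $clean;   # skip empty pieces
    }
    return @segments;
}

print "$_\n" for uppercase_segments('G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA');
# prints:
# GGTGGAGCTG
# CCCTGCTACA
```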
Re: extract only uppercases from string
by Kenosis (Priest) on Aug 23, 2012 at 18:19 UTC

    Here's another option:

    use Modern::Perl;
    my $str = 'G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA';
    say for map s/[^ACGT]//gr, split 'gacg', $str;

    Output:

    GGTGGAGCTG
    CCCTGCTACA

    The following will produce the same printed output, but what gets printed is actually one continuous run of matched characters with an embedded \n separating the two segments, not two separate strings:

    say $str =~ s/gacg/\n/r =~ /[ACGT\n]/g;
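To see what that last expression returns, here is a small sketch. The helper name `matched_chars` is hypothetical, and it works on a copy instead of using /r, so it is an equivalent rewrite under that assumption, not the poster's exact code. In list context, `/[ACGT\n]/g` yields one element per matched character.

```perl
use strict;
use warnings;

# Hypothetical helper: same idea as the one-liner, written without /r.
sub matched_chars {
    my ($str) = @_;
    ( my $copy = $str ) =~ s/gacg/\n/;    # replace the lowercase separator
    return $copy =~ /[ACGT\n]/g;          # list context: one element per character
}

my @chars = matched_chars('G_I1E2_GTGGAG^303CTGgacgCCCTGC_I2E3_TACA');
print scalar(@chars), " elements\n";      # prints "21 elements"
print join( '', @chars ), "\n";           # the two segments on separate lines
```

If two separate strings are needed afterwards, the joined result can be split on "\n".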

Node Type: perlquestion [id://989259]
Approved by Corion