Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re^4: How to expand a string

by Anonymous Monk
on Nov 29, 2007 at 16:31 UTC ( [id://653891]=note: print w/replies, xml ) Need Help??


in reply to Re^3: How to expand a string
in thread How to expand a string

Thanks, all the replies are helping me get there but it's still a bit of a nightmare to code. Here's some example inputs etc.

R = A or G
S = C or G
K = G or T
All the other letters stay constant.

my $seq1 = "CAG GTR CAG CTG AAG SAG TCA GG";
my $seq2 = "GAK GTG CAG CTT CAG CAG TCR GG";

The gaps between sets of 3 letters aren't important - it just signifies DNA codons.

So, both seq 1 and seq 2 have 4 possible resulting sequences. I need to store the 4 seqs associated with seq 1 separately e.g. in a different array to those of seq 2.

If you have any ideas about the best way to do this I'd be very grateful! Speed isn't a big consideration, as long as it works!

Thanks for your help.

Replies are listed 'Best First'.
Re^5: How to expand a string
by BrowserUk (Patriarch) on Nov 29, 2007 at 16:52 UTC
    #! perl -slw use strict; my @seqs = ( "CAG GTR CAG CTG AAG SAG TCA GG", "GAK GTG CAG CTT CAG CAG TCR GG", ); for my $seq ( @seqs ) { print $seq; $seq =~ tr[ ][_]; $seq =~ s[R][{A,G}]g; $seq =~ s[S][{C,G}]g; $seq =~ s[K][{G,T}]g; tr[_][ ] and print "\t$_" while glob $seq; } __END__ CAG GTR CAG CTG AAG SAG TCA GG CAG GTA CAG CTG AAG CAG TCA GG CAG GTA CAG CTG AAG GAG TCA GG CAG GTG CAG CTG AAG CAG TCA GG CAG GTG CAG CTG AAG GAG TCA GG GAK GTG CAG CTT CAG CAG TCR GG GAG GTG CAG CTT CAG CAG TCA GG GAG GTG CAG CTT CAG CAG TCG GG GAT GTG CAG CTT CAG CAG TCA GG GAT GTG CAG CTT CAG CAG TCG GG

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^5: How to expand a string
by GrandFather (Saint) on Nov 29, 2007 at 20:10 UTC

    A somewhat general solution could look like:

    use strict; use warnings; use Data::Dump::Streamer; my %replace = ( R => '{A,G}', S => '{C,G}', K => '{G,T}', ' ' => '_', ); my %seqs = ( seq1 => {org => "CAG GTR CAG CTG AAG SAG TCA GG"}, seq2 => {org => "GAK GTG CAG CTT CAG CAG TCR GG"} ); for my $seqKey (keys %seqs) { $seqs{$seqKey}{glob} = $seqs{$seqKey}{org}; $seqs{$seqKey}{glob} =~ s/$_/$replace{$_}/g for keys %replace; push @{$seqs{$seqKey}{expanded}}, map {y/_/ /; $_} glob $seqs{$seq +Key}{glob}; } Dump (\%seqs);

    Prints:

    $RO1 = 'seq1'; make_ro($RO1); $HASH1 = { expanded => [ 'CAG GTA CAG CTG AAG CAG TCA GG', 'CAG GTA CAG CTG AAG GAG TCA GG', 'CAG GTG CAG CTG AAG CAG TCA GG', 'CAG GTG CAG CTG AAG GAG TCA GG' ], glob => 'CAG_GT{A,G}_CAG_CTG_AAG_{C,G}AG_TCA_GG', org => 'CAG GTR CAG CTG AAG SAG TCA GG' }; $RO2 = 'seq2'; make_ro($RO2); $HASH2 = { expanded => [ 'GAG GTG CAG CTT CAG CAG TCA GG', 'GAG GTG CAG CTT CAG CAG TCG GG', 'GAT GTG CAG CTT CAG CAG TCA GG', 'GAT GTG CAG CTT CAG CAG TCG GG' ], glob => 'GA{G,T}_GTG_CAG_CTT_CAG_CAG_TC{A,G}_GG', org => 'GAK GTG CAG CTT CAG CAG TCR GG' };

    Note that you well need Perl 5.6.0 or later for the glob to work correctly.


    Perl is environmentally friendly - it saves trees

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://653891]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (3)
As of 2024-04-18 19:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found