
Orf subsequences

by odegbon (Initiate)
on Dec 04, 2008 at 07:37 UTC

odegbon has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re: Orf subsequences
by gone2015 (Deacon) on Dec 04, 2008 at 08:53 UTC

    First, let me say I know nothing about Orf sequences...

    A very quick search of CPAN threw up the bioperl examples/ directory, which might be worth a look ? Again, I don't know whether bioperl is useful... but you might, at least, get some ideas from looking at some of the code there ?

    Looking at the fragment of code you have so far... [and it would be easier to do that if it (a) was enclosed in <code> tags, (b) was runnable, (c) had some sample data with it, (d) came with a description of what was expected, and (e) included almost anything that allowed a humble programmer to understand what was required.] As far as I can see, you've collected possible start positions in @startsRF1 and stop positions in @stopsRF1 -- these positions are marked by certain 3 character sequences, which are constrained to appear at three character boundaries. Now you want to process the stuff between those start and stop positions. Because of the way they've been collected, those arrays are in ascending order of string position, which is a start. Now:

    • can what you want to process include one or more start and/or end positions ? So, if the starts are: (6, 36, 69) and the ends (42, 57, 90), do you want to look at: (6..42, 6..57, 6..90, 36..42, 36..57, 36..90, 69..90), or (36..42, 36..57, 69..90), or just (36..42, 69..90) ?

    • do the start and end of the string count as start and end positions ?

    Whatever the answers to the above, the simple approach is two foreach loops, the outer cycling through the start positions and the inner the end positions, deciding which start..end combinations to consider. Inside all that you can extract the substring using substr. Then ... I dunno; I regret I don't know what a protein sequence looks like.
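
The two-loop approach could be sketched roughly as follows. This is only an illustration under stated assumptions: the sub name `candidate_segments` is hypothetical, the positions are taken to be zero-based offsets of the first character of each marker, every start is paired with every later stop, and the three-character stop marker is included in the extracted segment. The sample string is the one used elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: pair every start offset with every later stop offset
# and return the text from the start marker through the stop marker.
sub candidate_segments {
    my ($dna, $starts, $stops) = @_;
    my @segments;
    foreach my $start (@$starts) {
        foreach my $stop (@$stops) {
            next if $stop <= $start;    # only stops after this start
            # +3 so the three-character stop marker is part of the segment
            push @segments, substr($dna, $start, $stop - $start + 3);
        }
    }
    return @segments;
}

my $dna    = "AAAATGGGGTAAGTGAACGGGTAA";
my @starts = (3, 12);    # offsets of ATG and GTG in $dna
my @stops  = (9, 21);    # offsets of the two TAA markers (assumed stops)

print "$_\n" for candidate_segments($dna, \@starts, \@stops);
# prints:
# ATGGGGTAA
# ATGGGGTAAGTGAACGGGTAA
# GTGAACGGGTAA
```

Which start..stop combinations to keep depends on the answers to the two questions above; the `next if` line is the place to change that rule.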

    If you have huge numbers of start and end positions, and depending on the answers to the above, you may want a more cunning approach, to speed things up. What I have suggested above is O(n^2), which is fine for little problems, and (frankly) horrible for big ones. But, never optimise until you have to -- and even then, think twice.
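
One possible "more cunning" approach, if (and this is an assumption about what is wanted) each start only needs to be paired with the first stop that follows it: because both arrays are already in ascending order, a single pass with a moving index into the stops does the job in roughly linear time. The sub name `nearest_stop_pairs` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: for each start, find the first stop that lies after it.
# Both lists must be sorted ascending; the stop index never moves backwards,
# so the total work is proportional to the combined length of the two lists.
sub nearest_stop_pairs {
    my ($starts, $stops) = @_;
    my @pairs;
    my $j = 0;
    foreach my $start (@$starts) {
        $j++ while $j < @$stops && $stops->[$j] <= $start;
        last if $j >= @$stops;    # no stop left after this start
        push @pairs, [ $start, $stops->[$j] ];
    }
    return @pairs;
}

my @pairs = nearest_stop_pairs([6, 36, 69], [42, 57, 90]);
print join("..", @$_), "\n" for @pairs;
# prints:
# 6..42
# 36..42
# 69..90
```

If every start..stop combination is wanted, there is no way around producing all of them, and the nested loops are as good as anything.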

Re: Orf subsequences
by Skeeve (Parson) on Dec 04, 2008 at 08:25 UTC

    Is this sufficient?

    #!/usr/bin/perl
    use strict;
    use warnings;

    my(@codons)= qw(ATG GTG);
    my $dna = "AAAATGGGGTAAGTGAACGGGTAA";
    my $splitter= join('|', @codons);
    my @sequences= split /($splitter)/,$dna;
    shift @sequences;
    my $codon= 1;
    foreach (@sequences) {
        if ($codon) {
            print $_,"-";
        }
        else {
            print $_,"\n"
        }
        $codon= not $codon;
    }



    It works by splitting at the codons, but capturing them, so that they appear in split's output. It then discards the first (possibly empty) element of that output and prints the remaining elements two at a time: each codon followed by the text after it.
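
To make the split step easier to see, here is a small sketch that returns the codon/text pairs instead of printing them. The sub name `codon_pairs` is hypothetical; the data is the same sample string. Note that the pattern matches at any offset in the string, not only at three-character boundaries, so this is only a starting point if the reading frame matters.

```perl
use strict;
use warnings;

# Hypothetical helper: split at the given codons, keeping them (the capturing
# group makes split return the separators too), then pair each codon with the
# text that follows it.
sub codon_pairs {
    my ($dna, @codons) = @_;
    my $splitter = join '|', map { quotemeta($_) } @codons;
    my @pieces   = split /($splitter)/, $dna;
    # For the sample data @pieces is now:
    #   ("AAA", "ATG", "GGGTAA", "GTG", "AACGGGTAA")
    shift @pieces;    # drop the text before the first codon
    my @pairs;
    while (@pieces) {
        my ($codon, $rest) = splice @pieces, 0, 2;
        push @pairs, [ $codon, defined $rest ? $rest : '' ];
    }
    return @pairs;
}

for my $pair (codon_pairs("AAAATGGGGTAAGTGAACGGGTAA", qw(ATG GTG))) {
    print join("-", @$pair), "\n";
}
# prints:
# ATG-GGGTAA
# GTG-AACGGGTAA
```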

Re: Orf subsequences
by lima1 (Curate) on Dec 04, 2008 at 16:54 UTC
Re: Orf subsequences
by johngg (Canon) on Dec 04, 2008 at 08:20 UTC

    You could do this using a regular expression with two capturing groups; see perlretut and perlre. There are probably lots of modules out there designed for just this sort of thing.

    $ perl -le '
    > $seq = q{AAAATGGGGTAAGTGAACGGGTAA};
    > $start = q{ATG};
    > $stop = q{GTG};
    > ( $prot1, $prot2 ) =
    >     $seq =~ m{(${start}[ACGT]*?)(${stop}[ACGT]*)};
    > print qq{$prot1\n$prot2\n};'
    ATGGGGTAA
    GTGAACGGGTAA

    $

    I hope this is of use.
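
If the string may contain more than two such stretches, the same regular-expression idea can be extended with a global match. The sketch below is built on assumptions, since the original question is not visible here: it takes ATG and GTG as start codons (as in the split-based reply), takes TAA, TAG and TGA as stop codons, and steps three characters at a time so the stop is in the same frame as the start. The sub name `find_orfs` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return every stretch that begins with a start codon and
# runs, three characters at a time, to the first stop codon in the same frame.
# The lookahead lets candidates overlap; the non-greedy quantifier makes each
# one end at the earliest in-frame stop.
sub find_orfs {
    my ($seq) = @_;
    my @found;
    while ($seq =~ m{(?=((?:ATG|GTG)(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))}g) {
        push @found, $1;
    }
    return @found;
}

print "$_\n" for find_orfs("AAAATGGGGTAAGTGAACGGGTAA");
# prints:
# ATGGGGTAA
# GTGAACGGGTAA
```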


