
First, let me say I know nothing about Orf sequences...

A very quick search of CPAN turned up the bioperl script examples/longorf.pl, which might be worth a look. Again, I don't know whether bioperl is useful, but you might at least get some ideas from looking at some of the code there.

Looking at the fragment of code you have so far... [and it would be easier to do that if (a) it was enclosed in <code> tags, (b) was runnable, (c) had some sample data with it, (d) a description of what was expected, and (e) almost anything that allowed a humble programmer to understand what was required.]

...as far as I can see, you've collected possible start positions in @startsRF1 and stop positions in @stopsRF1 -- these positions are marked by certain 3 character sequences, which are constrained to appear at three character boundaries. Now you want to process stuff between those start and stop positions. Because of the way they've been collected, those arrays are in ascending order of string position, which is a start. Now:

Can what you want to process include one or more start and/or end positions? So, if the starts are (6, 36, 69) and the ends (42, 57, 90), do you want to look at every start with every later end: (6..42, 6..57, 6..90, 36..42, 36..57, 36..90, 69..90), or each end with the nearest start before it: (36..42, 36..57, 69..90), or just the pairs with nothing in between: (36..42, 69..90)?
Do the start and end of the string count as start and end positions?
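For reference, here is how I read the collection step, as a minimal sketch. The markers are my assumption (ATG for a start, TAA/TAG/TGA for a stop), and collect_positions and the sample string are made up for illustration; they are not taken from your code.

```perl
use strict;
use warnings;

# Assumed markers: ATG as the start; TAA, TAG and TGA as stops.
# Swap in whatever three-character sequences the real code looks for.
my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);

# Hypothetical helper: scan one reading frame (offset 0, 1 or 2) in steps
# of three and return references to the start and stop position lists.
sub collect_positions {
    my ($seq, $frame) = @_;
    my (@starts, @stops);
    for (my $pos = $frame; $pos + 3 <= length($seq); $pos += 3) {
        my $codon = uc substr($seq, $pos, 3);
        push @starts, $pos if $codon eq 'ATG';
        push @stops,  $pos if $is_stop{$codon};
    }
    return (\@starts, \@stops);
}

my ($startsRF1, $stopsRF1) = collect_positions('CCCATGAAATAGATGTGA', 0);
print "starts: @$startsRF1\n";    # starts: 3 12
print "stops:  @$stopsRF1\n";     # stops:  9 15
```

Because the scan moves left to right, both lists come out in ascending order without any sorting.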

Whatever the answers to the above, the simple approach is two foreach loops, the outer cycling through the start positions and the inner the end positions, deciding which start..end combinations to consider. Inside all that you can extract the substring using substr. Then ... I dunno; I regret I don't know what a protein sequence looks like.
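A minimal sketch of the two-loop approach, assuming you want every start paired with every later stop, and that a stop position marks the first character of the stop marker. The name candidate_regions and the sample data are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, stop, substring] for every start
# that lies before a stop. Which pairs to keep is an assumption; tighten
# the "next unless" test to suit your answers to the questions above.
sub candidate_regions {
    my ($seq, $starts, $stops) = @_;
    my @regions;
    foreach my $start (@$starts) {
        foreach my $stop (@$stops) {
            next unless $stop > $start;    # a stop must follow its start
            # The substring runs from the start marker up to, but not
            # including, the stop marker.
            push @regions,
                [ $start, $stop, substr($seq, $start, $stop - $start) ];
        }
    }
    return @regions;
}

my $seq = 'CCCATGAAATAGATGTGA';
for my $r (candidate_regions($seq, [3, 12], [9, 15])) {
    printf "%2d..%2d  %s\n", @$r;
}
```

With the (6, 36, 69) and (42, 57, 90) positions from the question above, this yields seven start..end pairs.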

If you have huge numbers of start and end positions, and depending on the answers to the above, you may want a more cunning approach, to speed things up. What I have suggested above is O(n^2), which is fine for little problems, and (frankly) horrible for big ones. But, never optimise until you have to -- and even then, think twice.
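One possible more cunning approach, sketched under the assumption that you only want the pairs with no other start or end between them, and that both arrays are already in ascending order. It walks the two lists once, so it is linear rather than O(n^2). The name adjacent_pairs is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: single merged pass over two ascending lists.
# Emits [start, stop] whenever a stop directly follows a start.
sub adjacent_pairs {
    my ($starts, $stops) = @_;
    my @pairs;
    my ($i, $j) = (0, 0);
    my $open;    # most recent start not yet followed by a stop
    while ($j < @$stops) {
        if ($i < @$starts && $starts->[$i] < $stops->[$j]) {
            $open = $starts->[$i++];       # a later start replaces an earlier one
        }
        else {
            push @pairs, [ $open, $stops->[$j] ] if defined $open;
            undef $open;                   # this stop uses up the open start
            $j++;
        }
    }
    return @pairs;
}

print join(', ', map { "$_->[0]..$_->[1]" }
    adjacent_pairs([6, 36, 69], [42, 57, 90])), "\n";    # 36..42, 69..90
```

Dropping the "undef $open" line pairs each end with the nearest start before it instead, which gives (36..42, 36..57, 69..90) for the same data.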

In reply to Re: Orf subsequences by gone2015
in thread Orf subsequences by odegbon
