
Bioinformatics coding question

by charm (Initiate)
on Dec 20, 2016 at 19:23 UTC ( #1178240=perlquestion )

charm has asked for the wisdom of the Perl Monks concerning the following question:

Perl Monks, this is the question I am having trouble answering:

Use random DNA sequence generator as many times as you need to get the protein-coding region(s) (nucleotide triplets). Minimum sequence length is 500bp.

Find all possible protein-coding regions in microbes (between start and stop codons, ATG TAG|TGA|TAA).

Using Standard Genetic Code table (Wikipedia or any other sources), create hash table for all amino acids ($amino{TTT} => "F", ...).

Write found protein sequences into file with the next format:
   Position of 1st start codon: protein sequence [length of protein sequence]
   Position of 2nd start codon: .

For example: 45: FLPQWCV [7]

I am sort of clueless and not sure where to start. To find the coding sequence I have generated a 5000bp random nucleotide sequence, but every time I use the code below to find a coding region it returns nothing. Can anyone tell me what I am doing wrong?

    @nucs=("A","C","G","T");
    $size=5000;
    for ($i=0; $i<$size; $i++) {
        $seqR .= $nucs[int(rand(4))];
    }
    print "Seq($size): $seqR\n";
    if (/ATG([ACGT][ACGT][ACGT]){3,5000}(TAA|TAG|TGA)/) {
        print "This seq. might contain a coding region\n"
    }
    else {
        print "This sequence most liklely does not contin a coding region\n"
    }

Replies are listed 'Best First'.
Re: Bioinformatics coding question
by talexb (Canon) on Dec 20, 2016 at 20:36 UTC

    I echo brother kschwab's comments: this sounds an awful lot like homework.

    At the very least, break down what kind of pattern you are looking for. The regexp you've listed appears to look for

    • ATG;
    • any one of A, C, G or T, three times in a row, which you capture into $1 and then expect to repeat from three to five thousand times; and
    • One of TAA, TAG and TGA.
    It's a bit hard to figure out exactly what your code is, because you haven't mastered putting the code between code tags. If you can explain in more complete sentences what you are looking for, I'm sure we can guide you into completing the Perl that you need to get the job done.

    For example, the triplet you are looking for at the end .. the 'T' appears at the beginning of each pattern, so you are really looking for

    • T; followed by
    • AA, AG or GA.
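The pattern broken down above can be sketched as a small scanner. This is only an illustration under stated assumptions, not the approach anyone in the thread settled on: the sub name find_coding_regions is hypothetical, positions are reported as 1-based offsets of the ATG, and the minimum of three codons from the posted pattern is relaxed to zero or more so that the short demo string matches. One thing worth noting is that the match is bound to the sequence with =~; a bare /.../ in an if tests $_, which the posted code never sets.

```perl
use strict;
use warnings;

# Hypothetical helper: return [position, region] pairs for every
# ATG ... stop-codon region found in $seq.
# Assumption: positions are 1-based offsets of the ATG.
sub find_coding_regions {
    my ($seq) = @_;
    my @regions;
    # The lookahead (?=...) consumes nothing, so candidates that overlap
    # are still reported. The lazy *? stops at the first in-frame stop codon.
    while ( $seq =~ /(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))/g ) {
        push @regions, [ pos($seq) + 1, $1 ];
    }
    return @regions;
}

# Small hand-made demo string rather than random data.
my $demo = 'CCATGAAATTTTAGGG';
for my $region ( find_coding_regions($demo) ) {
    my ( $start, $orf ) = @$region;
    print "$start: $orf\n";
}
```

On the demo string the only ATG sits at offset 3, and the first in-frame stop after it is TAG, so the region found is ATGAAATTTTAG. The TGA that overlaps the start codon is out of frame and is skipped.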

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Bioinformatics coding question
by kschwab (Vicar) on Dec 20, 2016 at 20:01 UTC

    This sounds suspiciously like homework...see this faq.

    Edit: Maybe break the problem down a bit, and ask a more focused question, versus just dumping the assignment almost verbatim?

Re: Bioinformatics coding question
by BillKSmith (Prior) on Dec 21, 2016 at 04:45 UTC
    The more urgent your problem, the more important it is that you ask the right question. Properly formed questions always receive timely replies. Even the right answer to the wrong question will not do you much good. You can increase the number and quality of the replies even more if you ask the question in the vocabulary of Perl rather than biology, because more of your readers will understand it.
Re: Bioinformatics coding question
by GotToBTru (Prior) on Dec 20, 2016 at 21:23 UTC

    First thing needed: a meaningful title!

    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

Re: Bioinformatics coding question
by shawnhcorey (Friar) on Dec 21, 2016 at 13:36 UTC

    The triple-argument for loop can be replaced with a more-readable foreach loop.

    for ( 1 .. $size ) { $seqR .= $nucs[int(rand(4))]; }
Re: Bioinformatics coding question
by Lotus1 (Vicar) on Dec 22, 2016 at 04:16 UTC

    This perlmonks node has an example of using a hash for a codon table.
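As a rough sketch of the codon-table idea (not the code from the node mentioned above), the hash can be built from the standard genetic code written in T, C, A, G order, with '*' standing for a stop codon. The names translate and format_record are hypothetical, and whether the ATG itself is passed in as part of the region is left to the caller, since the assignment's example does not make that clear.

```perl
use strict;
use warnings;

# Build %amino from the standard genetic code. The 64 one-letter codes are
# listed in T, C, A, G order for each codon position; '*' is a stop codon.
my @bases = qw(T C A G);
my @codes = split //,
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %amino;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $amino{"$first$second$third"} = shift @codes;
        }
    }
}

# Translate a region codon by codon, stopping at the first stop codon.
# A trailing partial codon is ignored.
sub translate {
    my ($dna) = @_;
    my $protein = '';
    for my $codon ( $dna =~ /([ACGT]{3})/g ) {
        my $residue = $amino{$codon};
        last if $residue eq '*';
        $protein .= $residue;
    }
    return $protein;
}

# Format one record as "position: protein [length]".
sub format_record {
    my ( $position, $dna ) = @_;
    my $protein = translate($dna);
    return sprintf '%d: %s [%d]', $position, $protein, length $protein;
}

# Hand-made input chosen to reproduce the assignment's sample line.
print format_record( 45, 'TTTCTTCCTCAATGGTGTGTTTAA' ), "\n";
```

Building the table from the 64-character string keeps it short; spelling out each pair by hand, as in $amino{TTT} => "F", gives the same hash and may be easier to check against a printed table.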

Re: Bioinformatics coding question
by charm (Initiate) on Dec 21, 2016 at 09:00 UTC
    Thank you all. It was homework, but I figured it out.

Node Type: perlquestion [id://1178240]
Approved by talexb