PerlMonks  

How can one retrieve the transcript, 5 prime and 3 prime UTR of an Ensembl identifier?

by supriyoch_2008 (Monk)
on Jun 12, 2017 at 15:32 UTC ( [id://1192607]=perlquestion )

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks,

I am interested in retrieving the transcript, 5 prime UTR and 3 prime UTR sequences for an Ensembl transcript identifier such as ENST00000528762. I have searched the web and have not found any script for automatically downloading the desired sequences, so I tried the following script. It does not work, and the cmd output indicates a problem with a driver that I could not sort out. Is it possible to retrieve the transcript in FASTA format, along with its 5 prime and 3 prime UTR sequences, using any other Perl code? I welcome suggestions from Perlmonks.

Here is the code (x3.pl):

#!/usr/bin/perl
use warnings;
use strict;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'useastdb.ensembl.org',    # alternatively 'useastdb.ensembl.org'
    -user => 'anonymous');

my $stable_id = 'ENST00000528762';
my $transcript_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Transcript' );
my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);
print "\n Transcript: $transcript\n";

# UTR sequences are obtained via the five_prime_utr() and
# three_prime_utr() methods:
my $fiv_utr = $transcript->five_prime_utr('ENST00000528762');
my $thr_utr = $transcript->three_prime_utr('ENST00000528762');
print "\n 5' UTR: ", ( defined $fiv_utr ? $fiv_utr->seq() : "None" ), "\n";
print "\n 3' UTR: ", ( defined $thr_utr ? $thr_utr->seq() : "None" ), "\n";
exit;

I got the following output in cmd:

C:\Users\x>cd d*
C:\Users\x\Desktop>x3.pl
install_driver(mysql) failed: Can't locate Mysql/Statement.pm in @INC (@INC contains: C:/Perl/site/lib C:/Perl/lib .) at C:/Perl/lib/DBD/mysql.pm line 12.
Compilation failed in require at (eval 12) line 3.
Perhaps a module that DBD::mysql requires hasn't been fully installed
 at C:/Perl/lib/Bio/EnsEMBL/Registry.pm line 1765.
C:\Users\x\Desktop>
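
The failure above is raised while DBI loads its mysql driver, before any Ensembl call runs. A minimal sketch for checking whether the driver modules load at all on a given Perl installation; `module_status` is a hypothetical helper name, not part of DBI or the Ensembl API:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: try to load a module by name and
# return a two-element list: (ok flag, version or error text).
sub module_status {
    my ($module) = @_;

    # Refuse anything that is not a plain Package::Name before string-eval.
    return (0, 'not a valid module name')
        unless $module =~ /\A\w+(?:::\w+)*\z/;

    if (eval "require $module; 1") {
        my $version = $module->VERSION;
        return (1, defined $version ? $version : 'version unknown');
    }
    my $error = $@ || 'unknown error';
    return (0, $error);
}

# Report on the two modules the Ensembl registry needs for its database connection.
for my $module (qw(DBI DBD::mysql)) {
    my ($ok, $detail) = module_status($module);
    chomp $detail;
    printf "%-12s %s: %s\n", $module, ($ok ? 'loads' : 'FAILED'), $detail;
}
```

If DBD::mysql is reported as FAILED here, the problem lies in the driver installation rather than in x3.pl, and reinstalling the driver through the Perl distribution's package manager is a reasonable next step.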

Replies are listed 'Best First'.
Re: How can one retrieve the transcript, 5 prime and 3 prime UTR of an Ensembl identifier?
by erix (Prior) on Jun 12, 2017 at 15:44 UTC

    Perlmonks are not biologists, so explain:

    transcript

    prime

    5 prime and 3 prime

    UTR

    Ensembl

    Or perhaps have a look here: http://rest.ensembl.org/
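
The REST service linked above returns sequences over HTTP, so it needs no MySQL driver. A minimal sketch, assuming the `sequence/id` endpoint accepts `type=cdna` together with `mask_feature=1` and then returns the UTRs in lowercase and the coding sequence in uppercase (check the endpoint documentation to confirm). `split_utrs` and `fetch_masked_cdna` are hypothetical helper names, and HTTPS support in HTTP::Tiny assumes IO::Socket::SSL is installed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Split a soft-masked cDNA (UTRs lowercase, CDS uppercase) into
# (5' UTR, CDS, 3' UTR). Returns an empty list if the sequence does
# not follow the lowercase/uppercase/lowercase pattern.
sub split_utrs {
    my ($masked) = @_;
    $masked =~ s/\s+//g;    # drop newlines and other whitespace
    my ($five, $cds, $three) = $masked =~ /\A([a-z]*)([A-Z]*)([a-z]*)\z/
        or return;
    return (uc $five, $cds, uc $three);
}

# Fetch the masked cDNA of a transcript as plain text.
# The URL layout and parameters are assumptions about the REST service.
sub fetch_masked_cdna {
    my ($stable_id) = @_;
    my $url = "https://rest.ensembl.org/sequence/id/$stable_id"
            . '?type=cdna;mask_feature=1';
    my $res = HTTP::Tiny->new->get(
        $url, { headers => { 'Content-Type' => 'text/plain' } } );
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Only contact the server when an identifier is given on the command line.
if (@ARGV) {
    my $id = $ARGV[0];
    my ($five, $cds, $three) = split_utrs( fetch_masked_cdna($id) )
        or die "Sequence for $id is not masked as expected\n";
    print ">$id transcript\n", $five . $cds . $three, "\n";
    print ">$id 5' UTR\n", ( length $five  ? $five  : 'None' ), "\n";
    print ">$id 3' UTR\n", ( length $three ? $three : 'None' ), "\n";
}
```

Run it as `perl script.pl ENST00000528762` (the script name is arbitrary). Keeping the splitting step separate from the download means it can be checked on a short hand-made string without network access.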

      Hi erix,

      I am sorry for not mentioning the full terms in my post.

      A transcript is the RNA sequence produced from DNA by the process of transcription. For a protein-coding gene, the mature transcript (mRNA) contains the coding sequence, which starts with ATG and usually ends with TAA, TAG or TGA.

      5 prime UTR (5' UTR) stands for the untranslated region at the 5' end of an mRNA, the end that carries a phosphate group. It is the sequence of nucleotides before the start codon.

      3 prime UTR (3' UTR) means the untranslated sequence of nucleotides at the other end of a transcript, the end that carries a hydroxyl group (OH). It follows the stop codon.

      Ensembl is a freely available database of DNA, RNA and protein sequences at http://www.ensembl.org/index.html

      With regards,

Re: How can one retrieve the transcript, 5 prime and 3 prime UTR of an Ensembl identifier?
by Diesel (Novice) on Jun 13, 2017 at 14:53 UTC
    Hi, I am a beginner myself, so I do not feel confident enough to comment on your code; however, I think you should look into BioPerl. It has a very simple way to read a GenBank file into Perl (basically a parser). Once the file is read in, there should be another BioPerl function to easily extract the coding sequences; I do not know about the 5' and 3' UTRs though. I had to sort out a similar issue last week, where I had to extract the intergenic regions of a bacterium (with a different twist: I could not use the already available methods for a variety of reasons). If I had needed the CDS I could easily have used BioPerl, but after numerous attempts I just wrote my own GenBank parser and extracted what I needed from a FASTA file myself. Perl regexes are very powerful; but again, for your case, there should be something in BioPerl to extract transcripts. Check this out: http://bioperl.org/howtos/Features_and_Annotations_HOWTO.html

      Hi Diesel,

      Thank you very much for your suggestions.
