How can one retrieve the transcript, 5 prime and 3 prime UTR of an Ensembl identifier?

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks,

I am interested in retrieving the transcript sequence and the 5 prime and 3 prime UTRs for an Ensembl transcript identifier such as ENST00000528762. I have searched the web but have not found any script that downloads these sequences automatically, so I tried the following script. It does not work: the cmd output points to a problem with the database driver, which I could not sort out. Is it possible to retrieve the transcript in FASTA format, along with its 5 prime and 3 prime UTR sequences, using any other Perl code? I welcome suggestions from Perlmonks.

Here is the code (x3.pl):

#!/usr/bin/perl
use warnings;
use strict;

use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

# Connect to a public Ensembl MySQL server (needs DBI and DBD::mysql).
$registry->load_registry_from_db(
    -host => 'useastdb.ensembl.org',    # alternatively 'ensembldb.ensembl.org'
    -user => 'anonymous'
);

my $stable_id = 'ENST00000528762';

my $transcript_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Transcript' );
my $transcript = $transcript_adaptor->fetch_by_stable_id($stable_id);
die "No transcript found for $stable_id\n" unless defined $transcript;

# Print the spliced transcript sequence rather than the object reference.
print "\n Transcript: ", $transcript->spliced_seq(), "\n";

# UTR sequences are obtained via the five_prime_utr() and
# three_prime_utr() methods. They take no arguments and return a
# sequence object, or undef if the transcript has no such UTR.

my $fiv_utr = $transcript->five_prime_utr();
my $thr_utr = $transcript->three_prime_utr();

print "\n 5' UTR: ", ( defined $fiv_utr ? $fiv_utr->seq() : "None" ), "\n";
print "\n 3' UTR: ", ( defined $thr_utr ? $thr_utr->seq() : "None" ), "\n";

exit;

I got the following output in cmd:

 C:\Users\x>cd d*

C:\Users\x\Desktop>x3.pl
install_driver(mysql) failed: Can't locate Mysql/Statement.pm in @INC (@INC contains: C:/Perl/site/lib C:/Perl/lib .) at C:/Perl/lib/DBD/mysql.pm line 12.
Compilation failed in require at (eval 12) line 3.
Perhaps a module that DBD::mysql requires hasn't been fully installed
 at C:/Perl/lib/Bio/EnsEMBL/Registry.pm line 1765.

C:\Users\x\Desktop>
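
The message above is raised while DBD::mysql is being loaded, before any Ensembl query runs, so one way to narrow the problem down is to check which modules load at all. The following is only a minimal diagnostic sketch: the helper name load_error is made up for illustration, and it does not repair anything, it just reports what Perl can and cannot load.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns '' if the module loads,
# otherwise the error text produced by require.
sub load_error {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? '' : ( $@ || 'unknown error' );
}

# The three layers the script depends on, from lowest to highest.
for my $module (qw(DBI DBD::mysql Bio::EnsEMBL::Registry)) {
    my $err = load_error($module);
    my ($first_line) = split /\n/, $err;    # keep the report short
    printf "%-24s %s\n", $module, $err eq '' ? 'ok' : "FAILED: $first_line";
}
```

If DBD::mysql is the only module reported as FAILED, the installed driver is probably incomplete or mismatched, and reinstalling it with the package manager of the Perl distribution in use would be the next thing to try.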


Replies are listed 'Best First'.
Re: How can one retrieve the transcript, 5 prime and 3 prime UTR of an Ensembl identifier? by erix (Prior) on Jun 12, 2017 at 15:44 UTC
Perlmonks are not biologists, so please explain: transcript, prime, 5 prime and 3 prime UTR, Ensembl. Or perhaps have a look here: http://rest.ensembl.org/
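
For reference, the REST service avoids the MySQL driver entirely, since it only needs an HTTP client. Below is a minimal sketch, assuming the sequence/id endpoint with type=cdna or type=cds and a text/x-fasta content type as described in the Ensembl REST documentation. The helper names sequence_url and fetch_fasta are made up for illustration, and https with HTTP::Tiny additionally needs IO::Socket::SSL to be installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

my $SERVER = 'https://rest.ensembl.org';

# Hypothetical helper: build the URL for the sequence endpoint.
# $type is assumed to be a value such as 'cdna' or 'cds'.
sub sequence_url {
    my ( $stable_id, $type ) = @_;
    return "$SERVER/sequence/id/$stable_id?type=$type";
}

# Hypothetical helper: fetch one sequence as FASTA text; dies on HTTP errors.
sub fetch_fasta {
    my ( $stable_id, $type ) = @_;
    my $response = HTTP::Tiny->new->get(
        sequence_url( $stable_id, $type ),
        { headers => { 'Content-Type' => 'text/x-fasta' } },
    );
    die "Request failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    return $response->{content};
}

# Example call (needs network access, so it is left commented out):
# print fetch_fasta( 'ENST00000528762', 'cdna' );
```

Requesting the same identifier once with type=cdna and once with type=cds gives the spliced transcript and its coding sequence; the UTRs are whatever flanks the coding sequence inside the transcript.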
Re^2: How can one retrieve the transcript, 5 prime and 3 prime UTR of an Ensembl identifier? by supriyoch_2008 (Monk) on Jun 12, 2017 at 17:32 UTC
Hi erix, I am sorry for not spelling out the terms in my post. A transcript is the RNA sequence produced from DNA by the process of transcription; here I mean the mature mRNA that encodes a protein. Its coding sequence starts with ATG and usually ends with TAA, TAG or TGA. The 5 prime UTR (5' UTR) is the untranslated region of nucleotides upstream of the coding sequence, at the end of the mRNA that carries a phosphate group. The 3 prime UTR (3' UTR) is the untranslated sequence of nucleotides at the other end of the transcript, which carries a hydroxyl group (OH). Ensembl is a database of DNA, RNA and protein sequences, freely available at http://www.ensembl.org/index.html With regards,
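
Since a protein-coding transcript consists of the 5' UTR, the coding sequence (CDS) and the 3' UTR in that order, the two UTRs can be cut out of the spliced transcript once the CDS is known. The following is a minimal sketch with made-up toy sequences; the helper name split_utrs is hypothetical, and it assumes the first occurrence of the CDS inside the transcript is the real one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a spliced transcript into its 5' and 3' UTR,
# given the coding sequence. Returns an empty list if the CDS is empty
# or does not occur in the transcript.
sub split_utrs {
    my ( $cdna, $cds ) = @_;
    return if $cds eq '';
    my $start = index( uc $cdna, uc $cds );    # case-insensitive search
    return if $start < 0;
    my $five  = substr( $cdna, 0, $start );
    my $three = substr( $cdna, $start + length($cds) );
    return ( $five, $three );
}

# Toy data, not a real Ensembl sequence: GGCACC + ATGAAATAG + TTTTT
my ( $five, $three ) = split_utrs( 'GGCACCATGAAATAGTTTTT', 'ATGAAATAG' );
print "5' UTR: $five\n";     # prints GGCACC
print "3' UTR: $three\n";    # prints TTTTT
```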
Re: How can one retrieve the transcript, 5 prime and 3 prime UTR of an Ensembl identifier? by Diesel (Novice) on Jun 13, 2017 at 14:53 UTC
Hi, I am a beginner myself, so I do not feel confident enough to comment on your code. However, I think you should look into Bioperl. It has a very simple way to read a GenBank file into Perl (basically a parser). Once the file is read in, there should be another Bioperl function to extract the coding sequences easily; I do not know about the 5' and 3' UTRs, though. I had to sort out a similar issue last week, where I had to extract the intergenic regions of a bacterium (with a different twist: I could not use the already available methods, for a variety of reasons). Had I needed the CDS, I could easily have used Bioperl, but after numerous attempts I just wrote my own GenBank parser and extracted what I needed from a FASTA file myself. Perl regexes are very powerful; but again, for your case, there should be something in Bioperl to extract transcripts. Check this out: http://bioperl.org/howtos/Features_and_Annotations_HOWTO.html
Re^2: How can one retrieve the transcript, 5 prime and 3 prime UTR of an Ensembl identifier? by supriyoch_2008 (Monk) on Mar 24, 2018 at 06:47 UTC
Hi Diesel, Thank you very much for your suggestions.

