in reply to Debugging Bioperl warnings for Genebank files that are missing info
Yes, GenBank files can be problematic at times: either they are incomplete or BioPerl gets cranky about them. You have not provided code that I can test, but you may consider the following workaround: work with the FASTA files in conjunction with the feature table provided in the GenBank files.
- convert the GenBank files to GFF with genbank2gff3.pl
- convert the GenBank files to FASTA, or download the FASTA equivalent
- parse the GFF files and extract the CDS features with their coordinate information
perl -F'\t' -lane 'if($F[2] eq "CDS"){print}' GCA_000153565.1_ASM15356v1_genomic.gbff.gff | cut -f3,4,5,7 > GCA_000153565.1_ASM15356v1_genomic.coordinates.txt
- extract the subsequences from the fasta files using the coordinates saved in GCA_000153565.1_ASM15356v1_genomic.coordinates.txt
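For the last item you may use BioPerl: read the FASTA file with Bio::SeqIO and call $seq->subseq($start, $stop) for each coordinate pair. Make sure you take the reverse complement (not just the reverse) of the sequences on the negative strand, e.g. with revcom.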
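As a rough sketch of that last step, here is the extraction written in plain core Perl, so it runs even where BioPerl is not installed. The sub names (revcom, extract_region) and the toy sequence are made up for illustration. It assumes the FASTA holds a single sequence, because the cut command above does not keep the sequence ID column, and that each coordinate line looks like "CDS<tab>start<tab>end<tab>strand". With BioPerl, I believe the equivalent calls would be $seq->subseq($start, $end) for the plus strand and $seq->trunc($start, $end)->revcom->seq for the minus strand.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (IUPAC ambiguity codes included, case kept).
sub revcom {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTUMRWSYKVHDBNacgtumrwsykvhdbn/TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn/;
    return $rc;
}

# Extract one feature. GFF coordinates are 1-based and inclusive,
# while substr is 0-based, hence the "- 1" and "+ 1".
sub extract_region {
    my ($seq, $start, $end, $strand) = @_;
    my $sub = substr($seq, $start - 1, $end - $start + 1);
    return $strand eq '-' ? revcom($sub) : $sub;
}

# Toy data standing in for one FASTA record and two lines of the
# coordinates file (hypothetical values, not from the real assembly).
my $genome = 'ATGGCCTAAGGTTTACAT';
my @coordinate_lines = ("CDS\t1\t9\t+", "CDS\t13\t18\t-");

for my $line (@coordinate_lines) {
    my (undef, $start, $end, $strand) = split /\t/, $line;
    print join("\t", $start, $end, $strand,
               extract_region($genome, $start, $end, $strand)), "\n";
}
```

On the toy data this prints ATGGCCTAA for the plus-strand feature and ATGTAA for the minus-strand one. If your assembly has several contigs you would want to keep column 1 of the GFF as well, and look up the matching sequence by ID before extracting.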
A 4 year old monk