in reply to Debugging Bioperl warnings for Genebank files that are missing info
Yes, GenBank files can be problematic at times: they may be incomplete, or BioPerl may get cranky with them. You have not provided code that I can test, but you might consider the following workaround: work with the FASTA files together with the feature table from the GenBank files.
- convert the GenBank files to GFF3 with the BioPerl script genbank2gff3.pl
- convert the GenBank files to FASTA, or download the equivalent FASTA files
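- parse the GFF files and extract the CDS features with their coordinates; keep column 1 (the sequence ID) as well, otherwise you cannot tell which FASTA record a CDS belongs to when the assembly has more than one sequence
perl -F'\t' -lane 'if($F[2] eq "CDS"){print}' GCA_000153565.1_ASM15356v1_genomic.gbff.gff | cut -f1,3,4,5,7 > GCA_000153565.1_ASM15356v1_genomic.coordinates.txt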
- extract the subsequences from the FASTA files using the coordinates saved in GCA_000153565.1_ASM15356v1_genomic.coordinates.txt
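For the last step you can use BioPerl: read the FASTA file with Bio::SeqIO and call $seq->subseq($start, $end) (GFF coordinates are 1-based and inclusive, the same convention subseq uses). Make sure you take the reverse complement of CDSs on the negative strand, e.g. $seq->trunc($start, $end)->revcom->seq.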
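If BioPerl is not installed, or keeps complaining, the extraction step can also be done in plain Perl. The following is a minimal sketch, not the method from this thread: it swaps BioPerl's subseq/revcom for core substr, reverse and tr. It assumes the coordinates file is tab-separated as seqid, type, start, end, strand (what you get with cut -f1,3,4,5,7); a four-column file without the seqid (cut -f3,4,5,7) falls back to the first FASTA record. It also assumes the FASTA headers start with the same IDs as GFF column 1. The helper names (read_fasta, revcom, extract, run) and the script name extract_cds.pl are made up for this example.

```perl
#!/usr/bin/perl
# extract_cds.pl (hypothetical name): pull CDS sequences out of a FASTA file.
# ASSUMPTIONS, not from the original post:
#   - coordinates file columns: seqid, type, start, end, strand (tab-separated);
#     a 4-column file (type, start, end, strand) uses the first FASTA record
#   - FASTA header IDs (first word after '>') match GFF column 1
#   - coordinates are GFF-style: 1-based and inclusive
use strict;
use warnings;

# Read a (multi-)FASTA filehandle into id => sequence, plus the record order.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, @order, $id);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate CRLF line endings
        if ($line =~ /^>(\S+)/) {          # header: first word is the ID
            $id = $1;
            push @order, $id;
            $seq{$id} = '';
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;             # drop any stray whitespace
            $seq{$id} .= $line;
        }
    }
    return (\%seq, \@order);
}

# Reverse complement, including IUPAC ambiguity codes.
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;                 # scalar context: reverses the string
    $rc =~ tr/ACGTRYKMBVDHNSWacgtrykmbvdhnsw/TGCAYRMKVBHDNSWtgcayrmkvbhdnsw/;
    return $rc;
}

# 1-based inclusive start/end -> 0-based substr offset and length.
sub extract {
    my ($seq, $start, $end, $strand) = @_;
    die "bad coordinates $start-$end\n"
        if $start < 1 || $end < $start || $end > length $seq;
    my $sub = substr($seq, $start - 1, $end - $start + 1);
    return $strand eq '-' ? revcom($sub) : $sub;
}

sub run {
    my ($fasta_file, $coord_file) = @_;
    open my $fa, '<', $fasta_file or die "Cannot open $fasta_file: $!\n";
    my ($seqs, $order) = read_fasta($fa);
    close $fa;

    open my $co, '<', $coord_file or die "Cannot open $coord_file: $!\n";
    while (my $line = <$co>) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;      # skip comments and blank lines
        my @f = split /\t/, $line;
        next if @f < 4;                     # malformed line
        my ($id, $type, $start, $end, $strand) =
            @f >= 5 ? @f[0 .. 4] : ($order->[0], @f[0 .. 3]);
        my $seq = $seqs->{$id};
        unless (defined $seq) { warn "No FASTA record for '$id', skipped\n"; next }
        print ">$id:$start-$end($strand)\n", extract($seq, $start, $end, $strand), "\n";
    }
    close $co;
}

# usage: perl extract_cds.pl genome.fasta coordinates.txt > cds.fasta
run(@ARGV) if @ARGV == 2;
```

Each GFF line is handled on its own, so if a CDS is split over several lines (e.g. a join() location in the GenBank file), the parts come out as separate records and would still need to be concatenated in order.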