Re: Bioperl or ncbi: parsing refseq files




by BioLion (Curate)

on Jun 14, 2010 at 15:45 UTC


in reply to Bioperl or ncbi: parsing refseq files

Hi, your GenBank flat files can be handled with the BioPerl suite. See this HowTo for a very detailed guide to IO, which covers the GenBank parser (and a lot of others).

You'll be using the Bio::SeqIO module (see Yes, even you can use CPAN for a guide on getting modules installed, or ignore that if I am patronising you...) to read in the files, test whether each sequence feature lies in your region of interest, and, if it does, write it out to a fresh (smaller) file. You can do all this on the fly, one record at a time, so your large file shouldn't cause memory problems...
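To make that a bit more concrete, here is a minimal sketch of the read / test / write loop, not a finished solution. The script name, the file names and the in_region() helper are made up for illustration, and I am assuming the region is given as 1-based inclusive start and end coordinates. It only keeps features that lie completely inside the region and leaves the sequence itself untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if a feature spanning [$f_start, $f_end] lies
# entirely inside the region [$r_start, $r_end] (1-based, inclusive).
sub in_region {
    my ($f_start, $f_end, $r_start, $r_end) = @_;
    return $f_start >= $r_start && $f_end <= $r_end;
}

# Stream records from $in_file, keep only the features inside the region,
# and write each record to $out_file as soon as it has been filtered.
sub filter_genbank {
    my ($in_file, $out_file, $r_start, $r_end) = @_;

    require Bio::SeqIO;    # loaded here so in_region() works without BioPerl
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'genbank');

    # next_seq returns one record per call, so the whole file is never
    # held in memory at once
    while (my $seq = $in->next_seq) {
        my @keep = grep { in_region($_->start, $_->end, $r_start, $r_end) }
                   $seq->get_SeqFeatures;

        $seq->remove_SeqFeatures;                # drop every feature ...
        $seq->add_SeqFeature($_) for @keep;      # ... and re-attach the keepers
        $out->write_seq($seq);
    }
    return;
}

# Example call (assumed names): perl filter_region.pl in.gbk out.gbk 1000 5000
filter_genbank(@ARGV) if @ARGV == 4;
```

The coordinate test is kept in its own sub so it is easy to swap for a different rule (for example, "overlaps the region" instead of "is contained in it") without touching the IO loop.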

Have a read, have a go, and get back to the Monastery (with examples and code) if you are having problems. Hope this helps!

Update: Typos...

Just a something something...


Re^2: Bioperl or ncbi: parsing refseq files
by roibrodo (Sexton) on Jun 15, 2010 at 09:30 UTC

Thanks for the reply.

I'm not sure I got it right. Even if a feature is only partially inside the region of interest, I want to "truncate" it and keep it. I also want the sequence (which appears after all the features) to be written out. Basically, I want to do exactly what "change region shown" does in the online version of NCBI.

I would appreciate a more verbose example, if possible, since these are my first steps with BioPerl.
