Re: Bioinformatics: Slow Parsing of a Fasta File


	PerlMonks


by almut (Canon)

on Jul 27, 2010 at 19:25 UTC [id://851593]


in reply to Bioinformatics: Slow Parsing of a Fasta File

I've never used any BioPerl modules myself (so I can't say anything about their performance in general), but half an hour for reading ~200MB does sound extremely long. That makes me wonder whether the file might not be stored locally, e.g. whether you're doing I/O over some remote share...

Could you check whether running something functionally similar but without Bio::SeqIO, in its simplest form

# print only the FASTA header lines (those starting with '>')
while (<>) {
    print if /^>/;
}

(or perl -ne "print if /^>/" Test.fasta as a one-liner) takes similarly long?
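To put an actual number on that comparison, here is a minimal sketch that times the same plain line-by-line scan from inside Perl using the core Time::HiRes module, so it can be set against the half hour Bio::SeqIO takes. The sub name time_header_scan is hypothetical; it just counts lines and '>' header lines while reading.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # time() now returns fractional seconds

# Hypothetical helper: read a file line by line, counting all lines and
# FASTA header lines, and report the elapsed wall-clock time in seconds.
sub time_header_scan {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $t0 = time;
    my ($lines, $headers) = (0, 0);
    while (my $line = <$fh>) {
        $lines++;
        $headers++ if $line =~ /^>/;
    }
    close $fh;
    return ($lines, $headers, time - $t0);
}

# Usage: perl scan_time.pl Test.fasta
if (@ARGV) {
    my ($lines, $headers, $secs) = time_header_scan($ARGV[0]);
    printf "%d lines, %d headers, %.2f s\n", $lines, $headers, $secs;
}
```

If this finishes in seconds on the same file, the raw I/O is not the bottleneck and the time is being spent in the parsing layer.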


Re^2: Bioinformatics: Slow Parsing of a Fasta File by Anonymous Monk on Jul 28, 2010 at 06:38 UTC
The code `perl -ne "print if /^>/" test.fasta` runs quite quickly. I am reading the file from the same folder the program is in... I'd rather not resort to a regex to extract the sequences and their header information, but I believe that is going to be my only way through!
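For what it's worth, the plain-Perl route doesn't need to be much more than the one-liner above. Below is a minimal sketch of a FASTA reader without Bio::SeqIO, assuming the usual layout (a '>' header line followed by one or more possibly wrapped sequence lines). The sub name each_fasta_record and its callback interface are hypothetical, not part of BioPerl; records are handed to the callback one at a time so a ~200MB file is never held in memory as a whole.

```perl
use strict;
use warnings;

# Hypothetical helper: stream FASTA records from an open filehandle and
# call $callback->($header, $sequence) once per record. The header is
# passed without its leading '>'; wrapped sequence lines are joined.
sub each_fasta_record {
    my ($fh, $callback) = @_;
    my ($header, $seq);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r\z//;      # tolerate Windows (CRLF) line endings
        next if $line eq '';    # skip blank lines
        if ($line =~ /^>(.*)/) {
            my $new_header = $1;
            # a new header closes the previous record, if there is one
            $callback->($header, $seq) if defined $header;
            ($header, $seq) = ($new_header, '');
        }
        elsif (defined $header) {
            $seq .= $line;      # sequence line: append to current record
        }
    }
    $callback->($header, $seq) if defined $header;   # last record
    return;
}

# Usage: perl fasta_lengths.pl Test.fasta  (prints header and length)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    each_fasta_record($fh, sub {
        my ($header, $seq) = @_;
        printf "%s\t%d\n", $header, length $seq;
    });
    close $fh;
}
```

The only pattern match involved is the same /^>/ test the one-liner uses, so it should scale roughly like that one-liner rather than like the full Bio::Seq object construction.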

In Section Seekers of Perl Wisdom
