Re: renaming 1000's of FASTA files

by moritz (Cardinal)
on Jul 11, 2011 at 11:43 UTC (#913681)


in reply to renaming 1000's of FASTA files

Is there anything sensible I can do to speed this up?

Stop reading the same file over and over again. Maybe something like this can help you (untested):

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Data::Dumper;

my %seq_id;
open HEADER , "<FASTA.headers" or die $!;
while (<HEADER>){
    chomp $_;
    my $fasta_id = $_;
    $fasta_id =~ s/_.*//g ;
    $seq_id{$fasta_id} => $_;
}

my $infile = $ARGV[0] || die ("Please give me an input fasta file\n");
my $inseq = new Bio::SeqIO(-format => 'fasta', -file => $infile);
while (my $seq_obj = $inseq->next_seq ) {
    my $id = $seq_obj->id ;
    chomp $id;
    my $seq = $seq_obj->seq ;
    if (exists ($seq_id{$id})) {
        print ">";
        print $seq_id{$fasta_id};
        print "\n".$seq."\n";
    }
}

Re^2: renaming 1000's of FASTA files
by Cristoforo (Curate) on Jul 11, 2011 at 17:03 UTC
    Shouldn't this code:

    if (exists ($seq_id{$id})) { print ">"; print $seq_id{$fasta_id}; print "\n".$seq."\n"; }

    be

    if (exists ($seq_id{$id})) { print ">"; print $seq_id{$id}; print "\n".$seq."\n"; }
    and shouldn't

    $seq_id{$fasta_id} => $_;

    be

    $seq_id{$fasta_id} = $_;

    since => is only a comma operator and does not assign anything to the hash?

    Update: Corrected print $seq_id{$id}; from print $id
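
With both corrections from this reply applied, the read-once-then-look-up idea could be sketched roughly as follows. This is only an illustration under some assumptions: it parses FASTA by hand instead of going through Bio::SeqIO so that it needs nothing beyond core Perl, the subroutine names build_id_map and rename_fasta are made up for this sketch, and it assumes the header file holds one full header per line without a leading ">". Unlike the original, it keeps the sequence line breaks as they appear in the input.

```perl
use strict;
use warnings;

# Hypothetical helper: read the header list ONCE and map each short ID
# (everything before the first underscore) to the full header line.
sub build_id_map {
    my ($fh) = @_;
    my %full_header_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line eq '';
        ( my $short_id = $line ) =~ s/_.*//;
        $full_header_for{$short_id} = $line;    # '=' stores the value; '=>' would not
    }
    return \%full_header_for;
}

# Hypothetical helper: walk the FASTA input once, replacing each known ID
# with its full header and dropping records whose ID is not in the map.
sub rename_fasta {
    my ( $fh, $full_header_for ) = @_;
    my $out  = '';
    my $keep = 0;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(\S+)/ ) {
            my $id = $1;                        # first word after '>'
            $keep = exists $full_header_for->{$id};
            $out .= ">$full_header_for->{$id}\n" if $keep;
        }
        elsif ($keep) {
            $out .= "$line\n";                  # sequence line of a kept record
        }
    }
    return $out;
}
```

Both subs take any filehandle, so they can be fed real files (open my $fh, '<', 'FASTA.headers') or in-memory strings for a quick check. Each input is read exactly once, and every lookup is a hash access rather than another pass over the header file.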
