

in reply to renaming 1000's of FASTA files

Is there anything sensible I can do to speed this up?

Stop reading the same file over and over again. Maybe something like this can help you (untested):

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Data::Dumper;

my %seq_id;
open HEADER , "<FASTA.headers" or die $!;
while (<HEADER>){
    chomp $_;
    my $fasta_id = $_;
    $fasta_id =~ s/_.*//g ;
    $seq_id{$fasta_id} => $_;
}

my $infile = $ARGV[0] || die ("Please give me an input fasta file\n");
my $inseq = new Bio::SeqIO(-format => 'fasta', -file => $infile);
while (my $seq_obj = $inseq->next_seq ) {
    my $id = $seq_obj->id ;
    chomp $id;
    my $seq = $seq_obj->seq ;
    if (exists ($seq_id{$id})) {
        print ">";
        print $seq_id{$fasta_id};
        print "\n".$seq."\n";
    }
}

Re^2: renaming 1000's of FASTA files
by Cristoforo (Curate) on Jul 11, 2011 at 17:03 UTC
    Shouldn't this code:

    if (exists ($seq_id{$id})) { print ">"; print $seq_id{$fasta_id}; print "\n".$seq."\n"; }

    be

    if (exists ($seq_id{$id})) { print ">"; print $seq_id{$id}; print "\n".$seq."\n"; }

    and shouldn't

    $seq_id{$fasta_id} => $_;

    be

    $seq_id{$fasta_id} = $_;

    instead?

    Update: Corrected print $seq_id{$id}; from print $id
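
With both corrections from the reply folded in, the script might look like the sketch below. It is as untested against real data as the original. The split into load_headers, rename_records and main is only an illustrative layout, and those names are hypothetical. It assumes BioPerl is installed and that FASTA.headers sits in the current directory, as in the original post. The header file is read once into a hash, and the FASTA file is then read in a single pass.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the header file once and map short ID -> full header line.
# (Hypothetical helper name; the original did this inline.)
sub load_headers {
    my ($fh) = @_;
    my %seq_id;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;
        # Short ID: everything before the first underscore.
        ( my $fasta_id = $line ) =~ s/_.*//;
        # Assignment (=), not the fat comma (=>) from the first draft.
        $seq_id{$fasta_id} = $line;
    }
    return \%seq_id;
}

# Make one pass over the FASTA file and print the records whose ID is known.
sub rename_records {
    my ( $infile, $seq_id ) = @_;
    require Bio::SeqIO;    # loaded here so load_headers() works without BioPerl
    my $inseq = Bio::SeqIO->new( -format => 'fasta', -file => $infile );
    while ( my $seq_obj = $inseq->next_seq ) {
        my $id = $seq_obj->id;
        next unless exists $seq_id->{$id};
        # Look up by $id; $fasta_id is not in scope here.
        print '>', $seq_id->{$id}, "\n", $seq_obj->seq, "\n";
    }
    return;
}

sub main {
    my ($infile) = @_;
    unless ( defined $infile ) {
        warn "Please give me an input fasta file\n";
        return;
    }
    open my $header_fh, '<', 'FASTA.headers'
        or die "Cannot open FASTA.headers: $!";
    my $seq_id = load_headers($header_fh);
    close $header_fh;
    rename_records( $infile, $seq_id );
    return;
}

main(@ARGV) unless caller;
```

Run it as before, with the FASTA file as the only argument and the output redirected to a new file. Like the original, it prints each sequence on a single line.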