http://qs321.pair.com?node_id=466253

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have a very simple problem but can't work it out! I basically have two arrays: one contains a small number of unique IDs, and the second contains lots of sequences labelled with their IDs.

I am simply trying to compare the unique sequence IDs to those in the sequence file and extract the corresponding sequences.

I am getting confused trying to loop over the two arrays. Please can someone show me where I'm going wrong?

# the unique IDs file looks like this:
#
gi|11995001:156374-156649
dbj|BA000040|:2701685-2702539
dbj|BA000040|:c8987046-8986282
gi|13488050:58289-58570
gi|13470324:5721573-5721854

# the corresponding sequence file looks like this:
>gi|11995001:156374-156649, SMa0002
ATGGAGGCTGTTCCCATGAATGTAGACCTCTCACGGCGCAGCTTTTTGAAGCTGGCTGGAGCAGGGGCTG
CGGCAACGTCACTCGGTGCGATGGGGTTTGGTGAGGCTGAGGCGGCGGTCGTCGCGCATGTCCGGCCTCA
>dbj|BA000040|:2701685-2702539
GAAGGAGCCGATCTGGTCACCTTTTCCGGCGACAAGCTGCTGGGCGGTCCGCAGGCGGGTTTCATCGTCG
GGCGCAGGGACCTGATCGCCGA

# every unique ID has a corresponding sequence in @sequence

# here is my attempt
open (GENES, "$ARGV[1]") or die "unable to open file $!\n";
open (IDS, "$ARGV[0]") or die "unable to open file $!\n";
open (GENES, "$ARGV[1]") or die "unable to open file $!\n";

my @ids = <IDS>;
my @genes = <GENES>;

my $ids = join ('', @ids);
@ids = split ('\n', $ids);

my $genes = join ('', @genes);
@genes = split ('>', $genes);

my @accessions;
foreach my $line (@file) {
    if ($line =~ /^(\w+\|\w+\.{0,1}\d{0,1}\|{0,1}:c{0,1}\d+\-\d+)/) {
        push @accessions, "$1";
    }
}

# extract unique IDs
my %seen=();
my @uniq = ();
foreach my $item (@accessions) {
    unless ($seen{$item}) {
        $seen{$item}=1;
        push (@uniq, $item);
    }
}

# dig out the corresponding sequence for each ID
# THIS BIT NOT WORKING ;-(
for (my $i=0; $i<@sequence; $i++) {
    foreach my $id (@uniq) {
        if ($sequence[$i] =~ /^$id/) {
            print "$id\n";
        }
    }
}
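
One possible way to do the lookup, offered as a minimal sketch rather than a definitive fix. Two things stand out in the attempt above: @file and @sequence are never filled (the data is read into @ids and @genes), and the IDs contain `|`, which is the alternation metacharacter when an ID is interpolated into a regex as `/^$id/`. The sketch below sidesteps both by indexing the sequences in a hash keyed by ID and looking each wanted ID up directly. The sub names index_sequences and extract_sequences are hypothetical, and the rule that an ID ends at the first comma or whitespace in the header line is an assumption based on the sample data shown.

```perl
use strict;
use warnings;

# Build a hash mapping each record's ID to its sequence.
# $fasta_text is the whole sequence file as a single string.
sub index_sequences {
    my ($fasta_text) = @_;
    my %seq_for;
    # Split on '>' at the start of a line; the first chunk is empty.
    for my $record (split /^>/m, $fasta_text) {
        next unless $record =~ /\S/;
        my ($header, @lines) = split /\n/, $record;
        # Assumption: the ID is everything before the first comma or space.
        my ($id) = $header =~ /^([^,\s]+)/;
        next unless defined $id;
        (my $seq = join '', @lines) =~ s/\s+//g;   # drop stray whitespace
        $seq_for{$id} = $seq;
    }
    return \%seq_for;
}

# Return [id, sequence] pairs for the wanted IDs, in the order given,
# skipping duplicate IDs and IDs with no matching record.
sub extract_sequences {
    my ($seq_for, @wanted) = @_;
    my (%seen, @found);
    for my $id (@wanted) {
        next if $seen{$id}++;
        next unless exists $seq_for->{$id};
        push @found, [ $id, $seq_for->{$id} ];
    }
    return @found;
}

# Command-line use: script.pl ids_file sequence_file
if (@ARGV == 2) {
    my ($ids_file, $genes_file) = @ARGV;

    open my $ids_fh, '<', $ids_file
        or die "unable to open $ids_file: $!\n";
    chomp(my @ids = <$ids_fh>);
    close $ids_fh;

    open my $genes_fh, '<', $genes_file
        or die "unable to open $genes_file: $!\n";
    my $fasta_text = do { local $/; <$genes_fh> };   # slurp the whole file
    close $genes_fh;

    my $seq_for = index_sequences($fasta_text // '');
    for my $pair (extract_sequences($seq_for, grep { /\S/ } @ids)) {
        print ">$pair->[0]\n$pair->[1]\n";
    }
}
```

If a regex match is still preferred over a hash lookup, quoting the ID with `\Q...\E`, as in `/^\Q$id\E/`, stops the `|` characters from being treated as alternation.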