http://qs321.pair.com?node_id=466253

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have a very simple problem but can't work it out! I basically have two arrays: one contains a small number of unique IDs, the second contains lots of sequences labelled with their IDs.

I am simply trying to compare the unique sequence IDs to those in the sequence file and extract the corresponding sequences.

I am getting confused trying to loop over the two arrays. Please can someone show me where I'm going wrong?

# the uniq id's file looks like this:
#
gi|11995001:156374-156649
dbj|BA000040|:2701685-2702539
dbj|BA000040|:c8987046-8986282
gi|13488050:58289-58570
gi|13470324:5721573-5721854

# the corresponding sequence file looks like this:
>gi|11995001:156374-156649, SMa0002
ATGGAGGCTGTTCCCATGAATGTAGACCTCTCACGGCGCAGCTTTTTGAAGCTGGCTGGAGCAGGGGCTG
CGGCAACGTCACTCGGTGCGATGGGGTTTGGTGAGGCTGAGGCGGCGGTCGTCGCGCATGTCCGGCCTCA
>dbj|BA000040|:2701685-2702539
GAAGGAGCCGATCTGGTCACCTTTTCCGGCGACAAGCTGCTGGGCGGTCCGCAGGCGGGTTTCATCGTCG
GGCGCAGGGACCTGATCGCCGA

# every unique_id has a corresponding sequence in @sequence

# here is my attempt
open (GENES, "$ARGV[1]") or die "unable to open file $!\n";
open (IDS, "$ARGV[0]") or die "unable to open file $!\n";
open (GENES, "$ARGV[1]") or die "unable to open file $!\n";

my @ids = <IDS>;
my @genes = <GENES>;

my $ids = join ('', @ids);
@ids = split ('\n', $ids);
my $genes = join ('', @genes);
@genes = split ('>', $genes);

my @accessions;
foreach my $line (@file) {
    if ($line =~ /^(\w+\|\w+\.{0,1}\d{0,1}\|{0,1}:c{0,1}\d+\-\d+)/) {
        push @accessions, "$1";
    }
}

# extract uniq id's
my %seen=();
my @uniq = ();
foreach my $item (@accessions) {
    unless ($seen{$item}) {
        $seen{$item}=1;
        push (@uniq, $item);
    }
}

# dig out the corresponding sequence for each id
# THIS BIT NOT WORKING ;-(
for (my $i=0; $i<@sequence; $i++) {
    foreach my $id (@uniq) {
        if ($sequence[$i] =~ /^$id/) {
            print "$id\n";
        }
    }
}

Replies are listed 'Best First'.
Re: pattern matching and array comparison
by Paladin (Vicar) on Jun 13, 2005 at 20:21 UTC
    Well, first off, you should probably always use strict and warnings. use strict would have found your error for you. Well, 1 of the errors.
    1. In your final for() loop, you are using an array called @sequence which you never initialize anywhere else. I presume you meant to use @genes there instead.
    2. Your $ids have regex meta chars in them (the |), so you need to tell Perl to treat them as normal characters.
    If you change your final part to:
    for (my $i=0; $i<@genes; $i++) {
        foreach my $id (@uniq) {
            if ($genes[$i] =~ /^\Q$id/) {
                print "$id\n";
            }
        }
    }
    it seems to work.
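    To see why point 2 matters, here is a small sketch with made-up data (the variables $id, $record and $other are illustrative, not from the original post). Without \Q, each | in the id acts as regex alternation, so the pattern matches far more than intended:

```perl
use strict;
use warnings;

# Hypothetical sample data modelled on the question's ids and FASTA records.
my $id     = 'dbj|BA000040|:2701685-2702539';
my $record = "dbj|BA000040|:2701685-2702539\nGAAGGAGCCGATCTGG\n";
my $other  = "dbjXYZ unrelated record\n";

# Unquoted: each | means "or", so the pattern is really /^dbj/ or /BA000040/
# or /:2701685-2702539/, and any record starting with "dbj" matches.
my $loose_hit = $other =~ /^$id/ ? 1 : 0;

# Quoted with \Q: the whole id is matched as literal text.
my $strict_hit  = $other  =~ /^\Q$id/ ? 1 : 0;
my $correct_hit = $record =~ /^\Q$id/ ? 1 : 0;

print "unquoted pattern matches unrelated record: $loose_hit\n";
print "quoted pattern matches unrelated record:   $strict_hit\n";
print "quoted pattern matches the real record:    $correct_hit\n";
```

    With the question's data, this means an unquoted id beginning with gi would match every record beginning with gi.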
Re: pattern matching and array comparison
by GrandFather (Saint) on Jun 13, 2005 at 22:28 UTC

    Using a hash is probably a better way to do it. Something like this may be what you want:
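    The code from this reply did not survive in this copy of the thread. As a stand-in, and not necessarily what GrandFather posted, here is a minimal sketch of a hash-based approach. The sample ids and FASTA text are made-up stand-ins for the two files, and the names %wanted and %sequence_for are illustrative:

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the two input files.
my @id_lines = (
    "gi|11995001:156374-156649\n",
    "dbj|BA000040|:2701685-2702539\n",
    "gi|11995001:156374-156649\n",    # duplicate id, collapses in the hash
);
my $fasta = <<'END';
>gi|11995001:156374-156649, SMa0002
ATGGAGGCTGTTCCC
CGGCAACGTCACTCG
>dbj|BA000040|:2701685-2702539
GAAGGAGCCGATCTG
>gi|13488050:58289-58570
TTTTTTTT
END

# Build a lookup hash of wanted ids; hash keys are unique by construction.
my %wanted;
for my $line (@id_lines) {
    chomp $line;
    $wanted{$line} = 1 if length $line;
}

# Index each FASTA record by the id at the start of its header line.
my %sequence_for;
for my $record (split /^>/m, $fasta) {
    next unless length $record;             # skip the empty field before the first ">"
    my ($header, @seq_lines) = split /\n/, $record;
    my ($id) = $header =~ /^([^,\s]+)/;     # id ends at the first comma or space
    next unless defined $id;
    $sequence_for{$id} = join '', @seq_lines;
}

# One hash lookup per id replaces the nested loop over both arrays.
for my $id (sort keys %wanted) {
    next unless exists $sequence_for{$id};
    print ">$id\n$sequence_for{$id}\n";
}
```

    Because the ids are compared as plain hash keys rather than as regex patterns, the | characters need no escaping here.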


    Perl is Huffman encoded by design.
Re: pattern matching and array comparison
by reneeb (Chaplain) on Jun 13, 2005 at 20:18 UTC
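    #! /usr/bin/perl
    use strict;
    use warnings;

    my $id_file    = '/path/to/file/with/ids.txt';
    my $fasta_file = '/path/of/fasta/file.fasta';

    open(my $fh, '<', $id_file) or die $!;
    chomp(my @ids = <$fh>);          # strip newlines so an id can match mid-line
    @ids = grep { length } @ids;     # ignore blank lines in the id file
    close $fh;

    {
        local $/ = "\n>";            # read one FASTA record at a time
        open(my $fh, '<', $fasta_file) or die $!;
        while (my $entry = <$fh>) {
            $entry =~ s/^>|\n>?\z//g;    # trim the record separator from both ends
            print ">$entry\n" if grep { $entry =~ /\Q$_\E/ } @ids;
        }
    }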
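    How setting $/ to "\n>" carves a FASTA file into records can be seen in a small sketch. The two-record text and the @records array are made up for illustration, and the text is read through an in-memory filehandle instead of a real file:

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA text, read through an in-memory filehandle.
my $fasta = ">id_one\nAAAA\nCCCC\n>id_two\nGGGG\n";
open(my $fh, '<', \$fasta) or die $!;

my @records;
{
    local $/ = "\n>";    # a "line" now ends at a newline followed by ">"
    while (my $entry = <$fh>) {
        chomp $entry;    # chomp removes the current $/, i.e. a trailing "\n>"
        push @records, $entry;
    }
}
close $fh;

# The first record keeps its leading ">"; later ones lost theirs to the
# separator, and the last one keeps its final newline.
print "[$_]\n" for @records;
```

    This is why code that prints the records back out usually needs to strip or restore the ">" so that every record is treated the same way.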
Re: pattern matching and array comparison
by Anonymous Monk on Jun 13, 2005 at 18:57 UTC
    sorry - made a typo: should be
    foreach my $line (@ids) {
        if ($line =~ /^(\w+\|\w+\.{0,1}\d{0,1}\|{0,1}:c{0,1}\d+\-\d+)/) {
            push @accessions, "$1";
        }
    }