PerlMonks  

Re^2: how to compare two hashes with perl?

by FluffyBunny (Acolyte)
on Nov 04, 2009 at 21:01 UTC


in reply to Re: how to compare two hashes with perl?
in thread how to compare two hashes with perl?

Thank you for your reply.

IDs might not be in the same order. That's why I'm looking for a certain ID I have in file 1 to match with any ID in file 2...

This is what I wanted to check basically.

1) Check ID names.

2) If they match, and the sequences match, do not print.

3) If they match, but the sequences do not match, print both ID and the sequences from each file.

4) If they don't match, print both ID and the sequences from each file.

I'm a newbie, and I'm trying to understand hashes.. it's just confusing and I'm not exactly sure how my file gets stored in hash. I hear hash is random when it prints output and I don't want my IDs to get mixed up with the wrong sequences (each ID uniquely corresponds to one sequence).

I updated the original post with my output and input files.

Thank you!

Re^3: how to compare two hashes with perl?
by 7stud (Deacon) on Nov 04, 2009 at 23:22 UTC
    I hear hash is random when it prints output

    That just means that the order in which you add key/value pairs to a hash is not necessarily the order in which you get them back out of the hash. Here is an example:

    use strict;
    use warnings;
    $\ = "\n";
    $, = ', ';

    my %hash = ();
    $hash{"h"} = 10;
    $hash{"z"} = 20;
    $hash{"a"} = 30;

    foreach my $key (keys %hash) {
        print "$key: $hash{$key}";
    }

    --output:--
    a: 30
    h: 10
    z: 20

    However, the key/value pairs are the same. A key will never be associated with a value that you did not enter for that key.
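
    If you need a predictable order when printing, one option is to sort the keys yourself. Here is a minimal sketch that reuses the values from the %hash example above; the helper name pairs_in_key_order is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns "key: value" strings in sorted key order.
sub pairs_in_key_order {
    my ($hash_ref) = @_;
    return map { "$_: $hash_ref->{$_}" } sort keys %$hash_ref;
}

my %hash = (h => 10, z => 20, a => 30);

# keys() may return the keys in any order; sort makes the order repeatable.
# Each key still prints with the value that was stored under it.
print "$_\n" for pairs_in_key_order(\%hash);
```

    The sort only changes the order in which the keys are visited. It does not touch the hash itself.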

    it's just confusing and I'm not exactly sure how my file gets stored in hash

    Take a look at this example:

    use strict;
    use warnings;
    $\ = "\n";
    $, = ', ';

    my %results = ();
    my $line = 'HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 CGGAGC';

    my @pieces = split /\s+/, $line;
    my $id = $pieces[0];
    my $seq = $pieces[-1];
    $results{$id} = $seq;

    foreach my $key (keys %results) {
        print "$key -----> $results{$key}";
    }

    --output:--
    HWUSI-EAS548:7:1:5:1527#0/1 -----> CGGAGC

    If you want to gather all the sequences corresponding to an id, you can do this:

    use strict;
    use warnings;
    $\ = "\n";
    $, = ', ';

    my %results = ();

    while (<DATA>) {
        my @pieces = split /\s+/;
        my $id = $pieces[0];
        my $seq = $pieces[-1];
        $results{$id} = [] unless exists $results{$id};
        push @{$results{$id}}, $seq;
    }

    foreach my $key (keys %results) {
        my $arr_str = join ',', @{$results{$key}};
        print "$key -----> [$arr_str]";
    }

    __DATA__
    HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 CGGAGC
    HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 XXXXXX
    Some_other_id + chr12 52084152 CGGAGC

    You might want to experiment a little more with hashes in a separate practice program. It would also help to read perlintro and perldsc, which you can open by typing:

    $ man perlintro
    or
    $ man perldsc

    For a complete list of topics available type:

    $ man perl

    and scroll down.

      while (<DATA>) {
          my @pieces = split /\s+/;
          my $id = $pieces[0];
          my $seq = $pieces[-1];
          $results{$id} = [] unless exists $results{$id};
          push @{$results{$id}}, $seq;
      }

      Actually, as perlreftut explains, the line:

      $results{$id} = [] unless exists $results{$id};

      is unnecessary. I highly recommend that you read perlreftut:

      $ man perlreftut
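
      The reason that line can go is autovivification: when an undefined hash value is dereferenced as an array, as push @{...} does, Perl creates the anonymous array for you. Here is a small sketch to try it out; the sub name group_by_id is made up for illustration, and the sample lines are the ones from the __DATA__ section above:

```perl
use strict;
use warnings;

# Hypothetical helper: groups the last field of each line by the first field.
sub group_by_id {
    my (@lines) = @_;
    my %results;
    for my $line (@lines) {
        my @pieces = split /\s+/, $line;
        # No "= [] unless exists" needed: push autovivifies the array ref.
        push @{ $results{ $pieces[0] } }, $pieces[-1];
    }
    return \%results;
}

my $results = group_by_id(
    'HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 CGGAGC',
    'HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 XXXXXX',
    'Some_other_id + chr12 52084152 CGGAGC',
);

# Sorted so the printout is the same on every run.
for my $id (sort keys %$results) {
    print "$id -----> [", join(',', @{ $results->{$id} }), "]\n";
}
```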

Re^3: how to compare two hashes with perl?
by BioLion (Curate) on Nov 05, 2009 at 00:23 UTC

    I take it this is bowtie output? It makes no sense to me why you are comparing all IDs in the first file to all IDs in the second. The whole point of using a hash is that you can look up specific keys, whereas an array would be for storing an ordered list.

    What are you actually trying to do? Get the common IDs between the files and say whether their associated sequences match? You can try something like this for that:

    foreach my $id (keys %hash1){ # you can use (sort keys %hash1) if you want them in a specified order
        if ( exists $hash2{$id} ){
            print "\'$id\' exists in both hashes.\n";
            if ( $hash1{$id} eq $hash2{$id} ){
                ## id and sequence are stored as key value pairs
                print "and the sequences match too.\n";
            }
            else{
                print "but the sequences do not match.\n";
            }
        }
        else {
            print "\'$id\' only exists in hash1.\n";
        }
    }
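
    Note that the loop above only walks %hash1, so an ID that appears only in the second file is never reported. One way to cover both directions is to loop over the union of the keys. This is only a sketch: the compare_hashes sub and the sample data are made up for illustration and are not taken from the original files.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one report line per ID that is missing
# from either hash or whose sequences differ; identical pairs are skipped.
sub compare_hashes {
    my ($h1, $h2) = @_;
    my %seen;
    # Union of both key lists, sorted, with duplicates removed.
    my @ids = grep { !$seen{$_}++ } sort(keys %$h1, keys %$h2);
    my @report;
    for my $id (@ids) {
        my $in1 = exists $h1->{$id};
        my $in2 = exists $h2->{$id};
        if ($in1 && $in2) {
            next if $h1->{$id} eq $h2->{$id};    # same ID, same sequence
            push @report, "$id differs: $h1->{$id} vs $h2->{$id}";
        }
        elsif ($in1) {
            push @report, "$id only in first: $h1->{$id}";
        }
        else {
            push @report, "$id only in second: $h2->{$id}";
        }
    }
    return @report;
}

# Made-up sample data.
my %hash1 = (id1 => 'CGGAGC', id2 => 'AAAAAA', id3 => 'TTTTTT');
my %hash2 = (id1 => 'CGGAGC', id2 => 'CCCCCC', id4 => 'GGGGGG');

print "$_\n" for compare_hashes(\%hash1, \%hash2);
```

    With this sample data, id1 is skipped because it matches in both hashes, id2 is reported as differing, and id3 and id4 are each reported as present on one side only.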

    If you want help with data structures, try perldsc for starters.

    Just a something something...

      Hello BioLion,

      Basically I followed your code,
      use warnings;
      use strict;

      my %bow1 = ();
      my $file1 = shift;
      open (FILE1, "$file1"); # Open first file
      while (<FILE1>) {
          my ($ID1, undef, undef, undef, $Seq1) = split;
          $bow1{$ID1} = $ID1;
          $bow1{$Seq1} = $Seq1;
          print STDERR "$bow1{$ID1}\t$bow1{$Seq1}\n";
      }
      close FILE1;

      my %bow2 = ();
      my $file2 = shift;
      open (FILE2, "$file2"); # Open second file
      while (<FILE2>) {
          my ($ID2, undef, undef, undef, $Seq2) = split;
          $bow2{$ID2} = $ID2;
          $bow2{$Seq2} = $Seq2;
          print STDERR "$bow2{$ID2}\t$bow2{$Seq2}\n";
      }
      close FILE2;

      foreach my $ID1 (keys %bow1){ # can use (sort keys %hash) to put items in a specified order
          if ( exists $bow2{$ID2} ){
              if ( $bow1{$ID1} eq $bow2{$ID2} ){
                  ## id and sequence are stored as key value pairs
                  print "$bow1{$ID1} exists in $file1 and $file2 and the sequences match $bow1{$Seq1} $bow2{$Seq2} \n";
              }
              else{
                  print "$bow1{$ID1} exists in $file1 and $file2 but sequences DO NOT match $bow1{$Seq1} $bow2{$Seq2} \n";
              }
          }
          else {
              print "$bow1{$ID1} only exists in $file1 .\n";
          }
      }
      exit;
      However I get some errors
      Global symbol "$ID2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 50.
      Global symbol "$ID2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 51.
      Global symbol "$Seq1" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 53.
      Global symbol "$Seq2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 53.
      Global symbol "$Seq1" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 56.
      Global symbol "$Seq2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 56.
      Execution of /home/choia2/scripts/BowtieCompare.pl aborted due to compilation errors.

      Basically the problem is in the foreach loop.. I never used hashes in other programming languages (I wasn't professional though) and this hash concept is confusing.. could you help me one more time? >.<

        Could you please take a look at this and let me know if there's any logical error? I think it works well.. Also if you can give me any suggestions, please do not hesitate to reply / message. Thank you so much!! (***Special thanks to BioLion! You're awesome :D***)
        #!/usr/bin/perl
        use warnings; # Perl interpreter command
        use strict;

        my %bow1 = ();
        my $file1 = shift;
        open (FILE1, "$file1")|| die "Failed to open $file1 for reading : $!"; # Open first file
        while (<FILE1>) { # Reading first hash
            my ($ID, undef, undef, undef, $Seq) = split;
            $bow1{$ID}[0] = $ID;
            $bow1{$ID}[1] = $Seq;
        }
        close FILE1 || die "Failed to close $file1 : $!";

        my %bow2 = ();
        my $file2 = shift;
        open (FILE2, "$file2") || die "Failed to open $file2 for reading : $!"; # Open first file
        while (<FILE2>) { # Reading second hash
            my ($ID, undef, undef, undef, $Seq) = split;
            $bow2{$ID}[0] = $ID;
            $bow2{$ID}[1] = $Seq;
        }
        close FILE2 || die "Failed to close $file2 : $!";

        print"Match status\t$file1 ID\t$file1 Sequence\t$file2 ID\t$file2 Sequence\n"; # Print title

        my $totalCount=0; #initialize variables for counting
        my $identical=0;
        my $diffSeq=0;
        my $unique=0;

        foreach my $ID (keys %bow1){ # can use (sort keys %hash) to put items in a specified order
            if (exists $bow2{$ID}[0] ){
                if ( $bow1{$ID}[0] eq $bow2{$ID}[0] ){
                    ## id and sequence are stored as key value pairs
                    if ( $bow1{$ID}[1] eq $bow2{$ID}[1] ){
                        #print "Identical\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t$bow2{$ID}[0]\t$bow2{$ID}[1]\n"; #display ID and sequences -->too many: commented out
                        $identical=$identical+1; #count identical pairs
                    }
                    else{
                        print "SameID, DiffSeq\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t$bow2{$ID}[0]\t$bow2{$ID}[1]\n"; #display ID and sequences
                        $diffSeq=$diffSeq+1; #count pairs with different sequences but identical IDs
                    }
                }
            }
            else {
                print "Unique\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t - \t - \n"; #display ID and sequences
                $unique=$unique+1; #count unique IDs from first file
            }
        }

        $totalCount = $identical + $diffSeq + $unique; #total count - should match with total ID in first file
        print "Identical\tSeq is different\tUnique in $file1\tTotal\n"; #print title
        print "$identical\t$diffSeq\t$unique\t$totalCount\n"; #print numbers
        exit;
