Re^2: how to compare two hashes with perl?

Re^3: how to compare two hashes with perl? by 7stud (Deacon) on Nov 04, 2009 at 23:22 UTC
> I hear hash is random when it prints output

That just means that the order in which you add key/value pairs to a hash is not necessarily the order in which `keys` (or `values`, or `each`) hands them back, and you should not rely on any particular order. Here is an example:

```
use strict;
use warnings;

$\ = "\n";
$, = ', ';

my %hash = ();
$hash{"h"} = 10;
$hash{"z"} = 20;
$hash{"a"} = 30;

foreach my $key (keys %hash) {
    print "$key: $hash{$key}";
}
```

Output (the order you see may differ):

```
a: 30
h: 10
z: 20
```

However, the key/value pairs are the same. A key will never be associated with a value that you did not enter for that key.

> it's just confusing and I'm not exactly sure how my file gets stored in hash

Take a look at this example:

```
use strict;
use warnings;

$\ = "\n";
$, = ', ';

my %results = ();
my $line = 'HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 CGGAGC';

my @pieces = split /\s+/, $line;
my $id  = $pieces[0];
my $seq = $pieces[-1];
$results{$id} = $seq;

foreach my $key (keys %results) {
    print "$key -----> $results{$key}";
}
```

Output:

```
HWUSI-EAS548:7:1:5:1527#0/1 -----> CGGAGC
```

If you want to gather all the sequences that correspond to an id, you can do this:

```
use strict;
use warnings;

$\ = "\n";
$, = ', ';

my %results = ();

while (<DATA>) {
    my @pieces = split /\s+/;
    my $id  = $pieces[0];
    my $seq = $pieces[-1];

    $results{$id} = [] unless exists $results{$id};
    push @{$results{$id}}, $seq;
}

foreach my $key (keys %results) {
    my $arr_str = join ',', @{$results{$key}};
    print "$key -----> [$arr_str]";
}

__DATA__
HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 CGGAGC
HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 XXXXXX
Some_other_id + chr12 52084152 CGGAGC
```

You might want to experiment a little more with hashes in a separate practice program. For instance, you could read perlintro and perldsc, which you can view by typing:

```
$ man perlintro
$ man perldsc
```

For a complete list of topics, type:

```
$ man perl
```

and scroll down.
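If you need the output in a predictable order, sort the keys yourself. Below is a minimal sketch; the helper name `sorted_pairs` is hypothetical (not from the post), and it relies only on Perl's built-in `sort`, which compares keys as strings unless you give it a comparison block.

```perl
use strict;
use warnings;

# Hypothetical helper: return "key: value" strings in sorted key order,
# so the output no longer depends on Perl's internal hash order.
sub sorted_pairs {
    my ($href) = @_;
    return map { sprintf '%s: %s', $_, $href->{$_} } sort keys %$href;
}

my %hash = (h => 10, z => 20, a => 30);

# sort compares the keys as strings, so this always prints a, h, z.
print "$_\n" for sorted_pairs(\%hash);

# To order by value instead, give sort a numeric comparison block.
for my $key (sort { $hash{$a} <=> $hash{$b} } keys %hash) {
    print "$key => $hash{$key}\n";
}
```

The hash itself is unchanged; sorting only affects the order in which you walk it.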
Re^4: how to compare two hashes with perl? by 7stud (Deacon) on Nov 04, 2009 at 23:36 UTC
```
while (<DATA>) {
    my @pieces = split /\s+/;
    my $id  = $pieces[0];
    my $seq = $pieces[-1];

    $results{$id} = [] unless exists $results{$id};
    push @{$results{$id}}, $seq;
}
```

Actually, as perlreftut explains, the line:

```
$results{$id} = [] unless exists $results{$id};
```

is unnecessary. When `$results{$id}` does not exist yet, `push @{$results{$id}}, $seq;` creates a new anonymous array and stores a reference to it in `$results{$id}` automatically (this is called autovivification). I highly recommend that you read perlreftut:

```
$ man perlreftut
```
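Here is the grouping loop rewritten to rely on autovivification, as a small sketch. Assumptions: `group_by_id` is a hypothetical helper, it takes the lines as a list instead of reading `__DATA__`, and it uses `split ' '`, which (unlike `split /\s+/`) also ignores leading whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: group sequences (last column) by read ID (first column).
# No "= [] unless exists" line: push autovivifies $results{$id} into an
# array reference the first time an ID is seen.
sub group_by_id {
    my @lines = @_;
    my %results;
    for my $line (@lines) {
        my @pieces = split ' ', $line;    # split on whitespace, ignore leading blanks
        next unless @pieces;              # skip blank lines
        push @{ $results{ $pieces[0] } }, $pieces[-1];
    }
    return \%results;
}

my $grouped = group_by_id(
    'HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 CGGAGC',
    'HWUSI-EAS548:7:1:5:1527#0/1 + chr12 52084152 XXXXXX',
    'Some_other_id + chr12 52084152 CGGAGC',
);

# Sorting only makes the printed order repeatable.
for my $id (sort keys %$grouped) {
    print "$id -----> [", join(',', @{ $grouped->{$id} }), "]\n";
}
```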
Re^3: how to compare two hashes with perl? by BioLion (Curate) on Nov 05, 2009 at 00:23 UTC
I take it this is bowtie output? I don't understand why you are comparing every ID in the first file with every ID in the second. The whole point of using a hash is that you can look up a specific key directly, whereas an array is for storing an ordered list. What are you actually trying to do? Find the IDs common to both files and report whether their associated sequences match? If so, you can try something like this:

```
foreach my $id (keys %hash1) {   # you can use (sort keys %hash1) if you want them in a specified order
    if ( exists $hash2{$id} ) {
        print "'$id' exists in both hashes.\n";
        if ( $hash1{$id} eq $hash2{$id} ) {   ## id and sequence are stored as key/value pairs
            print "and the sequences match too.\n";
        }
        else {
            print "but the sequences do not match.\n";
        }
    }
    else {
        print "'$id' only exists in hash1.\n";
    }
}
```

If you want help with data structures, try perldsc for starters.
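The loop above only walks `%hash1`, so IDs that occur only in `%hash2` are never reported. One way to cover both directions is sketched below, assuming both hashes map ID => sequence as in the snippet above; `compare_hashes` and the sample IDs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classify every ID from two ID => sequence hashes.
# Returns a hash of array refs so the caller can count or print each group.
sub compare_hashes {
    my ($h1, $h2) = @_;
    my %result = (same => [], different => [], only1 => [], only2 => []);

    for my $id (sort keys %$h1) {
        if (!exists $h2->{$id}) {
            push @{ $result{only1} }, $id;        # ID missing from the second hash
        }
        elsif ($h1->{$id} eq $h2->{$id}) {
            push @{ $result{same} }, $id;         # same ID, same sequence
        }
        else {
            push @{ $result{different} }, $id;    # same ID, different sequence
        }
    }

    # IDs that appear only in the second hash
    push @{ $result{only2} }, grep { !exists $h1->{$_} } sort keys %$h2;

    return \%result;
}

my %hash1 = (id1 => 'CGGAGC', id2 => 'AAAAAA', id3 => 'TTTTTT');
my %hash2 = (id1 => 'CGGAGC', id2 => 'GGGGGG', id4 => 'CCCCCC');

my $cmp = compare_hashes(\%hash1, \%hash2);
for my $group (qw(same different only1 only2)) {
    printf "%-9s %s\n", $group, join(',', @{ $cmp->{$group} });
}
```

Each lookup is a single `exists` or hash access, so there is no need for a nested loop over both sets of IDs.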
Re^4: how to compare two hashes with perl? by FluffyBunny (Acolyte) on Nov 05, 2009 at 22:50 UTC
Hello BioLion,

Basically I followed your code:

```
use warnings;
use strict;

my %bow1 = ();
my $file1 = shift;
open (FILE1, "$file1");    # Open first file
while (<FILE1>) {
    my ($ID1, undef, undef, undef, $Seq1) = split;
    $bow1{$ID1} = $ID1;
    $bow1{$Seq1} = $Seq1;
    print STDERR "$bow1{$ID1}\t$bow1{$Seq1}\n";
}
close FILE1;

my %bow2 = ();
my $file2 = shift;
open (FILE2, "$file2");    # Open second file
while (<FILE2>) {
    my ($ID2, undef, undef, undef, $Seq2) = split;
    $bow2{$ID2} = $ID2;
    $bow2{$Seq2} = $Seq2;
    print STDERR "$bow2{$ID2}\t$bow2{$Seq2}\n";
}
close FILE2;

foreach my $ID1 (keys %bow1){   # can use (sort keys %hash) to put items in a specified order
    if ( exists $bow2{$ID2} ){
        if ( $bow1{$ID1} eq $bow2{$ID2} ){   ## id and sequence are stored as key value pairs
            print "$bow1{$ID1} exists in $file1 and $file2 and the sequences match $bow1{$Seq1} $bow2{$Seq2} \n";
        }
        else{
            print "$bow1{$ID1} exists in $file1 and $file2 but sequences DO NOT match $bow1{$Seq1} $bow2{$Seq2} \n";
        }
    }
    else {
        print "$bow1{$ID1} only exists in $file1 .\n";
    }
}
exit;
```

However, I get some errors:

```
Global symbol "$ID2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 50.
Global symbol "$ID2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 51.
Global symbol "$Seq1" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 53.
Global symbol "$Seq2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 53.
Global symbol "$Seq1" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 56.
Global symbol "$Seq2" requires explicit package name at /home/choia2/scripts/BowtieCompare.pl line 56.
Execution of /home/choia2/scripts/BowtieCompare.pl aborted due to compilation errors.
```

The problem is basically the foreach loop. I never used hashes in other programming languages (I wasn't a professional, though), and this hash concept is confusing. Could you help me one more time?
>.<
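For what it's worth, here is why those errors appear, with a small sketch. `$ID2`, `$Seq1` and `$Seq2` are declared with `my` inside the `while` blocks, so they no longer exist once each loop's closing brace is reached; inside the `foreach` only `$ID1` is in scope, and under `use strict` the other names are compile errors. Separately, `$bow1{$ID1} = $ID1; $bow1{$Seq1} = $Seq1;` turns the ID and the sequence into two unrelated keys. The sketch contrasts that with storing the ID as the key and the sequence as the value. The helper names and sample reads are hypothetical; the column positions (ID in column 1, sequence in column 5) are taken from the posted `split`.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the posted loop: the ID and the sequence
# each become a key that maps to itself, so the ID-to-sequence link is lost.
sub both_as_keys {
    my %h;
    for my $line (@_) {
        my ($id, undef, undef, undef, $seq) = split ' ', $line;
        next unless defined $seq;    # skip blank or short lines
        $h{$id}  = $id;
        $h{$seq} = $seq;
    }
    return \%h;
}

# Hypothetical helper: one entry per read, key = ID, value = sequence.
sub id_to_seq {
    my %h;
    for my $line (@_) {
        my ($id, undef, undef, undef, $seq) = split ' ', $line;
        next unless defined $seq;
        $h{$id} = $seq;
    }
    return \%h;
}

my @lines = ('r1 + chr12 52084152 CGGAGC', 'r2 + chr1 100 AAAAAA');
my $wrong = both_as_keys(@lines);
my $right = id_to_seq(@lines);

printf "both_as_keys: %d keys, r1 maps to %s\n", scalar(keys %$wrong), $wrong->{r1};    # 4 keys, r1
printf "id_to_seq:    %d keys, r1 maps to %s\n", scalar(keys %$right), $right->{r1};    # 2 keys, CGGAGC
```

With the second layout, the comparison loop can use the same loop variable to look up both hashes (`$bow1{$id}` and `$bow2{$id}`), so no `$ID2` is needed at all.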
UPDATE! I fixed it :D by FluffyBunny (Acolyte) on Nov 06, 2009 at 22:11 UTC
Could you please take a look at this and let me know if there's any logical error? I think it works well. Also, if you have any suggestions, please don't hesitate to reply or message me. Thank you so much! (*Special thanks to BioLion! You're awesome :D*)

```
#!/usr/bin/perl
use warnings;    # enable warnings
use strict;

my %bow1 = ();
my $file1 = shift;
open (FILE1, "$file1") || die "Failed to open $file1 for reading : $!";   # Open first file
while (<FILE1>) {    # Reading first hash
    my ($ID, undef, undef, undef, $Seq) = split;
    $bow1{$ID}[0] = $ID;
    $bow1{$ID}[1] = $Seq;
}
close FILE1 or die "Failed to close $file1 : $!";

my %bow2 = ();
my $file2 = shift;
open (FILE2, "$file2") || die "Failed to open $file2 for reading : $!";   # Open second file
while (<FILE2>) {    # Reading second hash
    my ($ID, undef, undef, undef, $Seq) = split;
    $bow2{$ID}[0] = $ID;
    $bow2{$ID}[1] = $Seq;
}
close FILE2 or die "Failed to close $file2 : $!";

print "Match status\t$file1 ID\t$file1 Sequence\t$file2 ID\t$file2 Sequence\n";   # Print title

my $totalCount = 0;    # initialize variables for counting
my $identical  = 0;
my $diffSeq    = 0;
my $unique     = 0;

foreach my $ID (keys %bow1) {    # can use (sort keys %hash) to put items in a specified order
    if (exists $bow2{$ID}[0]) {
        if ( $bow1{$ID}[0] eq $bow2{$ID}[0] ) {    ## id and sequence are stored as key value pairs
            if ( $bow1{$ID}[1] eq $bow2{$ID}[1] ) {
                #print "Identical\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t$bow2{$ID}[0]\t$bow2{$ID}[1]\n";   #display ID and sequences -->too many: commented out
                $identical = $identical + 1;    #count identical pairs
            }
            else {
                print "SameID, DiffSeq\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t$bow2{$ID}[0]\t$bow2{$ID}[1]\n";   #display ID and sequences
                $diffSeq = $diffSeq + 1;    #count pairs with different sequences but identical IDs
            }
        }
    }
    else {
        print "Unique\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t - \t - \n";   #display ID and sequences
        $unique = $unique + 1;    #count unique IDs from first file
    }
}

$totalCount = $identical + $diffSeq + $unique;    #total count - should match the total number of IDs in the first file
print "Identical\tSeq is different\tUnique in $file1\tTotal\n";   #print title
print "$identical\t$diffSeq\t$unique\t$totalCount\n";    #print numbers
exit;
```
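Since suggestions were invited: a common style for reading the files is a lexical filehandle with three-argument `open`, checked with the low-precedence `or`. Below is a sketch, assuming the same five-column input; `read_ids` and `read_file` are hypothetical helper names, and `read_ids` takes an already-open handle so it can also be fed an in-memory string for testing.

```perl
use strict;
use warnings;

# Hypothetical helper: read "ID + chr pos SEQ" lines from an open filehandle
# into an ID => sequence hash (column layout assumed from the posts above).
sub read_ids {
    my ($fh) = @_;
    my %seq_for;
    while (my $line = <$fh>) {
        my ($id, undef, undef, undef, $seq) = split ' ', $line;
        $seq_for{$id} = $seq if defined $seq;    # skip blank or short lines
    }
    return \%seq_for;
}

# Hypothetical helper: lexical handle + three-argument open.
# 'or' binds more loosely than ||, so die applies to the result of open/close.
sub read_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Failed to open $path for reading: $!";
    my $seq_for = read_ids($fh);
    close $fh or die "Failed to close $path: $!";
    return $seq_for;
}

# Demo: only runs when two file names are given on the command line.
if (@ARGV == 2) {
    my $bow1 = read_file($ARGV[0]);
    my $bow2 = read_file($ARGV[1]);
    printf "%d IDs in %s, %d IDs in %s\n",
        scalar(keys %$bow1), $ARGV[0], scalar(keys %$bow2), $ARGV[1];
}
```

A lexical handle like `$fh` is closed automatically when it goes out of scope, and it cannot clash with a bareword handle such as `FILE1` used elsewhere in the program.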
Re: UPDATE! I fixed it :D by BioLion (Curate) on Nov 09, 2009 at 12:08 UTC

