processing key value pairs of a hash

by lomSpace (Scribe)
on Apr 14, 2009 at 20:03 UTC

lomSpace has asked for the wisdom of the Perl Monks concerning the following question:

Hello!
I have a file that I have split and put into a hash and sorted based on the keys. Example:

    Eif2b2 fail
    Eif2b2 pass5
    Eif2b2 fail
    Eif2b2 pass2
    Eif2b2 fail
    Eif2b2 pass4
    Eif2b2 fail
    49334 fail
    49334 fail
    49334 pass1
    49334 fail
    49334 pass4
    Oxct1 pass4
    Oxct1 fail
    Oxct1 pass4
    Oxct1 fail

For each key name/id, I want to do the following:
1. Count keys with no values above pass4 (pass1 to 3 are fine, 4 to fail are bad).
2. Count the keys which after excluding pass4 to fail have only one key value pair.

Using the above example, the key 'Eif2b2' would have only 1 value below pass4,
that value would be 'pass2', and the key '49334' would have 1 value below 'pass4', 'pass1'.
The total for keys with no values above pass4 would be two.
The second count would also be two, because for each of those keys exactly one value remains once everything from 'pass4' up to 'fail' is removed. Here is my code:

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    open( my $in, "c:/Documents and Settings/mydir/Desktop/current/mart_export.txt" );
    open( my $out, ">c:/Documents and Settings/mydir/Desktop/current/mart_export_counts.txt" );

    my $first_line = <$in>;
    chomp $first_line;

    my %clone_hash;
    while (<$in>) {
        chomp;
        my @fields = split /\t/;
        my ($gene_symbol, $esc_qc, $qc_id) = ($fields[1], $fields[8], $fields[9]);
        #my $count = 0;
        #my $clone1 = 0;
        $clone_hash{$gene_symbol} = $esc_qc;
    }

    # sort the oligos by gene_symbol
    my @sorted_keys = sort { $a <=> $b || $b cmp $a } keys %clone_hash;
    foreach my $key (@sorted_keys) {
        print $out "$key = $clone_hash{$key}\n";
    }
    close ($in);
    close ($out);

I would appreciate the monks' direction.

Re: processing key value pairs of a hash
by GrandFather (Saint) on Apr 14, 2009 at 21:09 UTC

    The code sample you provide doesn't match the sample data you provide, and you don't show the expected output for the sample data, so the following is rather a guess:

        use strict;
        use warnings;

        my %clone_hash;

        while (<DATA>) {
            chomp;
            next if ! length;
            my ($gene_symbol, $esc_qc) = split /\s+/;
            $clone_hash{$gene_symbol}{$esc_qc}++;
        }

        my $okGenes = 0;
        my $singleOks = 0;

        foreach my $key (sort keys %clone_hash) {
            my $gene = $clone_hash{$key};
            my $okPasses = grep {defined} @{{%$gene}}{qw(pass1 pass2 pass3)};
            my $badPasses = keys (%$gene) - $okPasses;

            if ($badPasses) {
                ++$singleOks if $okPasses == 1;
            } else {
                ++$okGenes;
            }
        }

        print "Ok genes: $okGenes\n";
        print "Single pass genes: $singleOks\n";

        __DATA__
        Eif2b2 fail
        Eif2b2 pass5
        Eif2b2 fail
        Eif2b2 pass2
        Eif2b2 fail
        Eif2b2 pass4
        Eif2b2 fail
        49334 fail
        49334 fail
        49334 pass1
        49334 fail
        49334 pass4
        Oxct1 pass4
        Oxct1 fail
        Oxct1 pass4
        Oxct1 fail

    Prints:

        Ok genes: 0
        Single pass genes: 2

    True laziness is hard work
      Hi GrandFather!
      I actually split a file and used the fields to populate the hash
      that I declared outside the while loop. My output is
      "gene name \t #good clones "

      Thanks for the direction!
      LomSpace
Re: processing key value pairs of a hash
by toolic (Bishop) on Apr 14, 2009 at 20:44 UTC
    You only show 2 columns of input data, but your code seems to expect 10 columns ($fields[9]). If I just assume 2 columns (what choice do I have?), you could stuff your input into a Hash-of-Hashes data structure as follows:

        use strict;
        use warnings;
        use Data::Dumper;

        my %clone_hash;
        while (<DATA>) {
            my ($key, $value) = split;
            my $grade = ($value =~ /^pass[1-3]$/) ? 'good' : 'bad';
            $clone_hash{$key}{$grade}++;
        }
        print Dumper(\%clone_hash);

        __DATA__
        Eif2b2 fail
        Eif2b2 pass5
        Eif2b2 fail
        Eif2b2 pass2
        Eif2b2 fail
        Eif2b2 pass4
        Eif2b2 fail
        49334 fail
        49334 fail
        49334 pass1
        49334 fail
        49334 pass4
        Oxct1 pass4
        Oxct1 fail
        Oxct1 pass4
        Oxct1 fail

    which prints out (unsorted):

        $VAR1 = {
            'Eif2b2' => {
                'good' => 1,
                'bad' => 6
            },
            '49334' => {
                'good' => 1,
                'bad' => 4
            },
            'Oxct1' => {
                'bad' => 4
            }
        };

      Toolic
      I created an empty hash, then read in a file, split it and used
      fields1 and fields4 to create key value pairs. I then sorted the list by keys.
      I want the following type of output:
      Clone Good Single Clone
      Eif2b2 1 1

      Thanks, the hash of hashes answers the first part, but how do
      I print the value from the 'good' key in the hash of hashes?

      Thanks
      LomSpace
        but how do I print the value from the 'good' key in the hash of hashes?
        Loop through the keys of the primary hash, and print out the 'good' values if 'good' keys exist:

            use strict;
            use warnings;
            use Data::Dumper;

            my %clone_hash;
            while (<DATA>) {
                my ($key, $value) = split;
                my $grade = ($value =~ /^pass[1-3]$/) ? 'good' : 'bad';
                $clone_hash{$key}{$grade}++;
            }
            #print Dumper(\%clone_hash);

            # Print out the 'good' keys
            for (keys %clone_hash) {
                print "$_ => $clone_hash{$_}{good}\n" if exists $clone_hash{$_}{good};
            }

            __DATA__
            Eif2b2 fail
            Eif2b2 pass5
            Eif2b2 fail
            Eif2b2 pass2
            Eif2b2 fail
            Eif2b2 pass4
            Eif2b2 fail
            49334 fail
            49334 fail
            49334 pass1
            49334 fail
            49334 pass4
            Oxct1 pass4
            Oxct1 fail
            Oxct1 pass4
            Oxct1 fail

        which prints:

            Eif2b2 => 1
            49334 => 1

          #!/usr/bin/perl
          use strict;
          use warnings;
          use Data::Dumper;

          my %clone_hash;
          while (<DATA>) {
              chomp;
              my @fields = split /\t/;
              my ($gene_symbol, $esc_qc) = ($fields[0], $fields[1]);
              $clone_hash{$gene_symbol} = $esc_qc;
              my $grade = ($esc_qc =~ /^pass[1-3]$/) ? 'good' : 'bad';
              $clone_hash{$key}{$grade}++;
          }
          print Dumper(\%clone_hash);

          _DATA_
          4933402D24Rik fail
          4933402D24Rik fail
          4933402D24Rik fail
          4933402D24Rik fail
          4933402D24Rik fail
          4933402D24Rik fail
          4933402D24Rik fail
          4933402D24Rik pass1

      When I run this I get an error message stating that a
      bareword was found where an operator is expected near
      "4933402D24Rik" (missing operator before "D24Rik").
      What does this mean?
        I realized that I did not use two underscores before and after DATA.
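
For anyone who hits the same message: the marker has to be spelled `__DATA__`, with two underscores on each side. With `_DATA_` Perl tries to compile the data lines as code, hence the bareword complaint. Once that is fixed, the script above would likely stumble on two more things: `$key` is never declared, and `$clone_hash{$gene_symbol} = $esc_qc` stores a plain string where the next line expects a hash reference. Below is a minimal corrected sketch, assuming two whitespace-separated columns. The sub name `grade_counts` and the in-memory sample are illustrative and not from the thread; in a script with a `__DATA__` section you would pass `\*DATA` instead.

```perl
use strict;
use warnings;

# Count 'good' (pass1..pass3) and 'bad' (everything else) rows per gene.
# Accepts any readable filehandle, e.g. \*DATA in a script that has a
# __DATA__ section (note the two underscores on each side).
sub grade_counts {
    my ($fh) = @_;
    my %clone_hash;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ( $gene_symbol, $esc_qc ) = split ' ', $line;
        my $grade = ( $esc_qc =~ /^pass[1-3]$/ ) ? 'good' : 'bad';
        $clone_hash{$gene_symbol}{$grade}++;    # hash of hashes, no scalar overwrite
    }
    return \%clone_hash;
}

# Illustrative sample, read through an in-memory filehandle.
my $sample = "4933402D24Rik fail\n4933402D24Rik fail\n4933402D24Rik pass1\n";
open my $fh, '<', \$sample or die "Cannot open in-memory data: $!";
my $counts = grade_counts($fh);
close $fh;

for my $gene ( sort keys %$counts ) {
    printf "%s\tgood=%d\tbad=%d\n", $gene,
        $counts->{$gene}{good} // 0,
        $counts->{$gene}{bad}  // 0;
}
```

The `//` (defined-or) operator needs Perl 5.10 or later; on older versions `exists` checks, as in toolic's reply, do the same job.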
Re: processing key value pairs of a hash
by kwaping (Priest) on Apr 14, 2009 at 21:18 UTC
    Here's my answer, building on the code you have already posted to create %clone_hash.

        # build the data
        my %count;
        while (my ($test,$result) = each %clone_hash) {
            $count{$test}{passed}++ unless ($result =~ /pass4|fail/);
        }

        # check the data
        foreach my $test (keys %count) {
            if (! $count{$test}{passed}) {
                print "test $test had no passing results" . $/;
            } elsif ($count{$test}{passed} == 1) {
                print "test $test had only one passing result" . $/;
            }
        }

    I am ignoring the minor issues with your code that others have pointed out.

    ---
    It's all fine and dandy until someone has to look at the code.
      Kwaping
      Thanks for the direction!
      LomSpace
Re: processing key value pairs of a hash
by morgon (Priest) on Apr 14, 2009 at 20:42 UTC
    Hi,

    First of all: You write good code.

    Now on to the criticism. I don't quite understand what you are trying to achieve (your file at the top has 2 fields, but in your script you seem to deal with 10 of them), and I had too much beer to guess what you probably mean, which is why I will focus purely on formal stuff :-)

    1) "perl -w" in the she-bang line and "use warnings" are pretty much the same thing (won't hurt of course :-)

    2) You should check the result of your open-statements e.g. "open my $fh, "file" or die $!;"

    3) Don't repeat yourself. Put the (common) directory that your files share into a variable, then you only have to change one line of code should you ever want to access files in a different directory.

    4) Why chomp the first line when you want to discard it anyway?

    5) The keys of your %clone_hash seem to be strings so you should simply sort them using cmp.
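
Points 2 to 5 can be sketched together as follows. This is only an illustration under stated assumptions: a temporary directory and a tiny two-column seed file stand in for the poster's real directory and export file, so the sketch runs on its own, and the column layout is simplified.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Point 3: keep the shared directory in one variable. A temporary
# directory stands in for the real one here (assumption for the demo).
my $dir      = tempdir( CLEANUP => 1 );
my $in_file  = File::Spec->catfile( $dir, 'mart_export.txt' );
my $out_file = File::Spec->catfile( $dir, 'mart_export_counts.txt' );

# Seed a small input file (header plus three rows) so the sketch is runnable.
open my $seed, '>', $in_file or die "Cannot write '$in_file': $!";
print {$seed} "gene\tqc\n", "Oxct1\tpass4\n", "49334\tpass1\n", "Eif2b2\tpass2\n";
close $seed or die "Cannot close '$in_file': $!";

# Point 2: three-argument open, with the result checked.
open my $in,  '<', $in_file  or die "Cannot read '$in_file': $!";
open my $out, '>', $out_file or die "Cannot write '$out_file': $!";

# Point 4: the header line is read and discarded, so no chomp is needed.
my $header = <$in>;

my %clone_hash;
while ( my $line = <$in> ) {
    chomp $line;
    my ( $gene_symbol, $esc_qc ) = split /\t/, $line;
    $clone_hash{$gene_symbol} = $esc_qc;
}

# Point 5: the keys are strings, so compare them with cmp.
my @sorted_keys = sort { $a cmp $b } keys %clone_hash;
print {$out} "$_ = $clone_hash{$_}\n" for @sorted_keys;

close $in;
close $out or die "Cannot close '$out_file': $!";
```

With string keys, the numeric `<=>` comparison in the original sort would also trigger "isn't numeric" warnings under `use warnings`, which is one more reason to prefer plain `cmp` here.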

      1) "perl -w" in the she-bang line and "use warnings" are pretty much the same thing (won't hurt of course :-)
      Here is the difference, according to warnings:
      The warnings pragma is a replacement for the command line flag -w , but the pragma is limited to the enclosing block, while the flag is global. See perllexwarn for more information.
        Don't get your posting...

        I know there is difference that's why I said "pretty much the same" and not "exactly".

        For his case (as he does not seem to want to turn warnings off for a particular lexical scope) it won't make a difference.

        But ok:

        6) Don't use -w, always use warnings so you can turn them off again.

        Thanks Toolic!
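
To make point 6 concrete, here is a small sketch of what "turn them off again" looks like with the pragma: one warning category is disabled inside a single block and is active again outside it. The `__WARN__` handler and the variable names are only there for the demonstration.

```perl
use strict;
use warnings;

my @seen;    # collects any warning messages that are emitted
$SIG{__WARN__} = sub { push @seen, $_[0] };

my $undef;
{
    # Only the 'uninitialized' category is off, and only inside this block.
    no warnings 'uninitialized';
    my $quiet = "value: " . $undef;    # no warning here
}
my $loud = "value: " . $undef;         # warns: the block's scope has ended

print "warnings caught: ", scalar(@seen), "\n";
```

Only the concatenation outside the block should produce a warning, so the handler collects a single message.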
      Hi Morgon!
      Thanks for the response and the kudos on my code! Yes, the keys are strings and I understand that keys are unique. I want to process each unique string based on all of its values.
      For example, for the string 'Eif2b2' I want to count only the value 'pass2'. Remember
      that for each string I am only interested in counting values from 'pass1' to 'pass3'
      and then counting only those key value pairs where after I remove 'pass4' and
      greater there is only one key value pair that is less than 'pass4'.
      Thanks,
      LomSpace
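
One possible reading of this restated requirement, as a hedged sketch: count rows graded 'pass1' to 'pass3' per gene, then report how many genes have at least one such row and how many have exactly one. The interpretation (rows rather than distinct values) and the variable names are assumptions; the sample data and the "Clone / Good / Single Clone" header are taken from the thread.

```perl
use strict;
use warnings;

# Sample rows from the question: gene symbol, then QC result.
my $sample = <<'END';
Eif2b2 fail
Eif2b2 pass5
Eif2b2 fail
Eif2b2 pass2
Eif2b2 fail
Eif2b2 pass4
Eif2b2 fail
49334 fail
49334 fail
49334 pass1
49334 fail
49334 pass4
Oxct1 pass4
Oxct1 fail
Oxct1 pass4
Oxct1 fail
END

my %good;    # gene => number of rows graded pass1..pass3
for my $line ( split /\n/, $sample ) {
    my ( $gene, $qc ) = split ' ', $line;
    $good{$gene} += ( $qc =~ /^pass[1-3]$/ ) ? 1 : 0;
}

# grep in scalar context returns the number of matching genes.
my $with_good   = grep { $good{$_} >= 1 } keys %good;
my $single_good = grep { $good{$_} == 1 } keys %good;

print "Clone\tGood\tSingle Clone\n";
for my $gene ( sort keys %good ) {
    printf "%s\t%d\t%d\n", $gene, $good{$gene}, $good{$gene} == 1 ? 1 : 0;
}
print "Genes with a good clone: $with_good\n";
print "Genes with exactly one good clone: $single_good\n";
```

On the sample data both totals come out as two, which matches the counts lomSpace gives in the question.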
