PerlMonks  

how to access hash key from the hash value when hash key is pointing to an array of hash values?

by BhariD (Sexton)
on Jan 29, 2010 at 20:58 UTC

BhariD has asked for the wisdom of the Perl Monks concerning the following question:

I have a hash in which each key (a taxon) points to an array of values (genes). For example:
my %gg = (
    '1567' => ['NP_011', 'NP_012', 'NP_013'],
    '2000' => ['NP_020', 'NP_021', 'NP_024', 'NP_025', 'NP_035'],
);

I want to retrieve the taxon for a given gene. For example, when I search for the gene NP_012, I want to get taxon 1567, and so on. Is there a way I can do this?

Thanks!

Re: how to access hash key from the hash value when hash key is pointing to an array of hash values?
by ikegami (Patriarch) on Jan 29, 2010 at 21:10 UTC

    You have to visit every value in order to determine if a certain value exists. We're talking about a pair of nested loops since your data structure has two levels.

    To find the first match:

    my $gene_to_find = 'NP_012';
    my $matching_taxon;
    TAXON:
    for my $taxon (keys %gg) {
        for my $gene ( @{ $gg{$taxon} } ) {
            if ($gene eq $gene_to_find) {
                $matching_taxon = $taxon;
                last TAXON;
            }
        }
    }

    if (defined($matching_taxon)) {
        print("Gene $gene_to_find found in taxon $matching_taxon.\n");
    } else {
        print("Gene $gene_to_find not found in any taxon.\n");
    }

    To find all matches:

    my $gene_to_find = 'NP_012';
    my @matching_taxons;
    for my $taxon (keys %gg) {
        for my $gene ( @{ $gg{$taxon} } ) {
            if ($gene eq $gene_to_find) {
                push @matching_taxons, $taxon;
            }
        }
    }

    if (@matching_taxons) {
        print("Gene $gene_to_find found in taxons @matching_taxons.\n");
    } else {
        print("Gene $gene_to_find not found in any taxon.\n");
    }

    If you do many such searches between changes to %gg, you should build a hash of taxons by gene from your hash of genes by taxons.
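A minimal sketch of that inversion, assuming the %gg layout from the question; the name %taxons_by_gene is illustrative and not from the thread. Because a gene could in principle appear under more than one taxon, each gene maps to an array of taxons rather than to a single one:

```perl
use strict;
use warnings;

# Sample data from the question: taxon => [ genes ].
my %gg = (
    '1567' => [ 'NP_011', 'NP_012', 'NP_013' ],
    '2000' => [ 'NP_020', 'NP_021', 'NP_024', 'NP_025', 'NP_035' ],
);

# Invert once: gene => [ taxons ]. A gene may belong to more than one
# taxon, so each value is an array reference, not a single taxon.
my %taxons_by_gene;
for my $taxon ( keys %gg ) {
    for my $gene ( @{ $gg{$taxon} } ) {
        push @{ $taxons_by_gene{$gene} }, $taxon;
    }
}

# Each lookup is now a single hash access instead of a full scan.
for my $gene_to_find (qw(NP_012 NP_999)) {
    if ( my $taxons = $taxons_by_gene{$gene_to_find} ) {
        print "Gene $gene_to_find found in taxons @$taxons.\n";
    }
    else {
        print "Gene $gene_to_find not found in any taxon.\n";
    }
}
```

Building the index costs one full pass over %gg; after that each lookup is a single hash access. The index has to be rebuilt (or updated) whenever %gg changes.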

      As always, ikegami provides a clear (and clearly explained), effective solution.

      Here's another approach that has the advantage of a bit more concision (I doubt it will actually run any faster), but the drawback that it uses some Perl idioms that may be unfamiliar to the programmer/maintainer. You can leave out the defined test only if you're absolutely sure a 'gene' will never be anything that looks false to Perl (such as '0' or the empty string). (A similar problem, not addressed here, exists for the taxons.)

      >perl -wMstrict -le
      "use List::Util qw(first);
       my %gg = (
         1567 => [ qw(NP_011 NP_012 NP_013 NP_025) ],
         2000 => [ qw(NP_020 NP_021 NP_024 NP_025 NP_035) ],
         9999 => [ qw(NP_900 NP_910 NP_902 0) ],
         9998 => [],
         );
       SEARCH:
       for my $gene_to_find (qw(NP_011 NP_025 NP_999 0)) {
         print qq{searching for gene '$gene_to_find':};
         my $matching_taxon =
           first { defined first { $_ eq $gene_to_find } @{ $gg{$_} } } keys %gg
           ;
         my @all_matching_taxons =
           grep { defined first { $_ eq $gene_to_find } @{ $gg{$_} } } keys %gg
           ;
         print qq{ NOT found} and next SEARCH if not $matching_taxon;
         print qq{ first found: '$matching_taxon'};
         my $all_found = join q{' '}, @all_matching_taxons;
         print qq{ all found: '$all_found'};
         }
      "
      searching for gene 'NP_011':
       first found: '1567'
       all found: '1567'
      searching for gene 'NP_025':
       first found: '2000'
       all found: '2000' '1567'
      searching for gene 'NP_999':
       NOT found
      searching for gene '0':
       first found: '9999'
       all found: '9999'

      See List::Util.
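A sketch of a variant, assuming List::Util 1.33 or later (the version that added any). Because any returns a plain true or false rather than the matched element, no defined test is needed even for a gene such as '0'; and checking whether the list of matches is empty, instead of testing a taxon's truth, also sidesteps the taxon problem. The helper name taxons_containing is hypothetical, and keys are sorted only so the output order is repeatable:

```perl
use strict;
use warnings;
use List::Util qw(any);    # any() was added in List::Util 1.33

# Test data from the reply above, including a gene named '0'
# and a taxon with no genes.
my %gg = (
    1567 => [qw(NP_011 NP_012 NP_013 NP_025)],
    2000 => [qw(NP_020 NP_021 NP_024 NP_025 NP_035)],
    9999 => [qw(NP_900 NP_910 NP_902 0)],
    9998 => [],
);

# Hypothetical helper: returns every taxon whose gene list contains
# $gene_to_find, in sorted key order.
sub taxons_containing {
    my ( $gg, $gene_to_find ) = @_;
    return grep {
        my $taxon = $_;
        # any() yields true/false, never the matched element itself,
        # so a false-looking gene such as '0' is still detected.
        any { $_ eq $gene_to_find } @{ $gg->{$taxon} };
    } sort keys %$gg;
}

for my $gene_to_find (qw(NP_011 NP_025 NP_999 0)) {
    my @found = taxons_containing( \%gg, $gene_to_find );
    # Test the list's size, not a taxon's truth, so a taxon 0 would work too.
    if (@found) {
        print "gene '$gene_to_find': found in @found\n";
    }
    else {
        print "gene '$gene_to_find': NOT found\n";
    }
}
```

With this data it prints "found in 1567" for NP_011, "found in 1567 2000" for NP_025, "NOT found" for NP_999 and "found in 9999" for '0'.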

        One minor niggle: your results wording, "first found", implies an ordering in the hash that isn't there. Add some more data to the hash and you may well get a different "first found". If the taxon codes are all numeric and you said

        ... sort { $a <=> $b } keys %gg;

        then the labelling would be more meaningful.
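A minimal sketch of that suggestion, reusing the nested first search from the one-liner above but visiting the taxon codes in ascending numeric order. It assumes, as noted, that every taxon code is numeric; the defined test is kept, with explicit parentheses for readability:

```perl
use strict;
use warnings;
use List::Util qw(first);

my %gg = (
    1567 => [qw(NP_011 NP_012 NP_013 NP_025)],
    2000 => [qw(NP_020 NP_021 NP_024 NP_025 NP_035)],
);

my $gene_to_find = 'NP_025';    # present in both taxons

# Visit the (assumed all-numeric) taxon codes in ascending numeric order,
# so "first found" reliably means "lowest-numbered matching taxon"
# instead of depending on hash order.
my $first_taxon = first {
    my $taxon = $_;
    defined( first { $_ eq $gene_to_find } @{ $gg{$taxon} } );
} sort { $a <=> $b } keys %gg;

if ( defined $first_taxon ) {
    print "first found: '$first_taxon'\n";
}
else {
    print "NOT found\n";
}
```

However many taxons are added, this reports 1567 for NP_025 as long as 1567 remains the lowest-numbered taxon containing it.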

        I hope this is of interest.

        Cheers,

        JohnGG

      BhariD, these are the kinds of questions I have, and I'm very glad you asked them!
      ikegami, thanks for posting a very clear answer.
Re: how to access hash key from the hash value when hash key is pointing to an array of hash values?
by CountZero (Bishop) on Jan 29, 2010 at 21:42 UTC
    If you have a rather large taxon -> gene hash and need to look up many genes, consider putting them in a database. SQLite seems very well suited to just this kind of thing. As a bonus, your data persists in the database, so you only have to do the transfer once and can then do many look-ups without rebuilding your data structure again and again.
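From Perl, SQLite would normally mean DBI plus DBD::SQLite. The sketch below is deliberately not that: as a lighter-weight stand-in that needs only core modules, it uses Storable to save a gene-to-taxons index to a file once and load it on later runs, which shows the same build-once, look-up-many idea. The file location is an assumption (a temporary file here); a real script would use a fixed path:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

my %gg = (
    '1567' => [ 'NP_011', 'NP_012', 'NP_013' ],
    '2000' => [ 'NP_020', 'NP_021', 'NP_024', 'NP_025', 'NP_035' ],
);

# Build the gene => [ taxons ] index once ...
my %taxons_by_gene;
for my $taxon ( keys %gg ) {
    push @{ $taxons_by_gene{$_} }, $taxon for @{ $gg{$taxon} };
}

# ... and save it to disk. A temporary file stands in for a real,
# fixed location (an assumption for this sketch).
my ( $fh, $index_file ) = tempfile( UNLINK => 1 );
close $fh;
store( \%taxons_by_gene, $index_file );

# Later runs can load the ready-made index instead of rebuilding it.
my $index  = retrieve($index_file);
my $taxons = $index->{'NP_012'} || [];
print "NP_012 is in taxons: @$taxons\n";
```

Unlike a database, this loads the whole index into memory on every run, so for a very large data set CountZero's SQLite suggestion is likely the better fit.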

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Node Type: perlquestion [id://820397]
Approved by biohisham