Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Sorting values of nested hash refs

by doran (Deacon)
on Feb 29, 2004 at 09:32 UTC ( [id://332618]=perlquestion: print w/replies, xml ) Need Help??

doran has asked for the wisdom of the Perl Monks concerning the following question:

I feel guilty asking, because this seems like something that would have been asked. But I couldn't find the solution I thought I was looking for. So I come to you.

If I have this bit of code:
while (my $d = $ks_sth->fetchrow_hashref()){ $KP->{$d->{k}}{$d->{v}}++; }
which produces a data structure similar to this:
$VAR1 = { '3094507' => { '2838413' => '22', '2976790' => '33', '539960' => '10', '2152725' => '19', '161546' => '5', '2197572' => '3' }, '2946464' => { '2952428' => '9', '3913125' => '10', '790014' => '2', '1065575' => '3' }, '2565888' => { '10538' => '1', '744' => '2' }, };
What's an efficient way to find the x (say, top 3) highest values in the "rightmost" numbers?

It's trivial if the structure is $d->{$foo} = $val, but I'm way less clear on how to sort on $val when I'm looking at the values of 'nested' hash refs, such as $d->{$foo}{$bar} = $val.
Any insights or pointers to online examples are appreciated.

Replies are listed 'Best First'.
Re: Sorting values of nested hash refs
by tinita (Parson) on Feb 29, 2004 at 12:56 UTC
    well, and here's a quick and dirty and inefficient approach, which, however, might suffice your needs if the hash is small enough:
    # gets the three biggest numbers my @values = (sort {$b <=> $a} map {values %$_} values %$hash)[0..2];
Re: Sorting values of nested hash refs
by neniro (Priest) on Feb 29, 2004 at 12:06 UTC
    Is this what you want?
    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $hoh = { '3094507' => { '2838413' => '22', '2976790' => '33', '539960' => '10', '2152725' => '19', '161546' => '5', '2197572' => '3' }, '2946464' => { '2952428' => '9', '3913125' => '10', '790014' => '2', '1065575' => '3' }, '2565888' => { '10538' => '1', '744' => '2' }, }; my @results; foreach my $outer (keys %$hoh) { foreach my $inner ($hoh->{$outer}) { my $max = 0; for (values %$inner) { $max = $_ if ($_ > $max); } push @results, $max; } } print Dumper(@results);
    Addon: If you use List::Util you can make it a little bit easier:
    my $max = max (values %$inner);
    best regards, neniro
      neniro,
      I do not believe this accomplishes what doran was after. Instead of finding the 3 highest values of the inner hashes, you are finding the highest value in each inner hash. I think the way to go is to sort all the second level hash values into an array and pick the top 3.
      my @vals = sort {$b <=> $a } map { values %{$hoh->{$_}} } keys %$hoh; my ($one, $two, $three) = @vals[0..2];
      Cheers - L~R

      Update Completely missed that tinita had an almost identical solution below. In the case of a very large HoH, you can avoid sorting and gain a bit of efficiency with the following:

      my @top_3; my $filled = 0; VAL: for my $val ( map { values %{$hoh->{$_}} } keys %$hoh ) { if ( ! $filled ) { for ( 0 .. 2 ) { if ( ! defined $top_3[$_] ) { $top_3[$_] = $val; $filled = 1 if $_ == 2; next VAL; } } } if ( $val > $top_3[0] ) { ($top_3[0], $top_3[1]) = ($val, $top_3[0]); } elsif ( $val > $top_3[1] ) { ($top_3[1], $top_3[2]) = ($val, $top_3[1]); } elsif ( $val > $top_3[2] ) { $top_3[2] = $val; } } print "$_\n" for @top_3;
        Well (d'oh), an important thing I forgot to mention is that I also need to know where I am in the $HoH for any given value. So I not only need to find the top x (10 in my case) values of $HoH->{$foo}{$bar}, but I also need to keep track of the value of $foo and $bar for each of those x values.

        Anyway, your code gave me some idea and this is what I've come up with for now.
        while (my $kid_ = $kills_sth->fetchrow_hashref()){ $KP->{$kid_->{k}}{$kid_->{v}}++; } # How many do we keep? my $total = 10; my %top; for(keys %{$KP}){ # Player ID my $pid = $_; VAL: for ( keys %{$KP->{$pid}}) { # Target ID my $tid = $_; my $frags = $KP->{$pid}{$tid}; for(0 .. ($total - 1)){ my $idx = $_; $top{$idx}{f} ||= 0; if(($idx != ($total - 1)) && $frags > $top{$idx}{f}){ my $idx2 = $idx + 1; $top{$idx2}{k} = $top{$idx}{k}; $top{$idx2}{v} = $top{$idx}{v}; $top{$idx2}{f} = $top{$idx}{f}; $top{$idx}{k} = $pid; $top{$idx}{v} = $tid; $top{$idx}{f} = $frags; next VAL; } elsif(($idx == ($total - 1)) && ($frags > $top{$idx}{f})){ $top{$idx}{k} = $pid; $top{$idx}{v} = $tid; $top{$idx}{f} = $frags; next VAL; } } } } for(0 .. ($total - 1)){ my $id = $_; printf("%2s. p: %s\tt: %s\tf: %s\n",($id+1),$top{$id}{k},$top{$id} +{v},$top{$id}{f}); }
        Still, it seems awfully inefficient to me and I'm really leary of it's ninja scaling abilities. But that said, it works and produces the intended results:
        1. p: 2838413 t: 3440380 f: 328 2. p: 2838413 t: 1261188 f: 282 3. p: 539960 t: 2976790 f: 273 4. p: 3440380 t: 2838413 f: 268 5. p: 182560 t: 53957 f: 206 6. p: 539960 t: 53957 f: 196 7. p: 1261188 t: 53957 f: 190 8. p: 539960 t: 3440380 f: 171 9. p: 3440380 t: 539960 f: 146 10. p: 53957 t: 1261188 f: 127

        I'd appreciate any ideas about increasing efficiency.

        Thanks a bunch for your help.

        ps. Yes, it's a half-life stats program. :)
      Thanks, but as Limbic~Region says, this isn't quite what I'm looking for. It may end up being that a 'first level' hash ref may contain more than one of our final set of values.
Re: Sorting values of nested hash refs
by ambrus (Abbot) on Feb 29, 2004 at 19:48 UTC

    If the data you are processing is very large, and you want more than 3 highest values (eg 32 highest) then the most efficent solution is probably to use a binary heap of size 10. In the binary heap, you can delete the lowest value and add a new value together very fast. This is however not quite fast to implement, so most probably it just wont worth.

      In real life, the list may be very long. There may be thousands of first-level hash refs, and another order of magnitude of second level hash refs. At this point I'm looking for the top 10 results. I'm trying to optimize for speed of execution (this is a personal project and an little extra development time isn't such a problem).

      I'm not familiar with binary heaps. I'll look around.

      Thanks

        Update: This code has a bug, which can cause it to return one line of the result wrong. I've uploaded the correct code as a reply to this node. Diff the two codes to see where the bug was.

        There are several modules for heaps on CPAN. However, AFAIK, none of them supports popping an element and adding a new one in one step. I give you an example code doing that, so this may be faster than those modules (or it might not be, I don't know).

        This code excepts du output or the like as input, and prints the largeset files.

        This of course probably won't make any sense unless you know what binary heaps are.

        #!/usr/local/bin/perl use warnings; use strict; use less; our $infinity= 1e200; # $h= heap_init ($n): returns heap of $n elements # I use -1e200 as a very small number, probably smaller than any data +input. sub heap_init { [([-$infinity])x$_[0]]; } # heap_repl ($h, $e): remove smallest heap element and insert new elem +ent $e. # each element should be an array-ref, of which the first element is a # numerical key. sub heap_repl { my ($h, $e, $n, $p); ($h, $e)= @_; $n= 0; $p= $$h[0]; while (1) { if (2*$n+2 < @$h && $$h[2*$n+2][0] < $$e[0]) { if ($$h[2*$n+1][0] < $$h[2*$n+2][0]) { $$h[$n]= $$h[2*$n+1]; $n= 2*$n+1; } else { $$h[$n]= $$h[2*$n+2]; $n= 2*$n+2; } } elsif (2*$n+1 < @$h && $$h[2*$n+1][0] < $$e[0]) { $$h[$n]= $$h[2*$n+1]; $n= 2*$n+1; } else { last; } } $$h[$n]= $e; return $p; } # heap_pop ($h) removes the lowest element from a heap; returns undef +if the # heap is empty sub heap_pop { my ($h, $e); ($h)= @_; @$h<=1 and do { @$h==1 and return pop @$h; return; }; $e= pop @$h; return heap_repl $h, $e; } # main { my $heap= heap_init 32; my $elt; while (<>) { $_=~ /^(\d+)\s+(.*)$/ or die "error: line does not match"; heap_repl $heap, [$1, $2]; } while (defined ($elt= heap_pop $heap)) { print $$elt[0], "\t", $$elt[1], $/; } } __END__

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://332618]
Approved by AcidHawk
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (5)
As of 2024-04-24 22:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found