http://qs321.pair.com?node_id=559809

0xbeef has asked for the wisdom of the Perl Monks concerning the following question:

I am struggling to sort a reasonably complex hash-of-hashes directly when the sort key sits several levels below the top-level keys of the hash.

I currently work around this (inefficiently) by deriving a second, simplified hash that I can then sort by time.

Hopefully this simplified example will illustrate it more clearly:

#!/usr/bin/perl -w
use strict;

my %ERROR;
my %ERROR2;
my @classes = ('H','S','O');

my $output = <<EOD;
4865FA9B 0702 P H rmt0 TAPE OPERATION ERROR
DE9A52D1 0704 I S rmt0 DEVICE DUMP RETRIEVED
4865FA9B 0701 P H rmt2 TAPE OPERATION ERROR
F3E9B3E2 0620 I O SYSJ2 UNABLE TO ALLOCATE SPACE IN FILE SYSTEM
DCB47997 0511 T H hdisk4 DISK OPERATION ERROR
EOD

# populate original hash
for my $line (split /\n/,$output) {
    chomp $line;
    my ($IDENTIFIER,$TIMESTAMP,$T,$C,$RES,$DESC) = split(/\s+/,$line,6);
    $ERROR{$C}{$RES}{$DESC}{time} = $TIMESTAMP;
    $ERROR{$C}{$RES}{$DESC}{id} = $IDENTIFIER;
    $ERROR{$C}{$RES}{$DESC}{T} = $T;
}

# elsewhere, I create a secondary hash for sorting.
my $counter = 0;
for my $CLASS (@classes) {
    for my $RES (keys %{$ERROR{$CLASS}}) {
        for my $DESC (keys %{$ERROR{$CLASS}{$RES}}) {
            $counter++;
            $ERROR2{$CLASS}{$counter}{time} = $ERROR{$CLASS}{$RES}{$DESC}{time};
            $ERROR2{$CLASS}{$counter}{desc} = $DESC;
            # ... transfer some other data from %ERROR to %ERROR2
        }
    }
}

# ...
# output - SORT EACH CLASS BY MOST RECENT TIME.
# sort using CLASS as primary and time as secondary key.
# note how RES and DESC have no influence on the sorting.
for my $CLASS (@classes) {
    for my $nr (reverse sort {$ERROR2{$CLASS}{$a}{time} <=> $ERROR2{$CLASS}{$b}{time} } keys %{$ERROR2{$CLASS}}) {
        print "CLASS $CLASS : Time : $ERROR2{$CLASS}{$nr}{time} Description: $ERROR2{$CLASS}{$nr}{desc}\n";
    }
}

I know one option would be to build the original hash in the %ERROR2 format, but that is not feasible due to other issues, so I'm stuck with two hashes to avoid the effect that the RES and DESC keys would have when traversing %ERROR in a for loop.

Is there a method which would avoid the need for %ERROR2, and still provide output of "each class sorted by time"?

-Niel
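One possible way to get "each class sorted by time" without %ERROR2 is to collect the (RES, DESC) key pairs of a class into a temporary list and sort that list by the time stored in the innermost hash. The following is only a sketch under assumptions: the helper name pairs_by_time is hypothetical, and %ERROR is written out as a literal here (using part of the sample data) instead of being parsed from $output.

```perl
use strict;
use warnings;

# Same shape as %ERROR in the question:
# class -> resource -> description -> fields.
my %ERROR = (
    H => {
        rmt0   => { 'TAPE OPERATION ERROR' => { time => '0702', id => '4865FA9B', T => 'P' } },
        rmt2   => { 'TAPE OPERATION ERROR' => { time => '0701', id => '4865FA9B', T => 'P' } },
        hdisk4 => { 'DISK OPERATION ERROR' => { time => '0511', id => 'DCB47997', T => 'T' } },
    },
    S => {
        rmt0   => { 'DEVICE DUMP RETRIEVED' => { time => '0704', id => 'DE9A52D1', T => 'I' } },
    },
);

# Hypothetical helper: return [RES, DESC] pairs for one class,
# ordered by the innermost {time} value, most recent first.
sub pairs_by_time {
    my ($class) = @_;
    my $by_res = $ERROR{$class} || {};   # a missing class yields no pairs
    my @pairs;
    for my $res (keys %$by_res) {
        push @pairs, map { [ $res, $_ ] } keys %{ $by_res->{$res} };
    }
    return sort {
        $by_res->{ $b->[0] }{ $b->[1] }{time}
            <=> $by_res->{ $a->[0] }{ $a->[1] }{time}
    } @pairs;
}

for my $class (qw(H S O)) {
    for my $pair (pairs_by_time($class)) {
        my ($res, $desc) = @$pair;
        print "CLASS $class : Time : $ERROR{$class}{$res}{$desc}{time} Description: $desc\n";
    }
}
```

The RES and DESC keys are only used to reach the record, so they do not influence the order; swapping $a and $b in the comparison replaces the reverse sort of the original code.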

Re: Hash keys affect sorting
by TedPride (Priest) on Jul 07, 2006 at 15:57 UTC
    I must be missing something here, but why use a hash structure at all? Just put your data into an array; then, if you want to be able to access the array based on certain keys, create hash indexes for just those keys:
    use strict;
    use warnings;

    my (@data, %identifier);

    while (<DATA>) {
        chomp;
        @_ = split /\s+/, $_, 6;
        push @data, [@_];
        $identifier{$_[0]} = $data[-1]; # Index for first field
    }

    # Sort on multiple fields, in this case fifth and first...
    print "@$_\n" for sort {$a->[4] cmp $b->[4] || $a->[0] <=> $b->[0]} @data;

    __DATA__
    1 1023 T C cc Item 1
    2 1560 T C aa Item 2
    3 9102 T C bb Item 3
    4 11222 T C ff Item 4
    7 13456 T C bb Item 7
      I'm a baby monk only... ;)

      I have to admit I have never seen @_ used for anything other than function arguments, and I don't exactly understand push @data, [@_]. Still, using @_ seems nice and intuitive; applying it to my example gives me:

      use strict;

      my $output = <<EOD;
      4865FA9B 0702 P H rmt0 TAPE OPERATION ERROR
      DE9A52D1 0704 I S rmt0 DEVICE DUMP RETRIEVED
      4865FA9B 0701 P H rmt2 TAPE OPERATION ERROR
      F3E9B3E2 0620 I O SYSJ2 UNABLE TO ALLOCATE SPACE IN FILE SYSTEM
      DCB47997 0511 T H hdisk4 DISK OPERATION ERROR
      EOD

      my (@data, %identifier);

      for my $line (split /\n/,$output) {
          @_ = split /\s+/, $line, 6;
          push @data, [@_];
          $identifier{$_[0]} = $data[-1]; # Index for first field
      }

      # Sort on multiple fields, in this case 4th (class) and 2nd (time)...
      print "\@data:\n";
      for my $rec (reverse sort {$a->[3] cmp $b->[3] || $a->[1] <=> $b->[1]} @data) {
          print "Error ID: $rec->[0] Time: $rec->[1]\n";
      }

      Also, if I understand you correctly, %identifier is a hash of array references which I can use to access the @data elements, e.g.

      print "key $_ value: $identifier{$_}->[0]\n" for (keys %identifier);
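To illustrate that reading of %identifier, here is a small sketch (not taken from the thread) using two of the sample lines. It assumes a named array @fields in place of @_, and it shows that the hash value and the @data element are the same reference, not copies of each other.

```perl
use strict;
use warnings;

my (@data, %identifier);

for my $line ('DE9A52D1 0704 I S rmt0 DEVICE DUMP RETRIEVED',
              'DCB47997 0511 T H hdisk4 DISK OPERATION ERROR') {
    my @fields = split /\s+/, $line, 6;
    push @data, [@fields];
    $identifier{ $fields[0] } = $data[-1];   # same reference, not a copy
}

# Look a record up by its identifier.
my $rec = $identifier{'DCB47997'};
print "Time: $rec->[1] Resource: $rec->[4]\n";

# Both names point at one array, so a change made through the index
# is visible through @data as well.
$rec->[1] = '0512';
print "Via \@data: $data[1][1]\n";
```

One caveat that follows from the sample data: the identifier 4865FA9B occurs on two lines, so an index keyed on the first field keeps only the last record seen for it.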

      Thanks for the help!

      Niel

        $_ is to scalar what @_ is to array and %_ is to hash, although %_ is seldom seen in the wild.

        push @data, [@_] means: $_ was split into @_. If the code said push @data, @_ instead, the elements of @_ would be appended to @data one by one. To append @_ as one unit to @data it must be given boundaries [ ]; this construct creates an array reference. That reference (a scalar) is stored in @data and holds the elements of @_, which are copied into [ ] at the moment the [ ] is created. See perlreftut. Think of [@_] as "scalarifying" @_ into an (anonymous) array reference; an array can only hold scalars (single values, which references happen to be).
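The difference can be seen in a minimal sketch. The array names @fields, @flat and @nested are made up for this illustration.

```perl
use strict;
use warnings;

my @fields = ('DCB47997', '0511', 'T');

my @flat;
push @flat, @fields;       # appends three separate scalars
push @flat, @fields;

my @nested;
push @nested, [@fields];   # appends one scalar: a reference to a copy
push @nested, [@fields];

print scalar(@flat), "\n";     # 6 elements, record boundaries lost
print scalar(@nested), "\n";   # 2 elements, one per record

# [ ] copied the values when it was created, so later changes
# to @fields do not show up in the stored records.
$fields[1] = 'changed';
print $nested[0][1], "\n";     # still 0511
```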

        Wow. What noise ;-)

        hoping not to have confused you further,
        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Hash keys affect sorting
by johngg (Canon) on Jul 07, 2006 at 16:22 UTC
    Just a quick observation. Your split /\n/,$output will have the effect of removing the newlines from each line as the split delimiter is not included in the resulting list. Thus the following chomp $line; is superfluous.
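A quick way to check this, as a sketch with a made-up two-line string:

```perl
use strict;
use warnings;

my $output = "first line\nsecond line\n";
my @lines  = split /\n/, $output;

# The delimiter is dropped, and so is the trailing empty field.
print scalar(@lines), "\n";    # 2

for my $line (@lines) {
    my $before = $line;
    chomp $line;               # nothing left to remove
    print $before eq $line ? "unchanged\n" : "changed\n";
}
```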

    Cheers,

    JohnGG

      Well spotted - it's an oversight that happened when I simplified the actual code into an example...

      Niel