http://qs321.pair.com?node_id=1098646

Special_K has asked for the wisdom of the Perl Monks concerning the following question:

I have the following data structure:



$data_hash{$key0}[$idx0]{$key1} = <some floating point value>


I would like to sort this data structure by its values. Currently I am doing this by creating a new hash in which the key is the floating point value, and the value is an array where each element is a concatenation of $key0, $idx0, and $key1. Each value of the new hash has to be an array rather than a scalar because the floating point values in the original data structure are not guaranteed to be unique; that is, two entries in the original hash of arrays of hashes could have the same value.

Is there a way to sort this data structure by value without having to create a new hash first? I figure there must be some way to use the sort command with the original data structure, but I can't seem to get it to work. Also, if there is a way to sort my original data structure by value using a single sort command, are there any performance implications of doing so compared to the method I am currently using?

Here is the method I am currently using if anyone is curious:



# create a new hash where the key is the value of the original hash, and
# the value is an array of keys from the original hash that have the given value
foreach $key0 (keys(%data_hash)) {
    foreach $key1 (keys(%{$data_hash{$key0}})) {
        for ($i = 0; $i < @{$limit_hash{$key0}{$key1}{'limit'}}; $i++) {
            push(@{$new_hash{$data_hash{$key0}[$i]{$key1}}},
                 $key0 . "__" . $limit_hash{$key0}{$key1}{'limit'}[$i] . "__" . $key1);
        }
    }
}

# now print the new hash that is sorted by value
foreach my $sorted_by_val (sort { $a <=> $b } keys(%new_hash)) {
    printf("key %.2f: ", $sorted_by_val);
    foreach my $val (@{$new_hash{$sorted_by_val}}) {
        printf("$val");
    }
    printf("\n");
}

Re: sorting hash of array of hashes by value
by AppleFritter (Vicar) on Aug 26, 2014 at 18:59 UTC

    A nested data structure like this cannot be sorted in the narrow sense of the word. You can sort a list, and you can sort an array by treating it as a list, but you cannot sort a hash. The best thing you can do is sort its keys, which naturally form a list (if the keys don't matter, you can also sort the values directly). So you can do this:

    my @sorted = sort @unsorted;
    say foreach @sorted;    # Output in sorted order

    but not this:

    my %sorted = sort %unsorted;    # DOES NOT WORK!

    # workaround (if you don't care about keys)
    say foreach (sort values %unsorted);

    # workaround (general): sort the keys by their values, then look each value up
    say $unsorted{$_} foreach (sort { $unsorted{$a} <=> $unsorted{$b} } keys %unsorted);

    Let's look at that workaround, though. The reason it works is that there's an easy way to get all the keys of a hash, namely keys (or, for the simpler case, all its values, using values). Is it possible to replicate this with a multilevel structure? Yes, but since you now have to take more than one level into account, it's not so simple anymore; there's no built-in function. The natural way of doing this (natural according to me, anyway) is flattening the keys, so that $data_hash{$key0}[$idx0]{$key1} becomes e.g. $flat_data_hash{"$key0-$idx0-$key1"}, and then working with the resulting structure as before. (If the individual keys matter, have the values of this structure be hashes that contain the value and all the individual keys.)

    That's pretty much what you've been doing, except you're also switching keys and values. I don't see a real reason for doing that, and it complicates your code (since values are not guaranteed to be unique, you have to maintain a list of keys for each), so I'd advise against that.

    I'll cook up some code to get you started; give me a few minutes. (There may be CPAN modules for this sort of thing, too; I haven't checked.)

    Unrelated observation: you don't use strict;, do you? I highly suggest doing so; it'll catch many mistakes. For best results, also use warnings;.

    Unrelated observation #2: your posted code doesn't actually work, does it? Your second foreach loop iterates over the keys of %{$data_hash{$key0}}, but %data_hash is supposed to contain (references to) arrays of hashes, not hashes.

      As promised, here's some skeleton code (and sample data) to get you started:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use feature qw/say/;

      my %data_hash = (
          "key0.1" => [
              {
                  "key1.1" => 3.3,
                  "key1.2" => 17.8,
                  "key1.3" => -2.4,
              },
              {
                  "key1.4" => 5.1,
                  "key1.5" => 13,
              },
              {
                  "key1.6" => -69,
                  "key1.7" => 127,
                  "key1.8" => 2.718,
                  "key1.9" => 3.3,
              },
          ],
          "key0.2" => [
              {
                  "key1.10" => 3.3,
                  "key1.11" => 2.5,
              },
              {
                  "key1.12" => -33,
              },
          ],
      );

      my %flat_hash = ();
      foreach my $key0 (keys %data_hash) {
          foreach my $index (0 .. $#{$data_hash{$key0}}) {
              foreach my $key1 (keys %{$data_hash{$key0}->[$index]}) {
                  my $new_key = "$key0-$index-$key1";
                  $flat_hash{$new_key} = {
                      "value" => $data_hash{$key0}->[$index]->{$key1},
                      "key0"  => $key0,
                      "index" => $index,
                      "key1"  => $key1,
                  };
              }
          }
      }

      foreach (sort { $flat_hash{$a}->{"value"} <=> $flat_hash{$b}->{"value"} } keys %flat_hash) {
          say $flat_hash{$_}->{"value"},
              ": key0 = ",  $flat_hash{$_}->{"key0"},
              ", index = ", $flat_hash{$_}->{"index"},
              ", key1 = ",  $flat_hash{$_}->{"key1"};
      }

      I didn't integrate the %limit_hash part since it wasn't part of your question.

      N.B. - you can probably do this more easily and naturally still using map, but I just came back from an Apple Family reunion and am not yet thinking in Perl again. ;)
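      The map-based variant mentioned above could look roughly like the following. This is only a sketch, not code from the thread: it assumes a small made-up sample in the same shape as %data_hash, and the names @records and @sorted are chosen for illustration. Each leaf becomes a [value, key0, index, key1] record, so the sort compares array slots and needs no intermediate hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Small made-up sample in the same shape as the thread's %data_hash.
my %data_hash = (
    "key0.1" => [ { "key1.1" => 3.3, "key1.2" => 17.8 }, { "key1.4" => 5.1 } ],
    "key0.2" => [ { "key1.10" => 2.5 }, { "key1.12" => -33 } ],
);

# Flatten every leaf into a [value, key0, index, key1] record.
my @records = map {
    my $key0 = $_;
    my $rows = $data_hash{$key0};
    map {
        my $index = $_;
        my $leaf  = $rows->[$index];
        map { [ $leaf->{$_}, $key0, $index, $_ ] } keys %$leaf;
    } 0 .. $#$rows;
} keys %data_hash;

# Sort numerically by the value stored in slot 0 of each record.
my @sorted = sort { $a->[0] <=> $b->[0] } @records;

printf "%.2f: key0 = %s, index = %d, key1 = %s\n", @$_ for @sorted;
```

      Duplicate values need no special handling here, because every leaf keeps its own record; the relative order of equal values is simply whatever the hash iteration happened to produce.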

      That's pretty much what you've been doing, except you're also switching keys and values. I don't see a real reason for doing that, and it complicates your code (since values are not guaranteed to be unique, you have to maintain a list of keys for each), so I'd advise against that.



      I was told that by switching the keys and values, it will improve performance during the sort operation because if I sort the values, I have to do two hash lookups for every comparison operation performed by the sort function. If I sort the keys, then I don't.

      Therefore, if I really want to sort on the hash values, it would make sense to create a new temporary hash in which the values and keys are swapped, and then sort that new hash on its keys (which are the values of the original hash), rather than the values. Is that incorrect?

      For example, wouldn't this:



      sort { $data_hash{$a} <=> $data_hash{$b} } keys(%data_hash)


      be slower than this:

      sort { $a <=> $b } keys(%reordered_data_hash)


      Where %reordered_data_hash is %data_hash with the values swapped with the keys?

        Therefore, if I really want to sort on the hash values, it would make sense to create a new temporary hash in which the values and keys are swapped, and then sort that new hash on its keys (which are the values of the original hash), rather than the values. Is that incorrect?

        The answer, as usual, is "it depends". Of course extra hash lookups will slow things down, but by how much? Also (and this is where the "it depends" kicks in), in general you also have to factor in the time it'll take to construct a reverse hash.

        Here is a very simple test:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use feature qw/say/;
        use Benchmark qw/cmpthese/;

        srand 0;
        our %hash = map { rand() } 1..1000;

        my $regular = sub {
            sort { $hash{$a} <=> $hash{$b} } keys %hash;
        };

        my $keysonly = sub {
            sort { $a <=> $b } keys %hash;
        };

        my $reverse_noref = sub {
            my %reverse_hash = ();
            foreach my $key (keys %hash) {
                $reverse_hash{$hash{$key}} = $key;
            }
            sort { $a <=> $b } keys %reverse_hash;
        };

        my $reverse = sub {
            my %reverse_hash = ();
            foreach my $key (keys %hash) {
                push @{ $reverse_hash{$hash{$key}} }, $key;
            }
            sort { $a <=> $b } keys %reverse_hash;
        };

        cmpthese(-2, {
            regular       => $regular,
            keysonly      => $keysonly,
            reverse       => $reverse,
            reverse_noref => $reverse_noref
        });

        On the machine I'm currently on, this produces:

        $ perl 1099601.pl
                         Rate       reverse reverse_noref       regular      keysonly
        reverse        2222/s            --          -35%          -93%          -93%
        reverse_noref  3406/s           53%            --          -89%          -89%
        regular       31693/s         1326%          830%            --           -0%
        keysonly      31726/s         1328%          831%            0%            --
        $

        So in isolation, the difference between "regular" (with the hash lookup) and "keysonly" (without) is negligible (though of course the latter is ever so slightly faster), while constructing a reverse hash first is roughly 14 times slower (2222/s versus 31693/s). Pushing to an array if/when you can't guarantee values are unique punishes you further, but even without that (reverse_noref) you're still about an order of magnitude slower.

        What does that mean for you? If you construct a "reverse" hash as you go along, just like you'd construct a regular hash, there may not be much of a difference (or there may be; you'll have to check). If you already have a "regular" hash, just let sort do whatever it needs to do; the difference won't be as big if you only need to construct the reverse hash once and then access it many times, but it'll still be there.

        As always, it's better to measure than to assume when it comes to optimization. It may well be that your approach is actually faster for your script and data, and if speed is crucial, then using a less "natural" approach that's faster is entirely fair.
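        For measuring this on your own data, a benchmark along the following lines may be a useful starting point. It is a sketch under stated assumptions, not the thread's script: the 500 sample keys, the sub names by_lookup and by_inverted, and the one-second run time are all made up for illustration. One detail worth getting right in any such test is context: perldoc -f sort documents the behaviour of sort in scalar context as undefined, so the sketch assigns every result to an array to make sure the sort runs in list context.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

srand 0;
# 500 made-up keys with random values (sample data, not from the thread).
my %hash = map { ("k$_" => rand()) } 1 .. 500;

# Sort the keys by value, looking each value up inside the comparison.
sub by_lookup {
    my @sorted = sort { $hash{$a} <=> $hash{$b} } keys %hash;
    return @sorted;
}

# Build the inverted hash (value => [keys]) first, then sort its keys.
sub by_inverted {
    my %inverted;
    push @{ $inverted{ $hash{$_} } }, $_ for keys %hash;
    my @sorted = map { @{ $inverted{$_} } } sort { $a <=> $b } keys %inverted;
    return @sorted;
}

# Assigning to @r forces list context, so the sort is actually performed.
cmpthese(-1, {
    lookup   => sub { my @r = by_lookup() },
    inverted => sub { my @r = by_inverted() },
});
```

        Both subs return the original keys ordered by value, so their results can be compared directly before trusting the timings.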

      Unrelated observation: you don't use strict;, do you? I highly suggest doing so; it'll catch many mistakes. For best results, also use warnings;.

      I do use strict and warnings; I just didn't copy them with that code fragment.

      Unrelated observation #2: your posted code doesn't actually work, does it? Your second foreach loop iterates over the keys of %{$data_hash{$key0}}, but %data_hash is supposed to contain (references to) arrays of hashes, not hashes.

      You are correct, it doesn't. In my attempt to genericize the code I accidentally introduced some typos.

Re: sorting hash of array of hashes by value
by LanX (Saint) on Aug 26, 2014 at 19:13 UTC
    Hashes can't be sorted b/c they have no order.

    But you can keep sorted arrays of keys or values.

    Your wish to keep the "path"s of a HoAoH sorted is a strong indication for me that you might wanna check the multi-dim hashes we inherited from Perl4.

    Like this you could flatten your data to a 1-dim hash

    $hash{$key0,$idx0,$key1}=<some floating point value>

    and keep an array of sorted @keys .
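    How the multi-dimensional emulation stores its keys can be seen in a few lines. This is a minimal sketch with a made-up sample path; it relies on the documented behaviour that a comma-separated subscript is joined with $; (also available as $SUBSEP through the core English module, default "\034") into a single string key.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use English qw(-no_match_vars);    # provides $SUBSEP as an alias for $;

my %hash;
my ($key0, $idx0, $key1) = ("key0.1", 2, "key1.8");    # made-up sample path

# A comma-separated subscript is joined with $SUBSEP into one flat key.
$hash{$key0, $idx0, $key1} = 2.718;

my ($flat_key) = keys %hash;
print "same key\n" if $flat_key eq join($SUBSEP, $key0, $idx0, $key1);

# Splitting on the separator recovers the individual parts of the path.
my @path = split /\Q$SUBSEP\E/, $flat_key;
print join(", ", @path), "\n";
```

    The approach assumes that none of the keys contains the separator character itself; otherwise the split would not give back the original parts.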

    Looks far simpler to me! (effectively it's using your concat approach w/o the overhead of the original HoAoH)

    If you need this more often you might wanna check on tiehash solutions on CPAN to allow sorted hashes by encapsulating the sorted @keys .

    HTH =)

    Cheers Rolf

    (addicted to the Perl Programming Language and ☆☆☆☆ :)

    edit

    fixed syntax error by 's/;/,/'

      Here's a proof of concept (stealing data from AppleFritter's example):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use feature qw/say/;
      use Data::Dump qw/pp dd/;

      my %data = (
          "key0.1" => [
              {
                  "key1.1" => 3.3,
                  "key1.2" => 17.8,
                  "key1.3" => -2.4,
              },
              {
                  "key1.4" => 5.1,
                  "key1.5" => 13,
              },
              {
                  "key1.6" => -69,
                  "key1.7" => 127,
                  "key1.8" => 2.718,
                  "key1.9" => 3.3,
              },
          ],
          "key0.2" => [
              {
                  "key1.10" => 3.3,
                  "key1.11" => 2.5,
              },
              {
                  "key1.12" => -33,
              },
          ],
      );

      my %flat = ();

      while (my ($k0, $v0) = each %data) {
          my $k1 = -1;
          for my $v1 (@$v0) {
              $k1++;
              while (my ($k2, $v2) = each %$v1) {
                  $flat{$k0,$k1,$k2} = $v2;
              }
          }
      }

      my @sorted_keys = sort { $flat{$a} <=> $flat{$b} } keys %flat;

      for my $k (@sorted_keys) {
          my $v = $flat{$k};
          say "$v \t<= \t", join ", ", split ( $; , $k );
      }

      Output:

      HTH! =)

      Cheers Rolf


Re: sorting hash of array of hashes by value
by Solo (Deacon) on Aug 26, 2014 at 19:15 UTC