Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Returning the lowest key in a hash (or highest)

by deprecated (Priest)
on Mar 25, 2001 at 03:05 UTC ( [id://66918]=perlquestion: print w/replies, xml ) Need Help??

deprecated has asked for the wisdom of the Perl Monks concerning the following question:

Hiya Monkies!

I have written two small functions (that are really the same thing just reworded) to return the lowest key and highest key in a hash. I get the feeling I am doing a lot of work (well, a moderate amount of work) to do something perl probably already does for me.

So without further ado, the code:

# assume a hash that looks like this: %hash = ( 1 => $hashref, 2 => $anotherhashref ); sub least_key { my %nums = %{ shift }; shift (@{ sort { $a <=> $b } keys (%nums) }); } sub high_key { my %nums = %{ shift }; shift (@{ sort { $b <=> $a } keys (%nums) }); }

Im aware that instead of switching $a and $b, i could have used pop instead of shift, but that isnt really relevant. I also thought this would be interesting to look at because its a perfect example of where perl's syntax gets a little murky.

psst, this isnt a round of perl golf either. :) I am just wondering if there is a better way to do this.

--
Laziness, Impatience, Hubris, and Generosity.

Replies are listed 'Best First'.
Re (tilly) 1: Returning the lowest key in a hash (or highest)
by tilly (Archbishop) on Mar 25, 2001 at 03:52 UTC
    Perl does not solve this problem particularly cleanly, but it can be arranged to.

    Since Perl is list-oriented, a design that allows you to think in terms of lists is likely to play well with the rest of how the language is organized. To this end List::Util offers handy functions like max and min, with which you can just type this:

    my $least = min(keys %hash); my $largest = max(keys %hash);
    This is nice because you solve your problem with hashes, but it also works for any other kind of list. And since Perl passes data by reference, this is fairly efficient.

    Now I suggested this in chatter, but unfortunately you had trouble compiling it. But these functions still are not hard to write in native Perl:

    sub max { my $max = shift; $max = ($max < $_) ? $_ : $max foreach @_; return $max; }
    and min is similar. Plus the corresponding string functions. (The fact that you need to have these pairs of operations - and would need more if you tried adding more basic data types - is something that has been bugging me about Perl lately. Ah well, no language is perfect.)

    For a generalization of the problem we also have reduce, which can be implemented in native Perl like this:

    sub reduce (&@) { my $op = shift; no strict 'refs'; my $pkg = caller; local *A = local *{"$pkg\::a"}; $A = shift; local *B = local *{"$pkg\::b"}; foreach $B (@_) { $A = $op->(); } $A; }
    (Yes, this uses prototypes. And globals.) Now with this single (icky) definition out of the way a lot of list utilities can be written fairly easily. Consider:
    sub max { reduce {$a > $b ? $a : $b } @_; }
    which as you can see is sort of like the flexibility of sort except for gathering things in a list together, not rearranging them.

    Another example to give a sense of what reduce is like:

    sub sum { reduce {$a + $b} @_; }

    UPDATE
    Cleaned up the implementation of reduce slightly.

Re: Returning the lowest key in a hash (or highest)
by MeowChow (Vicar) on Mar 25, 2001 at 05:36 UTC
    Here's a tied hash solution that caches min and max. It turns these lookups into an O(1) operation, but an O(n) penalty is paid on delete's of min and max keys. Of course, there's also a performance loss due to the tied interface, but we're CS people after all, and we don't concern ourselves with messy practicalities like that, right? =)

    Yes indeed, this required considerably more code than I had anticipated...

    update: Added tilly's optimization (doh!), so now delete's are overall an O(1) operation. Just don't go deleting the keys in sorted order :)
    update2: Added dkubb's optimization to _min and _max routines... good call!

    use strict; package MinMaxHash; sub HASH () { 0 }; sub MIN () { 1 }; sub MAX () { 2 }; sub new { my $class = shift; my $self = bless {}, $class; tie %$self, $class; $self; } sub minkey { my $self = shift; (tied %$self)->[MIN]; } sub maxkey { my $self = shift; (tied %$self)->[MAX]; } sub TIEHASH { bless [], shift; } sub STORE { my ($self, $key, $val) = @_; $self->[HASH]{$key} = $val; if (!defined $self->[MIN] or $key < $self->[MIN]) { $self->[MIN] = $key; } if (!defined $self->[MAX] or $key > $self->[MAX]) { $self->[MAX] = $key; } } sub FETCH { my ($self, $key) = @_; $self->[HASH]{$key}; } sub DELETE { my ($self, $key) = @_; delete $self->[HASH]{$key}; $self->[MIN] = _min(keys %{$self->[HASH]}) if $key == $self->[MIN]; $self->[MAX] = _max(keys %{$self->[HASH]}) if $key == $self->[MAX]; } sub CLEAR { my $self = shift; $self = []; } sub EXISTS { my ($self, $key) = @_; exists $self->[HASH]{$key}; } sub FIRSTKEY { my $self = shift; () = keys %{$self->[HASH]}; # reset iterator each %{$self->[HASH]}; } sub NEXTKEY { my $self = shift; each %{$self->[HASH]}; } # tilly^H^H^H^H^Hdkubbs's min & max subs... used only during DELETE's sub _max { my $max = shift; $max < $_ and $max = $_ for @_; $max; } sub _min { my $min = shift; $min > $_ and $min = $_ for @_; $min; } ### EXAMPLE ### package main; use Data::Dumper; my $h = new MinMaxHash; $h->{1} = 'foo'; $h->{2} = 'bar'; $h->{3} = 'baz'; $h->{10} = 'goop'; $h->{-1} = 'moo'; print "Minkey is ", $h->minkey, $/; print "Maxkey is ", $h->maxkey, $/; print Dumper $h;
       MeowChow                                   
                   s aamecha.s a..a\u$&owag.print
      Tied interfaces do tend to take more code than you would at first expect, don't they?

      But a slight optimization note. In your DELETE method there is no need to recompute MAX and MIN every time. Just put a check for whether the deleted element is MAX or MIN and then compute them accordingly. This makes DELETE efficient most of the time. Well algorithmically efficient, as you note Perl's tied interface could be snappier...

      I have one more optimization for you MeowChow. Inside your _min() and _max() subroutines, if the test returns false, it reassigns the original value back to itself. You can get around a 25% speed increase in the algorithm if you only assign the $_ value to $max or $min if it passes the test.

      Here's a short benchmark that illustrates the difference better:

      #!/usr/bin/perl -w use Algorithm::Numerical::Shuffle qw(shuffle); use Benchmark qw(cmpthese); #Randomize the array contents my @array = shuffle 0..1000; cmpthese(-3, { max_dkubb1 => sub { max_dkubb1(@array) }, max_tilly => sub { max_tilly(@array) }, }); sub max_dkubb1 { my $max = shift; $max < $_ and $max = $_ for @_; return $max; } sub max_tilly { my $max = shift; $max = $max < $_ ? $_ : $max for @_; return $max; }

      On my system the max_dkubb1() routine runs just over 25% faster than the other.

        A lot of that is 'and' being a lot faster than ?:, rather than the multiple assigns.

        Update:  Wrong!  jcwren challenged me to benchmark it and almost all the time is in the assigns.

        p
(jcwren) Re: Returning the lowest key in a hash (or highest)
by jcwren (Prior) on Mar 25, 2001 at 08:55 UTC

    Updated 2001/03/25 08:53:00 EST

    I added the jcwren3 and jcwren4 tests which shows what happens when you remove the overhead of passing the 1000 element array around. It sort of makes that part of the comparison unfair, but when you're talking about being 24 times faster by not passing the array...

    [jcw@linux jcw]$ perl q.pl Benchmark: running max_dkubb1, max_jcwren1, max_jcwren2, max_jcwren3, +max_jcwren4, max_tilly, each for at least 3 CPU seconds... max_jcwren1: 5 wallclock secs ( 2.95 usr + 0.05 sys = 3.00 CPU) @ 26 +010.00/s (n=78030) max_jcwren2: 6 wallclock secs ( 3.23 usr + 0.00 sys = 3.23 CPU) @ 90 +57.59/s (n=29256) max_jcwren3: 5 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 53 +767.33/s (n=161302) max_jcwren4: 5 wallclock secs ( 3.01 usr + 0.00 sys = 3.01 CPU) @ 11 +827.24/s (n=35600) max_dkubb1: 5 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 21 +65.67/s (n=6497) max_tilly: 5 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 17 +11.33/s (n=5134) [jcw@linux jcw]$

    If doing in it Perl is fast, doing it in 'C' must be faster, right? Below is a comparison of two Inline methods. maxarray1 expects the list passed into it, whereas maxarray2 expects a reference. To my surprise, the first method is faster. I imagine that it's because the array doesn't have to be dereferenced for each element access. But that's for guys like tye, tilly, and chipmunk to tell me. Me, I don't know jack about Perl internals.

    This *is* an example of knowing your data. Floating point values will fail miserably, since it expects a list of integers. But hey, if it's something you had to do a lot, you should optimize your routines for the expected data type. Flexibility is a Good Thing(tm), but it's expendable when you're looking for serious performance gains.

    #!/usr/local/bin/perl -w use Algorithm::Numerical::Shuffle qw(shuffle); use Benchmark qw(timethese); #Randomize the array contents my @array = shuffle 0..1000; timethese(-3, { max_jcwren1 => sub { max_jcwren1(@array) }, max_jcwren2 => sub { max_jcwren2(@array) }, max_jcwren3 => sub { maxarray1(@array) }, max_jcwren4 => sub { maxarray2(\@array) }, max_dkubb1 => sub { max_dkubb1(@array) }, max_tilly => sub { max_tilly(@array) }, }); use Inline C => <<'END_OF_C_CODE'; int maxarray1(SV* array1, ...) { Inline_Stack_Vars; int max = 0; int i, j; for (i = 0; i < Inline_Stack_Items; i++) if ((j = (SvIV(Inline_Stack_Item(i)))) > max) max = j; return (max); } int maxarray2(SV* array_ref) { AV* array; int num_items; int max = 0; int i, j; if (! SvROK(array_ref)) croak("array_ref is not a reference"); array = (AV*)SvRV(array_ref); num_items = av_len(array); for (i = 0; i <= num_items; i++) if ((j = (SvIV(*av_fetch (array, i, 0)))) > max) max = j; return (max); } END_OF_C_CODE sub max_jcwren1 { return maxarray1 (@_); } sub max_jcwren2 { return maxarray2 (\@_); } sub max_dkubb1 { my $max = shift; $max < $_ and $max = $_ for @_; return $max; } sub max_tilly { my $max = shift; $max = $max < $_ ? $_ : $max for @_; return $max; }
    --Chris

    e-mail jcwren
Re: Returning the lowest key in a hash (or highest)
by MrNobo1024 (Hermit) on Mar 25, 2001 at 03:27 UTC
    It's probably faster to iterate instead of sorting:
    sub lowest_key (\%) { my @keys = keys %{ shift() }; my $low = $keys[0]; foreach(@keys) { $low = $_ if $_ < $low; } return $low; }
Re: Returning the lowest key in a hash (or highest)
by indigo (Scribe) on Mar 25, 2001 at 03:31 UTC
    Sort takes O(n log n). Simply walking the keys of the hash is only O(n).
    my ($min, undef) = each %hash; $min > $_ and $min = $_ for keys %hash;
    Updated: Fixed that bug...:)
      Says indigo:
      Sort takes O(n log n). Simply walking the keys of the hash is only O(n).
      A lot of people made this point, but I wonder if it's really as important as people think. sort() is written in C and is very fast. Walking the keys requires that a lot of Perl opcodes must be dispatched, so it is probably a lot slower. Yes, the times are O(n log n) and O(n), but that ignores the constant factors, and I would expect the constant factor for the sort to be much smaller, so that the sort would be faster except for extremely large hashes.

      Maybe someone who likes benchmarking can look into this and post the results.

        Here's a benchmark I tried that compares sorting to hash-walking for hashes of various sizes, and the results confirm Dominus's expectations. The hash entries are generated using rand(65000) and both subs are given the same hash to work with. (I think the comparison is pretty fair, let me know if you have ideas for either of the subs!)

        The code:

        sub walkit { my $hashref = shift; my $min = each %$hashref; $min > $_ and $min = $_ for keys %$hashref; $min; } sub sortit { my $hashref = shift; ( sort {$a<=>$b} keys %$hashref )[0]; }

        The results:

        Size       sort       walk
        10      8329.50/s  5289.68/s  (sort wins!)
        100     1020.53/s   727.29/s  (sort wins!)
        1000      94.68/s    80.14/s  (sort wins!)
        10000      7.15/s     7.31/s  (almost a wash...)
        100000     0.52/s     0.73/s  (walking wins!)
        

      Except for the fact that since you initialized $min to 0, it will stay at 0 even if there's no 0 in the hash!
Re: Returning the lowest key in a hash (or highest)
by MeowChow (Vicar) on Mar 26, 2001 at 03:29 UTC
    Optimizing this function via Inline.pm really interested me, so I did a few more benchmarks. What I found quite unexpected was that sort actually came out ahead of the Inline-C version for very small arrays. Is there substantial overhead in XS that isn't present in built-ins?

    update: The c_minmax Inline sub had a silly inefficiency, which was fixed. Now sort is slower, by about 5-10%. Benchmarks have been updated to reflect this. Hurray?

    # keywords: bench hash clobber override use strict; use warnings; use Benchmark qw(cmpthese); use Data::Dumper; $| = 1; sub shuffle { my $array = shift; for (my $i = @$array; --$i; ) { my $j = int rand ($i+1); @$array[$i,$j] = @$array[$j,$i]; } } sub iter_minmax { my $min = my $max = shift; for (@_) { if ($max < $_) { $max = $_ } elsif ($min > $_) { $min = $_ } } return ($min, $max); } sub sort_minmax { (sort { $a <=> $b } @_)[0,-1]; } use Inline C => <<'END_OF_C_CODE'; void c_minmax(SV* array1, ...) { Inline_Stack_Vars; int max = SvIV(Inline_Stack_Item(0)); int min = max; int i, val; for (i = 1; i < Inline_Stack_Items; i++) { val = SvIV(Inline_Stack_Item(i)); if (val > max) max = val; else if (val < min) min = val; } Inline_Stack_Reset; Inline_Stack_Push(sv_2mortal(newSViv(min))); Inline_Stack_Push(sv_2mortal(newSViv(max))); Inline_Stack_Done; } END_OF_C_CODE for my $size (10,100,1000,10000) { my %hash; my @array = 1..$size; shuffle(\@array); print "\n\n--- A hash of size $size ---\n"; cmpthese (-3, { c => sub { c_minmax(@array) }, iter => sub { iter_minmax(@array) }, sort => sub { sort_minmax(@array) }, }); } ### RESULTS ### --- Array of size 10 --- Rate iter sort c iter 81433/s -- -69% -72% sort 260421/s 220% -- -10% c 288754/s 255% 11% -- --- Array of size 100 --- Rate iter sort c iter 12174/s -- -44% -85% sort 21817/s 79% -- -73% c 81029/s 566% 271% -- --- Array of size 1000 --- Rate iter sort c iter 1298/s -- -10% -87% sort 1448/s 12% -- -85% c 9976/s 669% 589% -- --- Array of size 10000 --- Rate sort iter c sort 59.3/s -- -45% -94% iter 108/s 82% -- -89% c 953/s 1509% 782% --
Re: Returning the lowest key in a hash (or highest)
by Masem (Monsignor) on Mar 25, 2001 at 03:32 UTC
    Since you only need to go through the list once instead of a full sorted list (so that time is in O(n) rather than O(n log n)) just do something like:
    my $l; foreach $i (keys(%hash)) { $l = $l < $i ? $l : $i; } $hash{ $l } = ...

    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://66918]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (4)
As of 2024-04-20 00:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found