Returning the lowest key in a hash (or highest)

deprecated has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re (tilly) 1: Returning the lowest key in a hash (or highest) by tilly (Archbishop) on Mar 25, 2001 at 03:52 UTC
Perl does not solve this problem particularly cleanly, but it can be arranged to. Since Perl is list-oriented, a design that allows you to think in terms of lists is likely to play well with the rest of how the language is organized. To this end List::Util offers handy functions like max and min, with which you can just type this: `my $least = min(keys %hash); my $largest = max(keys %hash);` [download] This is nice because you solve your problem with hashes, but it also works for any other kind of list. And since Perl passes data by reference, this is fairly efficient. Now I suggested this in chatter, but unfortunately you had trouble compiling it. But these functions still are not hard to write in native Perl: `sub max { my $max = shift; $max = ($max < $_) ? $_ : $max foreach @_; return $max; }` [download] and min is similar. Plus the corresponding string functions. (The fact that you need to have these pairs of operations - and would need more if you tried adding more basic data types - is something that has been bugging me about Perl lately. Ah well, no language is perfect.) For a generalization of the problem we also have reduce, which can be implemented in native Perl like this: `sub reduce (&@) { my $op = shift; no strict 'refs'; my $pkg = caller; local A = local {"$pkg\::a"}; $A = shift; local B = local {"$pkg\::b"}; foreach $B (@_) { $A = $op->(); } $A; }` [download] (Yes, this uses prototypes. And globals.) Now with this single (icky) definition out of the way a lot of list utilities can be written fairly easily. Consider: `sub max { reduce {$a > $b ? $a : $b } @_; }` [download] which as you can see is sort of like the flexibility of sort except for gathering things in a list together, not rearranging them. Another example to give a sense of what reduce is like: `sub sum { reduce {$a + $b} @_; }` [download] UPDATE Cleaned up the implementation of reduce slightly.	[reply] [d/l] [select]
Re: Returning the lowest key in a hash (or highest) by MeowChow (Vicar) on Mar 25, 2001 at 05:36 UTC
Here's a tied hash solution that caches min and max. It turns these lookups into an O(1) operation, but an O(n) penalty is paid on delete's of min and max keys. Of course, there's also a performance loss due to the tied interface, but we're CS people after all, and we don't concern ourselves with messy practicalities like that, right? =) Yes indeed, this required considerably more code than I had anticipated... update: Added tilly's optimization (doh!), so now delete's are overall an O(1) operation. Just don't go deleting the keys in sorted order :) update2: Added dkubb's optimization to _min and _max routines... good call! use strict; package MinMaxHash; sub HASH () { 0 }; sub MIN () { 1 }; sub MAX () { 2 }; sub new { my $class = shift; my $self = bless {}, $class; tie %$self, $class; $self; } sub minkey { my $self = shift; (tied %$self)->[MIN]; } sub maxkey { my $self = shift; (tied %$self)->[MAX]; } sub TIEHASH { bless [], shift; } sub STORE { my ($self, $key, $val) = @_; $self->[HASH]{$key} = $val; if (!defined $self->[MIN] or $key < $self->[MIN]) { $self->[MIN] = $key; } if (!defined $self->[MAX] or $key > $self->[MAX]) { $self->[MAX] = $key; } } sub FETCH { my ($self, $key) = @_; $self->[HASH]{$key}; } sub DELETE { my ($self, $key) = @_; delete $self->[HASH]{$key}; $self->[MIN] = _min(keys %{$self->[HASH]}) if $key == $self->[MIN]; $self->[MAX] = _max(keys %{$self->[HASH]}) if $key == $self->[MAX]; } sub CLEAR { my $self = shift; $self = []; } sub EXISTS { my ($self, $key) = @_; exists $self->[HASH]{$key}; } sub FIRSTKEY { my $self = shift; () = keys %{$self->[HASH]}; # reset iterator each %{$self->[HASH]}; } sub NEXTKEY { my $self = shift; each %{$self->[HASH]}; } # tilly^H^H^H^H^Hdkubbs's min & max subs... used only during DELETE's sub _max { my $max = shift; $max < $_ and $max = $_ for @_; $max; } sub _min { my $min = shift; $min > $_ and $min = $_ for @_; $min; } ### EXAMPLE ### package main; use Data::Dumper; my $h = new MinMaxHash; $h->{1} = 'foo'; $h->{2} = 'bar'; $h->{3} = 'baz'; $h->{10} = 'goop'; $h->{-1} = 'moo'; print "Minkey is ", $h->minkey, $/; print "Maxkey is ", $h->maxkey, $/; print Dumper $h; [download] MeowChow s aamecha.s a..a\u$&owag.print	[reply] [d/l]
Re (tilly) 2: Returning the lowest key in a hash (or highest) by tilly (Archbishop) on Mar 25, 2001 at 05:55 UTC
Tied interfaces do tend to take more code than you would at first expect, don't they? But a slight optimization note. In your DELETE method there is no need to recompute MAX and MIN every time. Just put a check for whether the deleted element is MAX or MIN and then compute them accordingly. This makes DELETE efficient most of the time. Well algorithmically efficient, as you note Perl's tied interface could be snappier...	[reply]
(dkubb) Re: (3) Returning the lowest key in a hash (or highest) by dkubb (Deacon) on Mar 25, 2001 at 07:22 UTC
I have one more optimization for you MeowChow. Inside your `_min()` and `_max()` subroutines, if the test returns false, it reassigns the original value back to itself. You can get around a 25% speed increase in the algorithm if you only assign the `$_` value to `$max` or `$min` if it passes the test. Here's a short benchmark that illustrates the difference better: `#!/usr/bin/perl -w use Algorithm::Numerical::Shuffle qw(shuffle); use Benchmark qw(cmpthese); #Randomize the array contents my @array = shuffle 0..1000; cmpthese(-3, { max_dkubb1 => sub { max_dkubb1(@array) }, max_tilly => sub { max_tilly(@array) }, }); sub max_dkubb1 { my $max = shift; $max < $_ and $max = $_ for @_; return $max; } sub max_tilly { my $max = shift; $max = $max < $_ ? $_ : $max for @_; return $max; }` [download] On my system the `max_dkubb1()` routine runs just over 25% faster than the other.	[reply] [d/l] [select]
Re: (dkubb) Re: (3) Returning the lowest key in a hash (or highest) by petral (Curate) on Mar 27, 2001 at 06:46 UTC
A lot of that is 'and' being a lot faster than ?:, rather than the multiple assigns. Update: Wrong! jcwren challenged me to benchmark it and almost all the time is in the assigns. p	[reply]
Re: Re: (dkubb) Re: (3) Returning the lowest key in a hash (or highest) by and (Pilgrim) on Mar 27, 2001 at 20:12 UTC
(jcwren) Re: Returning the lowest key in a hash (or highest) by jcwren (Prior) on Mar 25, 2001 at 08:55 UTC
Updated 2001/03/25 08:53:00 EST I added the jcwren3 and jcwren4 tests which shows what happens when you remove the overhead of passing the 1000 element array around. It sort of makes that part of the comparison unfair, but when you're talking about being 24 times faster by not passing the array... [jcw@linux jcw]$ perl q.pl Benchmark: running max_dkubb1, max_jcwren1, max_jcwren2, max_jcwren3, +max_jcwren4, max_tilly, each for at least 3 CPU seconds... max_jcwren1: 5 wallclock secs ( 2.95 usr + 0.05 sys = 3.00 CPU) @ 26 +010.00/s (n=78030) max_jcwren2: 6 wallclock secs ( 3.23 usr + 0.00 sys = 3.23 CPU) @ 90 +57.59/s (n=29256) max_jcwren3: 5 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 53 +767.33/s (n=161302) max_jcwren4: 5 wallclock secs ( 3.01 usr + 0.00 sys = 3.01 CPU) @ 11 +827.24/s (n=35600) max_dkubb1: 5 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 21 +65.67/s (n=6497) max_tilly: 5 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 17 +11.33/s (n=5134) [jcw@linux jcw]$ [download] If doing in it Perl is fast, doing it in 'C' must be faster, right? Below is a comparison of two Inline methods. maxarray1 expects the list passed into it, whereas maxarray2 expects a reference. To my surprise, the first method is faster. I imagine that it's because the array doesn't have to be dereferenced for each element access. But that's for guys like tye, tilly, and chipmunk to tell me. Me, I don't know jack about Perl internals. This is an example of knowing your data. Floating point values will fail miserably, since it expects a list of integers. But hey, if it's something you had to do a lot, you should optimize your routines for the expected data type. Flexibility is a Good Thing(tm), but it's expendable when you're looking for serious performance gains. #!/usr/local/bin/perl -w use Algorithm::Numerical::Shuffle qw(shuffle); use Benchmark qw(timethese); #Randomize the array contents my @array = shuffle 0..1000; timethese(-3, { max_jcwren1 => sub { max_jcwren1(@array) }, max_jcwren2 => sub { max_jcwren2(@array) }, max_jcwren3 => sub { maxarray1(@array) }, max_jcwren4 => sub { maxarray2(\@array) }, max_dkubb1 => sub { max_dkubb1(@array) }, max_tilly => sub { max_tilly(@array) }, }); use Inline C => <<'END_OF_C_CODE'; int maxarray1(SV* array1, ...) { Inline_Stack_Vars; int max = 0; int i, j; for (i = 0; i < Inline_Stack_Items; i++) if ((j = (SvIV(Inline_Stack_Item(i)))) > max) max = j; return (max); } int maxarray2(SV* array_ref) { AV* array; int num_items; int max = 0; int i, j; if (! SvROK(array_ref)) croak("array_ref is not a reference"); array = (AV)SvRV(array_ref); num_items = av_len(array); for (i = 0; i <= num_items; i++) if ((j = (SvIV(av_fetch (array, i, 0)))) > max) max = j; return (max); } END_OF_C_CODE sub max_jcwren1 { return maxarray1 (@_); } sub max_jcwren2 { return maxarray2 (\@_); } sub max_dkubb1 { my $max = shift; $max < $_ and $max = $_ for @_; return $max; } sub max_tilly { my $max = shift; $max = $max < $_ ? $_ : $max for @_; return $max; } [download] --Chris e-mail jcwren	[reply] [d/l] [select]
Re: Returning the lowest key in a hash (or highest) by MrNobo1024 (Hermit) on Mar 25, 2001 at 03:27 UTC
It's probably faster to iterate instead of sorting: `sub lowest_key (\%) { my @keys = keys %{ shift() }; my $low = $keys[0]; foreach(@keys) { $low = $_ if $_ < $low; } return $low; }` [download]	[reply] [d/l]
Re: Returning the lowest key in a hash (or highest) by indigo (Scribe) on Mar 25, 2001 at 03:31 UTC
Sort takes O(n log n). Simply walking the keys of the hash is only O(n). `my ($min, undef) = each %hash; $min > $_ and $min = $_ for keys %hash;` [download] Updated: Fixed that bug...:)	[reply] [d/l]
Re: Returning the lowest key in a hash (or highest) by Dominus (Parson) on Mar 25, 2001 at 20:11 UTC
Says indigo: Sort takes O(n log n). Simply walking the keys of the hash is only O(n). A lot of people made this point, but I wonder if it's really as important as people think. `sort()` is written in C and is very fast. Walking the keys requires that a lot of Perl opcodes must be dispatched, so it is probably a lot slower. Yes, the times are O(n log n) and O(n), but that ignores the constant factors, and I would expect the constant factor for the sort to be much smaller, so that the sort would be faster except for extremely large hashes. Maybe someone who likes benchmarking can look into this and post the results.	[reply]
Benchmarking hash sort vs. walk (Was: Re: Re: Returning the lowest key in a hash (or highest)) by larryl (Monk) on Mar 26, 2001 at 01:03 UTC
Here's a benchmark I tried that compares sorting to hash-walking for hashes of various sizes, and the results confirm Dominus's expectations. The hash entries are generated using `rand(65000)` and both subs are given the same hash to work with. (I think the comparison is pretty fair, let me know if you have ideas for either of the subs!) The code: `sub walkit { my $hashref = shift; my $min = each %$hashref; $min > $_ and $min = $_ for keys %$hashref; $min; } sub sortit { my $hashref = shift; ( sort {$a<=>$b} keys %$hashref )[0]; }` [download] The results: Size sort walk 10 8329.50/s 5289.68/s (sort wins!) 100 1020.53/s 727.29/s (sort wins!) 1000 94.68/s 80.14/s (sort wins!) 10000 7.15/s 7.31/s (almost a wash...) 100000 0.52/s 0.73/s (walking wins!)	[reply] [d/l] [select]
Re: Re: Returning the lowest key in a hash (or highest) by Anonymous Monk on Mar 25, 2001 at 03:35 UTC
Except for the fact that since you initialized $min to 0, it will stay at 0 even if there's no 0 in the hash!	[reply]
Re: Returning the lowest key in a hash (or highest) by MeowChow (Vicar) on Mar 26, 2001 at 03:29 UTC
Optimizing this function via Inline.pm really interested me, so I did a few more benchmarks. What I found quite unexpected was that sort actually came out ahead of the Inline-C version for very small arrays. Is there substantial overhead in XS that isn't present in built-ins? update: The c_minmax Inline sub had a silly inefficiency, which was fixed. Now sort is slower, by about 5-10%. Benchmarks have been updated to reflect this. Hurray? # keywords: bench hash clobber override use strict; use warnings; use Benchmark qw(cmpthese); use Data::Dumper; $\| = 1; sub shuffle { my $array = shift; for (my $i = @$array; --$i; ) { my $j = int rand ($i+1); @$array[$i,$j] = @$array[$j,$i]; } } sub iter_minmax { my $min = my $max = shift; for (@_) { if ($max < $_) { $max = $_ } elsif ($min > $_) { $min = $_ } } return ($min, $max); } sub sort_minmax { (sort { $a <=> $b } @_)[0,-1]; } use Inline C => <<'END_OF_C_CODE'; void c_minmax(SV* array1, ...) { Inline_Stack_Vars; int max = SvIV(Inline_Stack_Item(0)); int min = max; int i, val; for (i = 1; i < Inline_Stack_Items; i++) { val = SvIV(Inline_Stack_Item(i)); if (val > max) max = val; else if (val < min) min = val; } Inline_Stack_Reset; Inline_Stack_Push(sv_2mortal(newSViv(min))); Inline_Stack_Push(sv_2mortal(newSViv(max))); Inline_Stack_Done; } END_OF_C_CODE for my $size (10,100,1000,10000) { my %hash; my @array = 1..$size; shuffle(\@array); print "\n\n--- A hash of size $size ---\n"; cmpthese (-3, { c => sub { c_minmax(@array) }, iter => sub { iter_minmax(@array) }, sort => sub { sort_minmax(@array) }, }); } ### RESULTS ### --- Array of size 10 --- Rate iter sort c iter 81433/s -- -69% -72% sort 260421/s 220% -- -10% c 288754/s 255% 11% -- --- Array of size 100 --- Rate iter sort c iter 12174/s -- -44% -85% sort 21817/s 79% -- -73% c 81029/s 566% 271% -- --- Array of size 1000 --- Rate iter sort c iter 1298/s -- -10% -87% sort 1448/s 12% -- -85% c 9976/s 669% 589% -- --- Array of size 10000 --- Rate sort iter c sort 59.3/s -- -45% -94% iter 108/s 82% -- -89% c 953/s 1509% 782% -- [download]	[reply] [d/l]
Re: Returning the lowest key in a hash (or highest) by Masem (Monsignor) on Mar 25, 2001 at 03:32 UTC
Since you only need to go through the list once instead of a full sorted list (so that time is in O(n) rather than O(n log n)) just do something like: `my $l; foreach $i (keys(%hash)) { $l = $l < $i ? $l : $i; } $hash{ $l } = ...` [download] Dr. Michael K. Neylon - mneylon-pm@masemware.com \|\| "You've left the lens cap of your mind on again, Pinky" - The Brain	[reply] [d/l]


Just another Perl shrine
	PerlMonks