LucaPette has asked for the wisdom of the Perl Monks concerning the following question:
I have written the following subroutine to find minimum, maximum and average at the same time:
sub min_max_avg {
    my $ref = shift;
    my ( $min, $max, $avg, $i );
    my ( $current_min, $current_max );
    if ( @{$ref} % 2 == 0 ) {
        ( $min, $max ) =
            $ref->[0] < $ref->[1]
            ? ( $ref->[0], $ref->[1] )
            : ( $ref->[1], $ref->[0] );
        $avg = $ref->[0] + $ref->[1];
        $i = 2;
    }
    else {
        $min = $max = $avg = $ref->[0];
        $i = 1;
    }
    while ( $i < @{$ref} ) {
        ( $current_min, $current_max ) =
            $ref->[$i] < $ref->[ $i + 1 ]
            ? ( $ref->[$i], $ref->[ $i + 1 ] )
            : ( $ref->[ $i + 1 ], $ref->[$i] );
        $min = $current_min if ( $current_min < $min );
        $max = $current_max if ( $current_max > $max );
        $avg += $ref->[$i] + $ref->[ $i + 1 ];
        $i += 2;
    }
    return ( $min, $max, $avg / @{$ref} );
}
Note that:
- From the well-known "Introduction to Algorithms, 2nd Edition" (ISBN 0070131511), page 162:
...at most 3(n/2) comparisons are sufficient to find both the minimum and the
maximum.
The strategy is to maintain the minimum and maximum elements seen thus far. Rather than processing each element of the input by comparing it against the current
minimum and maximum, at a cost of 2 comparisons per element, we process elements in
pairs. We compare pairs of elements from the input first with each other, and then we
compare the smaller to the current minimum and the larger to the current maximum, at a cost
of 3 comparisons for every 2 elements. Setting up initial values for the current minimum and maximum depends on whether n is odd
or even. If n is odd, we set both the minimum and maximum to the value of the first element,
and then we process the rest of the elements in pairs. If n is even, we perform 1 comparison
on the first 2 elements to determine the initial values of the minimum and maximum, and then
process the rest of the elements in pairs as in the case for odd n.
Let us analyze the total number of comparisons. If n is odd, then we perform 3n/2
comparisons. If n is even, we perform 1 initial comparison followed by 3(n - 2)/2
comparisons, for a total of 3(n/2)-2. Thus, in either case, the total number of comparisons is at
most 3(n/2).
Therefore, as far as the number of comparisons is concerned, the code should work fine.
- The subroutine doesn't check its input parameters. In fact, it fails with an input like this:
[qw//]
or
[qw/foo bar/]
Can someone suggest improvements?
Can some step of the code be written in a better way? In a more perlish way?
Consider that I would like to preserve the structure of the algorithm... and thanks to all in advance.
Re: minimum, maximum and average of a list of numbers at the same time
by broquaint (Abbot) on Nov 10, 2005 at 11:55 UTC
This would seem a "perlish" solution to me:
use List::Util qw/ min max sum /;
sub min_max_avg {
    return min(@_), max(@_), sum(@_) / @_;
}
It does look temptingly simple, doesn't it?
But you'd be iterating across the list three times - which isn't very efficient. You only really need to iterate across it once. Something like this (off the top of my head and untested).
# use 'mean' instead of 'avg' as it's unambiguous
sub min_max_mean {
    my ($min, $max, $tot);
    my $count = @_;
    $min = $max = $tot = shift;
    foreach (@_) {
        $min = $_ if $_ < $min;
        $max = $_ if $_ > $max;
        $tot += $_;
    }
    return ($min, $max, $tot / $count);
}
--
<http://dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
Reducing the number of iterations or comparisons by a constant amount may make a performance difference, depending on what gyrations you have to go through to make the reduction, but you're not changing the order of complexity. A 3N solution is the same order as a (3/2)N solution, which is the same as a single-pass (N) solution. They're all O(n). This is an important concept.
If you can reduce the order of a solution, your solution will scale better. We commonly look for O(n log n) solutions to replace O(n²) solutions, so that working with large amounts of data doesn't make our app bog down. If you merely change by a constant factor (as we're talking about in this case), you may see a constant-factor improvement at any size, but you won't alleviate any scaling problems. You're in the realm of micro-optimization.
Walking through the list is not going to be a significant portion of the computation, compared to the comparisons and math being done on the variables of interest. And it's really not worth trying to take the elements two at a time, because that ends up being a very inefficient way to walk through the list, compared to Perl's built-in for.
If our OP were to translate the algorithm into Inline::C, the reduced number of comparisons might compete well with three List::Util calls, with the difference being some constant factor on any size list.
Caution: Contents may have been coded under pressure.
use List::Util qw/ reduce /;

sub min_max_avg {
    my $res = reduce {
        my $r = ref $a ? $a : [ ($a) x 3 ];
        if ( $b < $r->[0] ) {
            $r->[0] = $b;
        }
        elsif ( $b > $r->[1] ) {
            $r->[1] = $b;
        }
        $r->[2] += $b;
        $r;
    } @_;
    return @$res[ 0, 1 ], $res->[2] / @_;
}
---
$world=~s/war/peace/g
And if you want to go even more FP...
sub Gen_Stats {
    my $stat = {};
    my ($cnt, $max, $min, $tot);
    $stat->{ADD} = sub {
        $cnt += @_;
        for ( @_ ) {
            $tot += $_;
            $max = $_ if ! defined $max || $_ > $max;
            $min = $_ if ! defined $min || $_ < $min;
        }
    };
    $stat->{MAX} = sub { $max };
    $stat->{MIN} = sub { $min };
    $stat->{AVE} = sub { $cnt ? $tot / $cnt : undef };
    $stat->{TOT} = sub { $tot };
    $stat->{ADD}->( @_ );
    return $stat;
}

my $stat_info = Gen_Stats();
while ( <DATA> ) {
    chomp;
    $stat_info->{ADD}->($_);
}
print join "\t", map { $_->() } @{$stat_info}{qw/MAX MIN AVE TOT/};
This code was borrowed from RFC: Tool::Box. I recommend using List::Util when and wherever possible. It has been part of the core since 5.007003 and uses XS when possible. The only real limitation I see with it is that all the items in the list must be known at once as I pointed out in How A Function Becomes Higher Order.
Nice fast code, but does it return the right values?
I checked with the orig mma routine, the my_mma routine and your reduce_mma routine. Using the same set of @nums, the results differ.
use Benchmark qw(:all :hireswallclock);

my @nums = map { rand } 1 .. 10000;
my $subs = {
    mma        => sub { my @r = min_max_avg(\@nums) },
    reduce_mma => sub { my @r = reduce_mma(\@nums) },
    mymma      => sub { my @r = my_mma(\@nums) },
};
cmpthese(-1, $subs);

$count = 1000;
for my $sub (keys %$subs) {
    $t = timeit($count, $subs->{$sub});
    print "$count loops of $sub:", timestr($t), "\n";
}
for my $sub (keys %$subs) {
    print "$sub results:\n";
    print(join ' ', &{ $subs->{$sub} }, "\n");
}
##### Subs to test ######
__END__
$ perl benchmark.pl
                Rate     mma   mymma reduce_mma
mma           86.3/s      --    -49%      -100%
mymma          171/s     98%      --      -100%
reduce_mma   89229/s 103265%  52110%         --
1000 loops of reduce_mma:0.0114148 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
1000 loops of mma:12.0897 wallclock secs (12.05 usr + 0.01 sys = 12.06 CPU) @ 82.91/s (n=1000)
1000 loops of mymma:5.98583 wallclock secs ( 5.97 usr + 0.00 sys = 5.97 CPU) @ 167.56/s (n=1000)
reduce_mma results:
0.956444677504809 0.209810070551129 0.765697545177442
mma results:
0.000119859684559742 0.999992750475673 0.505291691541093
mymma results:
0.000119859684559742 0.999992750475673 0.505291691541093
I'm not a Perl expert... in fact I'm not able to really understand the code of List::Util. Maybe you can clarify this question for me: does your code process the input list three times?
Does your code process the input list three times?
Indeed it does and as davorg and demerphq have pointed out in their replies it isn't terribly efficient, but that's a "perlish" solution for you.
But you're not required to understand it. But then of course, without even having seen it, I'm sure it does process it three times. If that is really relevant, then indeed you may want or have to process it once yourself. And if it is very relevant, perhaps you may want to do it as a binary extension.
Well, the author may consider adding to List::Util a function that calculates more than one relevant quantity at a time, e.g.
my ($min, $max, $sum) = calculate [qw/min max sum/], @list;
or perhaps with an OO interface:
my $stats=List::Util->calculate([qw/min max sum/])->of(@list);
print $stats->min, "\n";
possibly even with a mixed interface:
calculate [qw/min max sum/], @list;
print List::Util::lastrun->max, "\n";
I don't know what the programmatic performance would be, but using Statistics::Descriptive would certainly cut the amount of time you, the human, had to spend on this problem.
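For reference, a minimal session with that module might look like this (my own sketch; assumes Statistics::Descriptive is installed from CPAN):

```perl
use strict;
use warnings;
use Statistics::Descriptive;

# The ::Full variant keeps all data points, so min, max and mean
# are all available after a single add_data() call.
my $stat = Statistics::Descriptive::Full->new;
$stat->add_data( 1, 2, 3, 4 );

printf "min: %g max: %g mean: %g\n",
    $stat->min, $stat->max, $stat->mean;
```

It will be slower than a hand-rolled loop, but you also get count, standard deviation, median and friends for free.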
Re: minimum, maximum and average of a list of numbers at the same time
by BrowserUk (Patriarch) on Nov 10, 2005 at 12:51 UTC
As others have mentioned, you are unlikely to code anything in Perl that will beat List::Util for performance, even if you do avoid making 3 whole passes. That is the nature of a dynamic language.
For error handling, you need to add code to detect bad input and deal with it. You have to decide whether to die or croak within the subroutine, or return some failure indication (like an empty list).
Here's one possibility that raises an exception (via die) within the sub and catches it in the calling code using the block form of eval:
#! perl -slw
use strict;

sub minMaxAve {
    die 'Empty list' unless @_;
    my( $min, $max, $sum );
    for ( @_ ) {
        die 'Non-numeric value(s)' unless /^[\de+.-]+$/;
        $min = $_ if !defined $min or $min > $_;
        $max = $_ if !defined $max or $max < $_;
        $sum += $_;
    }
    return $min, $max, $sum / @_;
}

for (
    [],
    [ 1 .. 4 ],
    [ 5, 1, 5, 6, 5 ],
    [ 1, 2, -0.1, 1e-5, 2e10 ],
    [ 'fox', 'dog' ],
) {
    my @results = eval{ minMaxAve @$_ } or warn "[ @$_ ] : $@\n\n" and next;
    printf "[ %s ]\nmin: %g max: %g ave: %g\n\n", join( ', ', @$_ ), @results;
}
__END__
P:\test>junk2
[ ] : Empty list at P:\test\junk2.pl line 5.
[ 1, 2, 3, 4 ]
min: 1 max: 4 ave: 2.5
[ 5, 1, 5, 6, 5 ]
min: 1 max: 6 ave: 4.4
[ 1, 2, -0.1, 1e-005, 20000000000 ]
min: -0.1 max: 2e+010 ave: 4e+009
[ fox dog ] : Non-numeric value(s) at P:\test\junk2.pl line 8.
You should probably use Carp::croak rather than die, and a better regex (say from Regexp::Common) or, better still, Scalar::Util::looks_like_number() to detect non-numeric values.
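A sketch of what the looks_like_number() approach might look like (my own rewrite of the validation step; the message texts are made up):

```perl
use strict;
use warnings;
use Carp qw( croak );
use Scalar::Util qw( looks_like_number );

sub minMaxAve {
    croak 'Empty list' unless @_;
    my ( $min, $max, $sum );
    for (@_) {
        # looks_like_number() uses perl's own notion of a number,
        # so '1e-5', '-0.1' and '2e10' all pass, unlike a naive regex
        croak "Non-numeric value: '$_'" unless looks_like_number($_);
        $min = $_ if !defined $min or $min > $_;
        $max = $_ if !defined $max or $max < $_;
        $sum += $_;
    }
    return $min, $max, $sum / @_;
}
```

One caveat: looks_like_number() also accepts strings like 'Inf' and 'NaN' on most perls, so filter those separately if they would be a problem for your data.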
And if performance is your requirement, use List::Util's calls, but make sure that you get a version that compiles the XS alternatives on your platform.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I would also add List::MoreUtils, which has a minmax function that uses an improved algorithm to evaluate min and max at once:
#!/usr/bin/perl
use strict;
use warnings;

use List::Util qw( min max reduce );
use List::MoreUtils qw( minmax );
use Benchmark qw( cmpthese );

my @array = map { rand 100 } 1 .. 1_000_000;
my $count = 0;

cmpthese -10, {
    util_min_max => sub {
        my $min = min(@array);
        my $max = max(@array);
    },
    util_reduce => sub {
        my ( $min, $max ) = @{ (reduce {
            my $r = ref $a ? $a : [ ($a) x 2 ];
            if ( $b < $r->[0] ) {
                $r->[0] = $b;
            }
            elsif ( $b > $r->[1] ) {
                $r->[1] = $b;
            }
            $r
        } @array) };
    },
    moreutils_minmax => sub {
        my ( $min, $max ) = minmax(@array);
    },
};
__END__
__END__
                     Rate util_reduce util_min_max moreutils_minmax
util_reduce       0.846/s          --         -92%             -96%
util_min_max       10.5/s       1146%           --             -45%
moreutils_minmax   19.2/s       2169%          82%               --
I added the reduce method because I did not know whether it would perform better, but the result does not surprise me given the number of function calls involved. Anyway, this is what benchmarks are for :)
Flavio
perl -ple'$_=reverse' <<<ti.xittelop@oivalf
Don't fool yourself.
Re: minimum, maximum and average of a list of numbers at the same time
by salva (Canon) on Nov 10, 2005 at 12:19 UTC
Well, that algorithm could offer the best performance in assembler or C, but I doubt it makes any difference in Perl.
Actually...
use Benchmark qw(cmpthese);

# insert your code here

sub my_mma {
    my $r = shift;
    if (@$r) {
        my ($min, $max, $tot) = ($r->[0]) x 3;
        for (@$r[1 .. $#$r]) {
            $min = $_ if $_ < $min;
            $max = $_ if $_ > $max;
            $tot += $_;
        }
        return ($min, $max, $tot / @$r);
    }
    return (undef, undef, undef);
}

my @nums = map { rand } 1 .. 1000;
cmpthese(-1, { ita    => sub { my @r = min_max_avg(\@nums) },
               simple => sub { my @r = my_mma(\@nums) } });
runs as...
          Rate    ita simple
ita      222/s     --   -50%
simple   449/s   102%     --
This is not really a comparison of the algorithm, but of how to write effective Perl. The algorithm is trying to reduce the number of comparisons from 2 per element to 3 per 2 elements. However, your code differs from the OP's in a couple of key ways that improve the efficiency. First, the OP dereferences the array reference for each element access. Second, the OP winds up copying elements into temporary variables to work two at a time, whereas your code uses the nice aliasing behavior of for to avoid having to create new variables. In Perl, the overhead of the extra comparisons is minor relative to the cost of creating new temporary variables.
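That aliasing is easy to demonstrate: in a for/foreach loop, $_ (or a named loop variable) is an alias to the actual array element rather than a copy, which is why no temporary scalars need to be created per iteration. A quick illustration:

```perl
use strict;
use warnings;

my @nums = ( 3, 1, 4 );

# $x aliases each element of @nums in turn; assigning through it
# modifies the array in place, proving no copy was made.
for my $x (@nums) {
    $x *= 2;
}

print "@nums\n";    # prints "6 2 8"
```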
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
He also isn't saving as many comparisons as he thinks.
I've added some counts to the following version of his implementation of the algorithm. Where using three passes over 100 elements you would expect 200 comparisons, his code does 198--for a saving of just 2--but he also does 302 dereferences (which admittedly is traded for allocating stack space for the arg list) and a mod operation.
Maybe the algorithm can be implemented more economically, but quite where the saving comes in escapes me.
#! perl -slw
use strict;

my( $comps, $mods, $derefs ) = (0) x 3;

sub min_max_avg {
    my $ref = shift;
    my ( $min, $max, $avg, $i );
    my ( $current_min, $current_max );
    $comps++; $derefs++; $mods++;
    if ( @{$ref} % 2 == 0 ) {
        $comps++; $derefs += 4;
        ( $min, $max ) =
            $ref->[0] < $ref->[1]
            ? ( $ref->[0], $ref->[1] )
            : ( $ref->[1], $ref->[0] );
        $derefs += 2;
        $avg = $ref->[0] + $ref->[1];
        $i = 2;
    }
    else {
        $derefs++;
        $min = $max = $avg = $ref->[0];
        $i = 1;
    }
    while ( $i < @{$ref} and ++$comps ) {
        $comps++; $derefs += 4;
        ( $current_min, $current_max ) =
            $ref->[$i] < $ref->[ $i + 1 ]
            ? ( $ref->[$i], $ref->[ $i + 1 ] )
            : ( $ref->[ $i + 1 ], $ref->[$i] );
        $comps++;
        $min = $current_min if ( $current_min < $min );
        $comps++;
        $max = $current_max if ( $current_max > $max );
        $derefs += 2;
        $avg += $ref->[$i] + $ref->[ $i + 1 ];
        $i += 2;
    }
    $derefs++;
    return ( $min, $max, $avg / @{$ref} );
}

my( $min, $max, $ave ) = min_max_avg [ 1 .. 100 ];
print "$min $max $ave";
print "comps: $comps derefs: $derefs mods: $mods";
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
This is not really a comparison of the algorithm, but of how to write effective Perl. The algorithm is trying to reduce the number of comparisons from 2 per element to 3 per 2 elements.
Right, but my point is that trying to minimize the number of comparisons in order to find the best algorithm is senseless when programming in Perl, where other operations like assignment or branching are equally (or more) expensive.
Actually, I doubt that even when implemented in assembler or C this algorithm is faster than the simple one on all the modern processors, where comparisons are not especially expensive.
Thank you xdg, you found some sense in my benchmark! It does not suffer from all the optimisations salva cleverly used - which shows that sometimes a worse coder can do cleaner work :)
Flavio
perl -ple'$_=reverse' <<<ti.xittelop@oivalf
Don't fool yourself.
I've taken a cut at getting as perlish as possible with the algorithm, eliminating as much overhead as I could quickly think of (and keeping similar approaches to dereferencing, etc.). The simple approach is still faster in pure Perl -- doing the array handling manually with splice versus using for still swamps the comparison savings.
              Rate algorithm simple
algorithm   1517/s        --   -30%
simple      2177/s       44%     --
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
Thank you xdg, your reply helps me understand the problem with my approach...
I had similar results even with longer arrays:
#!/usr/bin/perl
use strict;
use warnings;

use Benchmark qw( cmpthese );

my @array = map { int rand 100 } 1 .. 1_000_000;

cmpthese -10, {
    op => sub {
        my ($min, $max, $avg) = min_max_avg(\@array);
    },
    simple => sub {
        my ($min, $max, $avg) = min_max_avg_simple(\@array);
    },
};

sub min_max_avg_simple {
    my $ref = shift;
    my $n_elements = scalar @$ref;
    return unless $n_elements > 0;
    my ( $min, $max, $sum ) = ($ref->[0]) x 3;
    my $i = 1;    # Start from the next element
    while ( $i < $n_elements ) {
        my $value = $ref->[$i];
        if    ( $value < $min ) { $min = $value }
        elsif ( $value > $max ) { $max = $value }
        $sum += $value;
        ++$i;
    }
    return ( $min, $max, $sum / $n_elements );
}

sub min_max_avg {
    my $ref = shift;
    my ( $min, $max, $avg, $i );
    my ( $current_min, $current_max );
    if ( @{$ref} % 2 == 0 ) {
        ( $min, $max ) =
            $ref->[0] < $ref->[1]
            ? ( $ref->[0], $ref->[1] )
            : ( $ref->[1], $ref->[0] );
        $avg = $ref->[0] + $ref->[1];
        $i = 2;
    }
    else {
        $min = $max = $avg = $ref->[0];
        $i = 1;
    }
    while ( $i < @{$ref} ) {
        ( $current_min, $current_max ) =
            $ref->[$i] < $ref->[ $i + 1 ]
            ? ( $ref->[$i], $ref->[ $i + 1 ] )
            : ( $ref->[ $i + 1 ], $ref->[$i] );
        $min = $current_min if ( $current_min < $min );
        $max = $current_max if ( $current_max > $max );
        $avg += $ref->[$i] + $ref->[ $i + 1 ];
        $i += 2;
    }
    return ( $min, $max, $avg / @{$ref} );
}
__END__
        s/iter    op simple
op        1.13    --   -37%
simple   0.707   60%     --
Next time I'll wait a bit more before coding :)
Flavio
perl -ple'$_=reverse' <<<ti.xittelop@oivalf
Don't fool yourself.
Re: minimum, maximum and average of a list of numbers at the same time
by rev_1318 (Chaplain) on Nov 10, 2005 at 16:22 UTC
A lot of alternatives have been provided, but I would like to add one more:
# no error checking...
sub min_max_avg {
    my ($min, $max) = (sort { $a <=> $b } @_)[0, -1];
    my $total; $total += $_ for @_;
    return ($min, $max, $total / @_);
}
use List::Util qw( sum );

sub min_max_avg {
    my ( $min, $max ) = ( sort { $a <=> $b } @_ )[ 0, -1 ];
    return ( $min, $max, sum(@_) / @_ );
}
Makeshifts last the longest.