Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

minimum, maximum and average of a list of numbers at the same time

by LucaPette (Friar)
on Nov 10, 2005 at 11:42 UTC ( [id://507340]=perlquestion: print w/replies, xml ) Need Help??

LucaPette has asked for the wisdom of the Perl Monks concerning the following question:

I have written the following subroutine to find minimum, maximum and average at the same time:
sub min_max_avg { my $ref = shift; my ( $min, $max, $agv, $i ); my ( $current_min, $current_max ); if ( @{$ref} % 2 == 0 ) { ( $min, $max ) = $ref->[0] < $ref->[1] ? ( $ref->[0], $ref->[1] ) : ( $ref->[1], $ref->[0] ); $agv = $ref->[0] + $ref->[1]; $i = 2; } else { $min = $max = $agv = $ref->[0]; $i = 1; } while ( $i < @{$ref} ) { ( $current_min, $current_max ) = $ref->[$i] < $ref->[ $i + 1 ] ? ( $ref->[$i], $ref->[ $i + 1 ] ) : ( $ref->[ $i + 1 ], $ref->[$i] ); $min = $current_min if ( $current_min < $min ); $max = $current_max if ( $current_max > $max ); $agv += $ref->[$i] + $ref->[ $i + 1 ]; $i += 2; } return ( $min, $max, $agv / @{$ref} ); }

Note that:

  • From well known "Introduction to algorithms, 2nd Edition" (ISBN 0070131511), page 162:
    ...at most 3(n/2) comparisons are sufficient to find both the minimum and the maximum. The strategy is to maintain the minimum and maximum elements seen thus far. Rather than processing each element of the input by comparing it against the current minimum and maximum, at a cost of 2 comparisons per element, we process elements in pairs. We compare pairs of elements from the input first with each other, and then we compare the smaller to the current minimum and the larger to the current maximum, at a cost of 3 comparisons for every 2 elements.Setting up initial values for the current minimum and maximum depends on whether n is odd or even. If n is odd, we set both the minimum and maximum to the value of the first element, and then we process the rest of the elements in pairs. If n is even, we perform 1 comparison on the first 2 elements to determine the initial values of the minimum and maximum, and then process the rest of the elements in pairs as in the case for odd n. Let us analyze the total number of comparisons. If n is odd, then we perform 3n/2 comparisons. If n is even, we perform 1 initial comparison followed by 3(n - 2)/2 comparisons, for a total of 3(n/2)-2. Thus, in either case, the total number of comparisons is at most 3(n/2).
    Therefore, regarding the number of comparisons the code would have to work fine.

  • The subroutine don't check the input parameters. Infact, it fails with an input like this:
    [qw//]

    or
    [qw/foo bar/]

Can someone suggest improvements to me?
Can some step of the code be written in a better way? In a more perlish way?
Consider that i would like to preserve the structure of the algorithm... and thanks to all in advance.

Replies are listed 'Best First'.
Re: minimum, maximum and average of a list of numbers at the same time
by broquaint (Abbot) on Nov 10, 2005 at 11:55 UTC
    This would seem a "perlish" solution to me:
    use List::Util qw/ min max sum /; sub min_max_avg { return min(@_), max(@_), sum(@_) / @_; }
    HTH

    _________
    broquaint

      It does look temptingly simple doesn't it?

      But you'd be iterating across the list three times - which isn't very efficient. You only really need to iterate across it once. Something like this (off the top of my head and untested).

      # use 'mean' instead of 'avg' as it's unambiguous sub min_max_mean { my ($min, $max, $tot); my $count = @_; $min = $max = $tot = shift; foreach (@_) { $min = $_ if $_ < $min; $max = $_ if $_ > $max; $tot += $_; } return ($min, $max, $tot / $count); }
      --
      <http://dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg

        Reducing the number of iterations or comparisons by a constant number may make a performance difference, depending on what gyrations you have to go through to make the reduction, but you're not changing the order of complexity. A 3-N solution is the same order as a 3/2-N solution, is the same as a single-pass (N) solution. They're all O(n). This is an important concept.

        If you can reduce the order of a solution, your solution will scale better. We commonly look for O(n log n) solutions to replace O(n2) solutions, so that working with large amounts of data doesn't make our app bog down. If you merely change by a constant factor (as we're talking about in this case), you may see a constant-factor improvement at any size, but you won't alleviate any scaling problems. You're in the realm of micro-optimization.

        Walking through the list is not going to be a significant portion of the computation, compared to the comparisons and math being done on the variables of interest. And it's really not worth trying to take the elements two at a time, because that ends up being a very inefficient way to walk through the list, compared to Perl's built-in for.

        If our OP were to translate the algorithm into Inline::C, the reduced number of comparisons might compete well with three List::Util calls, with the difference being some constant factor on any size list.


        Caution: Contents may have been coded under pressure.

      And with List::Util::reduce()...

      use List::Util qw/ reduce /; sub min_max_avg { $res= reduce { my $r= ref $a ? $a : [($a) x 3]; if ($b < $r->[0]) { $r->[0]= $b; } elsif ($b > $r->[1]) { $r->[1]= $b; } $r->[2]+=$b; $r } @_; return @$res[0,1], $res->[2] / @_; }
      ---
      $world=~s/war/peace/g

        And if you want to go even more FP...
        sub Gen_Stats { my $stat = {}; my ($cnt, $max, $min, $tot); $stat->{ADD} = sub { $cnt += @_; for ( @_ ) { $tot += $_; $max = $_ if ! defined $max || $_ > $max; $min = $_ if ! defined $min || $_ < $min; } }; $stat->{MAX} = sub { $max }; $stat->{MIN} = sub { $min }; $stat->{AVE} = sub { $cnt ? $tot / $cnt : undef }; $stat->{TOT} = sub { $tot }; $stat->{ADD}->( @_ ); return $stat; } my $stat_info = Gen_Stats(); while ( <DATA> ) { chomp; $stat_info->{ADD}($_); } print join "\t", map { $_->() } @{$stat_info}{qw/MAX MIN AVE TOT/};
        This code was borrowed from RFC: Tool::Box. I recommend using List::Util when and wherever possible. It has been part of the core since 5.007003 and uses XS when possible. The only real limitation I see with it is that all the items in the list must be known at once as I pointed out in How A Function Becomes Higher Order.

        Cheers - L~R

        Nice fast code, but does it return the right values? I checked with the orig mma routine, the my_mma routine and your reduce_mma routine. Using the same set of @nums, the results differ.
        use Benchmark qw(:all :hireswallclock); my @nums = map {rand} 1 .. 10000; my $subs = { mma => sub { my @r = min_max_avg(\@nums) }, reduce_mma => sub { my @r = reduce_mma(\@nums) }, mymma => sub { my @r = my_mma(\@nums) } }; cmpthese(-1, $subs); $count = 1000; for my $sub (keys %$subs){ $t = timeit($count, $subs->{$sub}); print "$count loops of $sub:",timestr($t),"\n"; } for my $sub (keys %$subs){ print "$sub results:\n"; print (join ' ', &{$subs->{$sub}} , "\n"); } ##### Subs to test ###### __END__ $ perl benchmark.pl Rate mma mymma reduce_mma mma 86.3/s -- -49% -100% mymma 171/s 98% -- -100% reduce_mma 89229/s 103265% 52110% -- 1000 loops of reduce_mma:0.0114148 wallclock secs ( 0.00 usr + 0.00 s +ys = 0.00 CPU) 1000 loops of mma:12.0897 wallclock secs (12.05 usr + 0.01 sys = 12.0 +6 CPU) @ 82.91/s (n=1000) 1000 loops of mymma:5.98583 wallclock secs ( 5.97 usr + 0.00 sys = 5 +.97 CPU) @ 167.56/s (n=1000) reduce_mma results: 0.956444677504809 0.209810070551129 0.765697545177442 mma results: 0.000119859684559742 0.999992750475673 0.505291691541093 mymma results: 0.000119859684559742 0.999992750475673 0.505291691541093
      I'm not a perl expert... infact i'm not able to really understand the code of List::Util maybe you can clarify to me about this question: Do your code process three times the input list?
        Do your code process three times the input list?
        Indeed it does and as davorg and demerphq have pointed out in their replies it isn't terribly efficient, but that's a "perlish" solution for you.
        HTH

        _________
        broquaint

        But you're not required to understand it. But then of course without even having seen it I'm sure it does process it three times. If that is really relevant, then indeed you may want or have to process it once yourself. And if it is very relevant, perhaps you may want to do it as a binary extension.

        Well, the author may consider a adding to List::Util a function to calculate more than one relevant quantity at a time, e.g.

        my ($min, $max, $sum) = calculate [qw/min max sum/], @list;

        or perhaps with an OO interface:

        my $stats=List::Util->calculate([qw/min max sum/])->of(@list); print $stats->min, "\n";

        possibly even with a mixed interface:

        calculate [qw/min max sum/], @list; print List::Util::lastrun->max, "\n";
      I don't know what the programmatic performance would be, but using Statistics::Descriptive would certainly cut the amount of time you, the human, had to spend on this problem.
Re: minimum, maximum and average of a list of numbers at the same time
by BrowserUk (Patriarch) on Nov 10, 2005 at 12:51 UTC

    As others have mentioned, you are unlikely to code anything in Perl that will beat List::Util for performance, even if you do avoid making 3 whole passes. That is the nature of a dynamic language.

    For error handling, you need to add code to detect and deal with them. You have to decide whether to die or croak within the subroutine, or return some failure indication (like an empty list).

    Here's one possibility that raises an exception (via die) within the sub and catches it in the calling code using the block form of eval:

    #! perl -slw use strict; sub minMaxAve { die 'Empty list' unless @_; my( $min, $max, $sum ); for( @_ ) { die 'Non-numeric value(s)' unless /^[\de+.-]+$/; $min = $_ if !defined $min or $min > $_; $max = $_ if !defined $max or $max < $_; $sum += $_; } return $min, $max, $sum / @_; } for ( [], [ 1 .. 4 ], [ 5 , 1, 5, 6, 5 ], [ 1, 2, -0.1, 1e-5, 2e10 ], [ 'fox', 'dog' ] ) { my @results = eval{ minMaxAve @$_ } or warn "[ @$_ ] : $@\n\n" and + next; printf "[ %s ]\nmin: %g max: %g ave: %g\n\n", join( ', ',@$_ ), @r +esults; } __END__ P:\test>junk2 [ ] : Empty list at P:\test\junk2.pl line 5. [ 1, 2, 3, 4 ] min: 1 max: 4 ave: 2.5 [ 5, 1, 5, 6, 5 ] min: 1 max: 6 ave: 4.4 [ 1, 2, -0.1, 1e-005, 20000000000 ] min: -0.1 max: 2e+010 ave: 4e+009 [ fox dog ] : Non-numeric value(s) at P:\test\junk2.pl line 8.

    You should probably use Carp::croak rather than die, a better regex (say from Regexp::Common) or better, Scalar::Util::looks_like_number() to detect non-numeric values.

    And, if performance is your requirement, List::Util's calls, but make sure that you get a version that compiles the XS alternatives on your platform.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I would also add List::Moreutils, which has a minmax function using an improved algorithm to evaluate min and max at once:
      #!/usr/bin/perl use strict; use warnings; use List::Util qw( min max reduce ); use List::MoreUtils qw( minmax ); use Benchmark qw( cmpthese ); my @array = map { rand 100 } 1 .. 1_000_000; my $count = 0; cmpthese -10, { util_min_max => sub { my $min = min(@array); my $max = max(@array); }, util_reduce => sub { my ( $min, $max ) = @{ (reduce { my $r= ref $a ? $a : [($a) x 2]; if ($b < $r->[0]) { $r->[0]= $b; } elsif ($b > $r->[1]) { $r->[1]= $b; } $r } @array) }; }, moreutils_minmax => sub { my ( $min, $max ) = minmax(@array); }, }; __END__ Rate util_reduce util_min_max moreutils_m +inmax util_reduce 0.846/s -- -92% + -96% util_min_max 10.5/s 1146% -- + -45% moreutils_minmax 19.2/s 2169% 82% + --
      I added the reduce method because I did not know if it would perform better, but the result does not surprise me given the bunch of function calls that are involved. Anyway, this is what benchmarks are for :)

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
Re: minimum, maximum and average of a list of numbers at the same time
by salva (Canon) on Nov 10, 2005 at 12:19 UTC
    well, that algorithm could offer the best performance in assembler or C but I doubt it could make any difference in Perl.

    Actually...

    use Benchmark qw(cmpthese); # insert your code here sub my_mma { my $r = shift; if (@$r) { my ($min, $max, $tot) = ($r->[0]) x 3; for (@$r[1..$#$r]) { $min = $_ if $_ < $min; $max = $_ if $_ > $max; $tot += $_; } return ($min, $max, $tot/@$r); } return (undef, undef, undef) } my @nums = map {rand} 1..1000; cmpthese(-1, { ita => sub { my @r = min_max_avg(\@nums) }, simple => sub { my @r = my_mma(\@nums) } })
    runs as...
    Rate ita simple ita 222/s -- -50% simple 449/s 102% --

      This is not really a comparison of the algorithm, but of how to write effective Perl. The algorithm is trying to reduce the number of comparisons from 2 per element to 3 per 2 elements. However, your code differs from the OP in a couple key ways that improve the efficiency. First, the OP dereferences the array pointer for each element access. Second, the OP winds up copying elements into temporary variables to work two at a time, whereas your code uses the nice aliasing behavior of for to avoid having to create new variables. In Perl the overhead of the extra comparisons are minor relative to the cost of creating new temporary variables.

      -xdg

      Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

        He also isn't saving as many comparisons as he thinks.

        I've added some counts to the following version of his implementation of the algorithm and where using three passes on 100 elements you would expect 200 comparisions, his code does 198--for a saving of just 2--, but he also does 302 dereferences (which admitedly is traded for allocating stack space for the arg list) and a mod operation.

        Maybe the algorithm can be implemented more economically, but quite where the saving comes in escapes me.

        #! perl -slw use strict; my( $comps, $mods, $derefs ) = (0)x3; sub min_max_avg { my $ref = shift; my ( $min, $max, $agv, $i ); my ( $current_min, $current_max ); $comps++; $derefs++; $mods++; if ( @{$ref} % 2 == 0 ) { $comps++; $derefs +=4; ( $min, $max ) = $ref->[0] < $ref->[1] ? ( $ref->[0], $ref->[1] ) : ( $ref->[1], $ref->[0] ); $derefs += 2; $agv = $ref->[0] + $ref->[1]; $i = 2; } else { $derefs++; $min = $max = $agv = $ref->[0]; $i = 1; } while ( $i < @{$ref} and ++$comps ) { $comps++; $derefs += 4; ( $current_min, $current_max ) = $ref->[$i] < $ref->[ $i + 1 ] ? ( $ref->[$i], $ref->[ $i + 1 ] ) : ( $ref->[ $i + 1 ], $ref->[$i] ); $comps++; $min = $current_min if ( $current_min < $min ); $comps++; $max = $current_max if ( $current_max > $max ); $derefs += 2; $agv += $ref->[$i] + $ref->[ $i + 1 ]; $i += 2; } $derefs++; return ( $min, $max, $agv / @{$ref} ); } my( $min, $max, $ave ) = min_max_avg [ 1 .. 100 ];; print "$min $max $ave"; print "comps:$comps derefs: $derefs mods: $mods";

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        This is not really a comparison of the algorithm, but of how to write effective Perl. The algorithm is trying to reduce the number of comparisons from 2 per element to 3 per 2 elements.

        right, but my point is that trying to minimimize the number of comparisons in order to find the best algorithm is senseless when programing in perl where other operations like assignement or branching are equally (or more) expensive.

        Actually, I doubt that even when implemented in assembler or C this algorithm is faster than the simple one in all the moderm processors where comparisons are not specially expensive.

        Thank you xdg, you found some sense for my benchmark! It does not suffer for all the optimisations salva cleverly used - which shows that sometimes a worse coder can do a cleaner work :)

        Flavio
        perl -ple'$_=reverse' <<<ti.xittelop@oivalf

        Don't fool yourself.

        I've taken a cut at trying to get as perlish as possible with algorithm, eliminating as much overhead as I could quickly think of (and keeping similar approaches on dereferencing, etc.). The simple approach is still faster in pure perl -- doing the array overhead manually with splice versus using for still swamps the comparision savings.

        Rate algorithm simple algorithm 1517/s -- -30% simple 2177/s 44% --

        -xdg

        Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

        thank you xdg, your reply helps me to understand what is the problem of my approach...
      I had similar results even with longer arrays:
      #!/usr/bin/perl use strict; use warnings; use Benchmark qw( cmpthese ); my @array = map { int rand 100 } 1 .. 1_000_000; cmpthese -10, { op => sub { my ($min, $max, $avg) = min_max_avg(\@array); }, simple => sub { my ($min, $max, $avg) = min_max_avg_simple(\@array); }, }; sub min_max_avg_simple { my $ref = shift; my $n_elements = scalar @$ref; return unless $n_elements > 0; my ( $min, $max, $sum ) = ($ref->[0]) x 3; my $i = 1; # Start from next element while ($i < $n_elements) { my $value = $ref->[$i]; if ($value < $min) { $min = $value } elsif ($value > $max) { $max = $value } $sum += $value; ++$i; } return ( $min, $max, $sum / $n_elements ); } sub min_max_avg { my $ref = shift; my ( $min, $max, $agv, $i ); my ( $current_min, $current_max ); if ( @{$ref} % 2 == 0 ) { ( $min, $max ) = $ref->[0] < $ref->[1] ? ( $ref->[0], $ref->[1] ) : ( $ref->[1], $ref->[0] ); $agv = $ref->[0] + $ref->[1]; $i = 2; } else { $min = $max = $agv = $ref->[0]; $i = 1; } while ( $i < @{$ref} ) { ( $current_min, $current_max ) = $ref->[$i] < $ref->[ $i + 1 ] ? ( $ref->[$i], $ref->[ $i + 1 ] ) : ( $ref->[ $i + 1 ], $ref->[$i] ); $min = $current_min if ( $current_min < $min ); $max = $current_max if ( $current_max > $max ); $agv += $ref->[$i] + $ref->[ $i + 1 ]; $i += 2; } return ( $min, $max, $agv / @{$ref} ); } __END__ s/iter op simple op 1.13 -- -37% simple 0.707 60% --
      Next time I'll wait a bit more before coding :)

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
Re: minimum, maximum and average of a list of numbers at the same time
by rev_1318 (Chaplain) on Nov 10, 2005 at 16:22 UTC
    A lot of alternatives have been provided, but I would like to add one more:
    # no error checking... sub min_max_avg { my ($min, $max) = (sort {$a <=> $b} @_)[0,-1]; my $total; $total += $_ for @_; return ($min, $max, $total/@_) }

    Paul

      The logical conclusion:

      use List::Util qw( sum ); sub min_max_avg { my ( $min, $max ) = ( sort { $a <=> $b } @_ )[ 0, -1 ]; return ( $min, $max, sum @_ ); }

      Makeshifts last the longest.

        How is a O(NlogN) substitute for a O(2N) operation the "logical conclusion" of a thread that started out attempting to reduce the latter to O(3N/2)?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      using sort is in general a very bad idea because it has O(NlogN) complexity while the other algorithms proposed are just O(N).

      Though, for small lists it could be faster.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://507340]
Approved by polettix
Front-paged by pboin
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (3)
As of 2024-04-25 13:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found