http://qs321.pair.com?node_id=810938

xbmy has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I have run into a difficulty. The following data are flux values of N2O emissions. I want to calculate the average of the values in the second column, grouped by the first column, because there are thousands of values for N2O-1, N2O-2, N2O-3, N2O-4, and so on. I am just a Perl newbie, so can you show me the Perl code to solve the problem? DATA:
N2O-1 0.02
N2O-1 0.47
N2O-1 0.22
N2O-2 0.02
N2O-2 5.87
N2O-2 7.32
N2O-3 3.45
N2O-3 1.81
N2O-3 2.36
N2O-3 4.70
N2O-4 9.60
N2O-4 4.95
N2O-4 6.99
The result of the calculation should be the following data:
N2O-1 0.297
N2O-2 4.403
N2O-3 2.54
N2O-4 7.18
I really appreciate your help! Thanks for your attention!

Replies are listed 'Best First'.
Re: HOW to calculate the column data
by ikegami (Patriarch) on Dec 03, 2009 at 21:04 UTC
    Group them using a hash
    my %vals_by_type;
    while (<DATA>) {
        chomp;
        my ($type, $val) = split ' ';
        push @{ $vals_by_type{$type} }, $val;
    }

    Then average each type individually

    use List::Util qw( sum );

    for my $type (sort keys %vals_by_type) {
        my $vals = $vals_by_type{$type};
        my $avg = sum( map $_/@$vals, @$vals );
        printf("%s %.2f\n", $type, $avg);
    }

      No need to use an array, you just need to store two values, the total and the count:

      my %vals_by_type;
      while ( <DATA> ) {
          my ( $type, $val ) = split;
          $vals_by_type{ $type }{ total } += $val;
          $vals_by_type{ $type }{ count }++;
      }

      for my $type ( sort keys %vals_by_type ) {
          my $avg = $vals_by_type{ $type }{ total } / $vals_by_type{ $type }{ count };
          printf "%s %.2f\n", $type, $avg;
      }

        The difference is that I divided each value *before* summing them for extra precision. But I'll grant you that it's surely not needed here.

        Storing them in an array is also useful if you want to perform more than one operation, especially if the operation requires all the elements (like finding the median).
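To illustrate the point about the median: a minimal sketch, assuming the same hash-of-arrays layout as `%vals_by_type` above. The `median` helper is hypothetical (it is not part of the thread or of List::Util), and the sample values are copied from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): median of a list of numbers.
sub median {
    my @sorted = sort { $a <=> $b } @_;
    return undef unless @sorted;
    my $mid = int( @sorted / 2 );
    # Odd count: the middle element. Even count: mean of the two middle elements.
    return @sorted % 2
        ? $sorted[$mid]
        : ( $sorted[ $mid - 1 ] + $sorted[$mid] ) / 2;
}

# Same shape as the hash of arrays built above, filled with two of the
# question's groups.
my %vals_by_type = (
    'N2O-1' => [ 0.02, 0.47, 0.22 ],
    'N2O-3' => [ 3.45, 1.81, 2.36, 4.70 ],
);

for my $type ( sort keys %vals_by_type ) {
    printf "%s median %.3f\n", $type, median( @{ $vals_by_type{$type} } );
}
```

The median needs the whole sorted list, so a running total and count would not be enough here; that is the trade-off being described.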

        By the way,
        $vals_by_type{ $type }{ count }++;
        is less efficient than
        ++$vals_by_type{ $type }{ count };

Re: HOW to calculate the column data
by Fletch (Bishop) on Dec 03, 2009 at 21:10 UTC

    Since there's already been fish thrown . . . FORE!

    perl -lane '/^N2O-(\d+)/;$t{$1}+=$F[-1];$c{$1}++;END{for(sort{$a<=>$b} keys%t){printf"N2O-$_ %0.3f\n",$t{$_}/$c{$_}}}'

    The cake is a lie.

      If you just want to make it short:

      perl -ane'$t{$F[0]}+=$F[1];$c{$F[0]}++}{printf"$_ %.3f\n",$t{$_}/$c{$_}for+sort+keys%t'
Re: HOW to calculate the column data
by lostjimmy (Chaplain) on Dec 03, 2009 at 21:09 UTC
    Sort of similar to ikegami's solution, but I calculate the sum on the fly instead of as a post-processing step.
    my %values;
    while (<DATA>) {
        my ($col, $val) = split;
        # store the sum and number of occurrences in an array ref
        $values{$col}[0] += $val;
        $values{$col}[1]++;
    }

    for my $col (sort keys %values) {
        print "$col ", $values{$col}[0] / $values{$col}[1], "\n";
    }
Re: HOW to calculate the column data
by JadeNB (Chaplain) on Dec 03, 2009 at 21:02 UTC

    Hmm, smells like homework …. What have you tried? Perl newbies become Perlmonks by experimenting.

    As a hint, you might want to try populating a hash, using split to get your hands on the proper keys and values.

Re: HOW to calculate the column data
by AnomalousMonk (Archbishop) on Dec 04, 2009 at 01:26 UTC
    Since the rest of the homework answer has been given, you should also be aware that the expected average values you give (or were given) for N2O-1 and N2O-3 are not correct.
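The arithmetic behind that remark can be checked by hand: the N2O-1 values sum to 0.71, so the mean is 0.71 / 3 ≈ 0.237 rather than 0.297, and the N2O-3 values sum to 12.32, so the mean is 12.32 / 4 = 3.08 rather than 2.54. A minimal sketch that recomputes all four groups from the question's data follows; the `group_means` helper is a hypothetical name introduced only for this check, using the total-and-count approach from the replies above.

```perl
use strict;
use warnings;

# Hypothetical helper: map each group label to the mean of its values.
# Each argument is one "label value" line, as in the question's data.
sub group_means {
    my ( %total, %count );
    for my $line (@_) {
        my ( $type, $val ) = split ' ', $line;
        next unless defined $val;    # skip blank or malformed lines
        $total{$type} += $val;
        $count{$type}++;
    }
    my %mean;
    $mean{$_} = $total{$_} / $count{$_} for keys %total;
    return \%mean;
}

# The data exactly as posted in the question.
my @lines = (
    'N2O-1 0.02', 'N2O-1 0.47', 'N2O-1 0.22',
    'N2O-2 0.02', 'N2O-2 5.87', 'N2O-2 7.32',
    'N2O-3 3.45', 'N2O-3 1.81', 'N2O-3 2.36', 'N2O-3 4.70',
    'N2O-4 9.60', 'N2O-4 4.95', 'N2O-4 6.99',
);

my $means = group_means(@lines);
printf "%s %.3f\n", $_, $means->{$_} for sort keys %$means;
```

The values for N2O-2 (13.21 / 3 ≈ 4.403) and N2O-4 (21.54 / 3 = 7.18) agree with the ones given in the question.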
      Thank you all!