HOW to calculate the column data

xbmy has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I have run into a difficulty. The following data are flux values of N2O emissions. I want to calculate the average of the values in the second column, grouped by the first column; there are thousands of values for N2O-1, N2O-2, N2O-3, N2O-4, and so on. I am just a Perl newbie, so can you show me Perl code that solves the problem? DATA:

N2O-1 0.02
N2O-1 0.47
N2O-1 0.22
N2O-2 0.02
N2O-2 5.87
N2O-2 7.32
N2O-3 3.45
N2O-3 1.81
N2O-3 2.36
N2O-3 4.70
N2O-4 9.60
N2O-4 4.95
N2O-4 6.99

The result of the calculation should be following data:

N2O-1 0.297
N2O-2 4.403
N2O-3 2.54
N2O-4 7.18

I really appreciate your help! Thanks for your attention!


Replies are listed 'Best First'.
Re: HOW to calculate the column data by ikegami (Patriarch) on Dec 03, 2009 at 21:04 UTC
Group them using a hash: `my %vals_by_type; while (<DATA>) { chomp; my ($type, $val) = split ' '; push @{ $vals_by_type{$type} }, $val; }` Then average each type individually: `use List::Util qw( sum ); for my $type (sort keys %vals_by_type) { my $vals = $vals_by_type{$type}; my $avg = sum( map $_/@$vals, @$vals ); printf("%s %.2f\n", $type, $avg); }`
Re^2: HOW to calculate the column data by jwkrahn (Abbot) on Dec 03, 2009 at 21:18 UTC
No need to use an array; you just need to store two values, the total and the count: `my %vals_by_type; while ( <DATA> ) { my ( $type, $val ) = split; $vals_by_type{ $type }{ total } += $val; $vals_by_type{ $type }{ count }++; } for my $type ( sort keys %vals_by_type ) { my $avg = $vals_by_type{ $type }{ total } / $vals_by_type{ $type }{ count }; printf "%s %.2f\n", $type, $avg; }`
Re^3: HOW to calculate the column data by ikegami (Patriarch) on Dec 03, 2009 at 21:26 UTC
The difference is that I divided each value before summing them for extra precision. But I'll grant you that it's surely not needed here. Storing them in an array is also useful if you want to perform more than one operation, especially if the operation requires all the elements (like finding the median). By the way, `$vals_by_type{ $type }{ count }++;` is less efficient than `++$vals_by_type{ $type }{ count };`
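To illustrate the point about operations that need every element, here is a minimal sketch (not code from the thread) that keeps the per-type arrays and computes a median for each type. The helper names `median` and `group_by_type` are hypothetical, and the sample lines are the N2O-3 values from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): median of a list of numbers.
sub median {
    my @sorted = sort { $a <=> $b } @_;
    return undef unless @sorted;
    my $mid = int( @sorted / 2 );
    return @sorted % 2
        ? $sorted[$mid]                                  # odd count: middle element
        : ( $sorted[ $mid - 1 ] + $sorted[$mid] ) / 2;   # even count: mean of the two middle elements
}

# Hypothetical helper: group "label value" lines into a hash of array refs,
# in the same spirit as the array-based reply above.
sub group_by_type {
    my %vals_by_type;
    for my $line (@_) {
        my ( $type, $val ) = split ' ', $line;
        next unless defined $val;    # skip blank or malformed lines
        push @{ $vals_by_type{$type} }, $val;
    }
    return \%vals_by_type;
}

my $groups = group_by_type( 'N2O-3 3.45', 'N2O-3 1.81', 'N2O-3 2.36', 'N2O-3 4.70' );
for my $type ( sort keys %$groups ) {
    printf "%s median %.3f\n", $type, median( @{ $groups->{$type} } );
}
```

A running total and count cannot produce this result, because the median depends on the sorted order of all the values in a group.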
Re: HOW to calculate the column data by Fletch (Bishop) on Dec 03, 2009 at 21:10 UTC
Since there's already been fish thrown . . . FORE! `perl -lane '/^N2O-(\d+)/;$t{$1}+=$F[-1];$c{$1}++;END{for(sort{$a<=>$b}keys%t){printf"N2O-$_ %0.3f\n",$t{$_}/$c{$_}}}'`
Re^2: HOW to calculate the column data by jwkrahn (Abbot) on Dec 03, 2009 at 21:33 UTC
If you just want to make it short: `perl -ane'$t{$F[0]}+=$F[1];$c{$F[0]}++}{printf"$_ %.3f\n",$t{$_}/$c{$_}for+sort+keys%t'`
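For readers new to the command-line switches, the short one-liner above can be unpacked roughly as follows. This is only a sketch: `-n` wraps the code in a loop over the input lines, `-a` splits each line into `@F`, and the `}{` closes that implicit loop so the rest runs once after the input is exhausted. The sub name `average_by_label` and the in-memory sample are assumptions made so the example is self-contained; the one-liner itself reads files named on the command line or standard input.

```perl
use strict;
use warnings;

# Hypothetical sub wrapping what the one-liner does with -n, -a and "}{".
sub average_by_label {
    my ($fh) = @_;
    my ( %t, %c );    # running total and count per label

    # -n: loop over every input line; -a: split each line into @F.
    while ( my $line = <$fh> ) {
        my @F = split ' ', $line;
        next unless @F >= 2;    # skip blank or short lines
        $t{ $F[0] } += $F[1];
        $c{ $F[0] }++;
    }

    # The part after "}{" runs once, after the loop has finished.
    return map { $_ => $t{$_} / $c{$_} } keys %t;
}

# Read from an in-memory filehandle so the sketch needs no input file.
my $sample = "N2O-2 0.02\nN2O-2 5.87\nN2O-2 7.32\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my %avg = average_by_label($fh);
close $fh;

printf "%s %.3f\n", $_, $avg{$_} for sort keys %avg;
```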
Re: HOW to calculate the column data by lostjimmy (Chaplain) on Dec 03, 2009 at 21:09 UTC
Sort of similar to ikegami's solution, but I calculate the sum on the fly instead of as a post-processing step, storing the sum and the number of occurrences in an array ref for each key: `my %values; while (<DATA>) { my ($col, $val) = split; $values{$col}[0] += $val; $values{$col}[1]++; } for my $col (sort keys %values) { print "$col ", $values{$col}[0] / $values{$col}[1], "\n"; }`
Re: HOW to calculate the column data by JadeNB (Chaplain) on Dec 03, 2009 at 21:02 UTC
Hmm, smells like homework …. What have you tried? Perl newbies become Perlmonks by experimenting. As a hint, you might want to try populating a hash, using `split` to get your hands on the proper keys and values.
Re: HOW to calculate the column data by AnomalousMonk (Archbishop) on Dec 04, 2009 at 01:26 UTC
Since the rest of the homework answer has been given, you should also be aware that the expected average values you give (or were given) for `N2O-1` and `N2O-3` are not correct.
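The discrepancy can be checked directly from the posted numbers. The following sketch (not from the thread) hard-codes the sample data and the expected values from the question and compares them; the helper name `mean` and the 0.005 tolerance are assumptions chosen for illustration. By hand, N2O-1 works out to 0.71 / 3, about 0.237, and N2O-3 to 12.32 / 4 = 3.08, while the values given for N2O-2 and N2O-4 agree with the data.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Sample data and expected averages, copied from the question.
my %posted = (
    'N2O-1' => [ 0.02, 0.47, 0.22 ],
    'N2O-2' => [ 0.02, 5.87, 7.32 ],
    'N2O-3' => [ 3.45, 1.81, 2.36, 4.70 ],
    'N2O-4' => [ 9.60, 4.95, 6.99 ],
);
my %expected = (
    'N2O-1' => 0.297,
    'N2O-2' => 4.403,
    'N2O-3' => 2.54,
    'N2O-4' => 7.18,
);

# Hypothetical helper: arithmetic mean of a non-empty list.
sub mean { return sum(@_) / @_ }

for my $type ( sort keys %posted ) {
    my $avg = mean( @{ $posted{$type} } );

    # Assumed tolerance: half a unit in the second decimal place.
    my $verdict = abs( $avg - $expected{$type} ) < 0.005 ? 'matches' : 'differs';
    printf "%s computed %.3f, question says %s: %s\n",
        $type, $avg, $expected{$type}, $verdict;
}
```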
Re: HOW to calculate the column data by xbmy (Friar) on Dec 04, 2009 at 15:46 UTC
Thank you all!
