http://qs321.pair.com?node_id=1197067


in reply to group by and sum for two columns

Hello gowthamvels,

When manipulating tabular data, I often like to use the Data::Table module. I noticed a potential problem with your input data: it had two columns named col4. So I edited the header of your data so that the columns you want to sum are col7 and col8. If your data is in data.csv as follows,
col1,col2,col3,col4,col5,col6,col7,col8
1234,GP,20170715,0,V,97517,24,0.6
5678,Pack,20170715,0,V,97516,88,1.8
1234,GP,20170715,0,V,97517,22,0.6
5678,Pack,20170715,0,V,97517,183,3.9
1234,PRS,20170715,0,S,97517,261,5.4
5678,PRS,20170715,0,M,97517,36,0.9
then the following code will result in the output that you want.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Table;

# Load input data from csv file
my $dt = Data::Table::fromCSV('data.csv');

# Make a new table that only contains the relevant columns
my $st = $dt->subTable(undef, [ 'col2', 'col7', 'col8' ]);

# Group by 'col2', calculate sums for 'col7' and 'col8'
my $ot = $st->group(
    ['col2'],                        # column to group by
    ['col7', 'col8'],                # Columns to perform calculation on
    [ \&sum, \&sum ],                # Apply sum function to 'col7' and 'col8'
    ['sum_of_col7', 'sum_of_col8']   # Put the sums in these columns
);

print $ot->csv, "\n";

sub sum {
    my @data = @_;
    my $sum = 0;
    foreach my $x (@data) {
        next unless $x;
        $sum += $x;
    }
    return $sum;
}

exit;
The output is
col2,sum_of_col7,sum_of_col8
GP,46,1.2
Pack,271,5.7
PRS,297,6.3
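
If installing Data::Table is not an option, the same grouping can be sketched with a plain hash of sums. This is only a rough illustration, not a replacement for the module: group_sums is a hypothetical helper name, the sample rows are embedded in the script instead of being read from data.csv, and the naive split on commas assumes no field contains a quoted comma.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: sum the named columns for each distinct value
# of the key column. Takes a reference to an array of CSV lines whose
# first element is the header. Assumes fields contain no quoted commas.
sub group_sums {
    my ($lines, $key_col, @sum_cols) = @_;
    my @rows   = @$lines;
    my @header = split /,/, shift @rows;

    # Map each column name to its position in the row
    my %idx;
    @idx{@header} = (0 .. $#header);

    my (%sums, @order);
    for my $line (@rows) {
        my @f   = split /,/, $line;
        my $key = $f[ $idx{$key_col} ];

        # Remember first-seen order so the output is stable
        push @order, $key unless exists $sums{$key};
        $sums{$key}{$_} += $f[ $idx{$_} ] for @sum_cols;
    }

    # Build one "key,sum1,sum2,..." line per group
    my @out;
    for my $key (@order) {
        push @out, join ',', $key, map { $sums{$key}{$_} } @sum_cols;
    }
    return @out;
}

# Same sample data as above, embedded here to keep the sketch self-contained
my @lines = (
    'col1,col2,col3,col4,col5,col6,col7,col8',
    '1234,GP,20170715,0,V,97517,24,0.6',
    '5678,Pack,20170715,0,V,97516,88,1.8',
    '1234,GP,20170715,0,V,97517,22,0.6',
    '5678,Pack,20170715,0,V,97517,183,3.9',
    '1234,PRS,20170715,0,S,97517,261,5.4',
    '5678,PRS,20170715,0,M,97517,36,0.9',
);

print "col2,sum_of_col7,sum_of_col8\n";
print "$_\n" for group_sums(\@lines, 'col2', 'col7', 'col8');
```

For real-world CSV with quoted fields, a proper parser such as Text::CSV (or Data::Table as above) is the safer choice.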

Re^2: group by and sum for two columns
by gowthamvels (Novice) on Aug 10, 2017 at 14:45 UTC
    Thanks a lot, kevbot. I used this code and it worked. Thanks also to the other monks for helping me.