Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

group by and sum for two columns

by gowthamvels (Novice)
on Aug 08, 2017 at 15:01 UTC ( [id://1196996]=perlquestion: print w/replies, xml ) Need Help??

gowthamvels has asked for the wisdom of the Perl Monks concerning the following question:

I have to do a group by and sum two columns, I am able to do only for two columns. For the below code, I got the help from Monks only. I need another help to group by and sum two columns. I have grouped by using col2, and sum col6 and col7

  • Input file
  • col1,col2,col3,col4,col4,col5,col6,col7
  • 1234,GP,20170715,0,V,97517,24,0.6
  • 5678,Pack,20170715,0,V,97516,88,1.8
  • 1234,GP,20170715,0,V,97517,22,0.6
  • 5678,Pack,20170715,0,V,97517,183,3.9
  • 1234,PRS,20170715,0,S,97517,261,5.4
  • 5678,PRS,20170715,0,M,97517,36,0.9
  • output should be:
  • GP,46,1.2
  • Pack,271,5.7
  • PRS,297,6.3
below is the code I have tried.
#!/usr/bin/perl #use strict; #use warnings; use feature 'say'; #my %hash; print "Please input the Service Name(V/M/S/ALL):"; my $sinput = <STDIN>; chomp $sinput; print "Please input On/Off/ALL:"; my $chcat= <STDIN>; chomp $chcat; print "Start Date:"; chomp($sdate=<>); print "End Date:"; chomp($edate=<>); $file = '/var/tmp/CA_scripts/1_onnet_offnet/Onnet_offnet_repository.cs +v'; open (my $fh, '>>', 'C_n_r_'.$sinput.'.csv'); unless(open FILE, $file) { die "\nUnable to open the file\n"; } while (my $line = <FILE>) { my ($col1,$col2,$col3,$col4,$col5,$col6,$col7,$col8) = split /,/, +$line; #my $col21 = $class; if ($sdate >= $col3 && $edate <= $col3 && $sinput eq $col5 && $chcat e +q "On" && ($col6 =~ /^\Q97517\E/ || $col6 =~ /^\Q97516\E/)) { $hash{"$col2"} += $col8; #$hash{"$col2"} += $col6; } elsif ($sdate >= $col3 && $edate <= $col3 && $sinput eq $col5 && $chca +t eq "Off" && $col6 =~ /^\Q97577\E/) { $hash{"$col2"} += $col8; } elsif ($sdate >= $col3 && $edate <= $col3 && $sinput eq "ALL" && $chc +at eq "ALL") { $hash{"$col2"} += $col8; } } foreach my $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) { printf $fh "%s %1f\n", $key, $hash{$key};

Replies are listed 'Best First'.
Re: group by and sum for two columns
by hippo (Bishop) on Aug 08, 2017 at 15:24 UTC

    It is pretty hard to tell from the lack of formatting but it certainly looks to me like you have commented out strict and warnings. You ought to uncomment both of those and fix every last error and warning arising. That might help and it certainly won't be a hindrance.

Re: group by and sum for two columns
by 1nickt (Canon) on Aug 08, 2017 at 15:36 UTC

    In addition to the advice shared by hippo above about implementing strict and warnings before all else, you should also consider using a dedicated module to parse your CSV file, e.g. Text::CSV_XS. Or even better, since you are apparently wanting to filter on date range and so forth, use DBD::CSV and query the data using SQL, possibly using a query builder like SQL::Abstract.


    The way forward always starts with a minimal test.
Re: group by and sum for two columns
by kevbot (Vicar) on Aug 09, 2017 at 04:44 UTC

    Hello gowthamvels,

    When manipulating tabular data, I often like to use the Data::Table module. I noticed a potential problem with your input data. It had two columns named col4. So, I edited the header of your data such that the columns you want to calculate sums for are col7 and col8. If your data is in data.csv as follows,
    col1,col2,col3,col4,col5,col6,col7,col8 1234,GP,20170715,0,V,97517,24,0.6 5678,Pack,20170715,0,V,97516,88,1.8 1234,GP,20170715,0,V,97517,22,0.6 5678,Pack,20170715,0,V,97517,183,3.9 1234,PRS,20170715,0,S,97517,261,5.4 5678,PRS,20170715,0,M,97517,36,0.9
    then the following code will result in the output that you want.
    #!/usr/bin/env perl use strict; use warnings; use Data::Table; # Load input data from csv file my $dt = Data::Table::fromCSV('data.csv'); # Make a new table that only contains the relevant columns my $st = $dt->subTable(undef, [ 'col2', 'col7', 'col8' ]); # Group by 'col2', calculate sums for 'col7' and 'col8' my $ot = $st->group( ['col2'], # column to group by ['col7', 'col8'], # Columns to perform calculation on [ \&sum, \&sum ], # Apply sum function to 'col7' and 'col8' ['sum_of_col7', 'sum_of_col8'] # Put the sums in these columns ); print $ot->csv, "\n"; sub sum { my @data = @_; my $sum = 0; foreach my $x (@data) { next unless $x; $sum += $x; } return $sum; } exit;
    The output is
    col2,sum_of_col7,sum_of_col8 GP,46,1.2 Pack,271,5.7 PRS,297,6.3
      thanks a lot kevbot, I used this code and was successful. Thanks a lot to other monks, for helping me.
Re: group by and sum for two columns
by kcott (Archbishop) on Aug 09, 2017 at 09:40 UTC

    G'day gowthamvels,

    What you're after is a very simple exercise in using Text::CSV. Here's the guts of it:

    #!/usr/bin/env perl use strict; use warnings; use Data::Dump; use Text::CSV; my $csv = Text::CSV::->new; my %result; while (my $row = $csv->getline(\*DATA)) { $result{$row->[1]}[0] += $row->[6]; $result{$row->[1]}[1] += $row->[7]; } dd \%result; __DATA__ 1234,GP,20170715,0,V,97517,24,0.6 5678,Pack,20170715,0,V,97516,88,1.8 1234,GP,20170715,0,V,97517,22,0.6 5678,Pack,20170715,0,V,97517,183,3.9 1234,PRS,20170715,0,S,97517,261,5.4 5678,PRS,20170715,0,M,97517,36,0.9

    Output:

    { GP => [46, 1.2], Pack => [271, 5.7], PRS => [297, 6.3] }

    Please post data between <code>...</code> tags (as already requested in responses to this and your last post). Show the data as it appears in the file: do not make records into dot points or otherwise change their original format. Also check what you post, multiple times if necessary, using the "preview" button: in this instance, you posted "col1,...,col7" but there are eight columns, not seven.

    — Ken

Re: group by and sum for two columns
by AnomalousMonk (Archbishop) on Aug 08, 2017 at 15:32 UTC

    Please be aware that code tags on PerlMonks are paired HTML tags like  <code> ... some code here ... </code> and not  < ... a whole bunch of unreadable code here ... > Please see Markup in the Monastery and Writeup Formatting Tips.


    Give a man a fish:  <%-{-{-{-<

      aplogies, now I have formatted, since I am new to the forum, I dont know to format, now I have did it, Please help me . Thanks a lot.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1196996]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (5)
As of 2024-03-28 16:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found