http://qs321.pair.com?node_id=1219496


in reply to Re^4: sorting and merging in perl
in thread sorting and merging in perl

roboticus did identify the error in the code you posted.

Here is the code you posted, with some minor changes:

Split the line into 4 named variables instead of @row (makes the problem clearer).

Removed the quotes around hash keys that consist only of word characters (such keys don't need to be quoted).

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %results;

    open my $fh, '<', 'file1.csv' or die $!;
    while ( <$fh> ) {
        chomp ;
        my ($a1, $b1, $actDt, $inactDt) = split /,/;
        if (exists $results{$a1,$b1} ) {
            if ($actDt < $results{$a1,$b1}{ACTDATE}) {
                $results{$a1,$b1}{ACTDATE} = $actDt;
            }
            if (!$inactDt || !$results{$a1,$b1}{INACTDATE}) {
                $results{$a1,$b1}{INACTDATE} = '' ;
            }
            elsif($inactDt > $results{$a1,$b1}{INACTDATE} ) {
                $results{$a1,$b1}{INACTDATE} = $inactDt;
            }
        }
        else {
            # Create new entry in hash
            $results{$a1,$b1} = {
                A1        => $a1,
                B1        => $b1,
                ACTDATE   => $actDt,
                INACTDATE => $inactDt,
            };
        }
    }

    foreach ( sort keys %results ) {
        my $a1      = $results{ $_ }{A1} ;
        my $b1      = $results{ $_ }{B1} ;
        my $actDt   = $results{ $_ }{ACTDATE};
        my $inactDt = $results{ $_ }{INACTDATE};
        print "$a1,$b1,$actDt,$inactDt\n" ;
    }
The output gives erroneous results because the hash merges records that share a key even when they are not contiguous in the file. With the input file shown at the end of this post, the output is:
    7900724655,200906888,20180416,20180830
    7900724655,200906889,20180601,20180728
    7900724655,200906890,20180905,
    7900724666,200906868,20180416,20180830
    7900724666,200906869,20180601,20180728
    7900724666,200906890,20180905,
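To see why the hash pulls in non-contiguous records, here is a minimal sketch that only counts records per key. The helper name count_by_key is made up for illustration, and it uses a plain "a1,b1" string as the key rather than the $results{$a1,$b1} form above (which Perl treats as a single key joined with $;). The sample data is the first four lines of the input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many records share each "a1,b1" key, wherever they occur.
# count_by_key is a hypothetical helper, not part of the posted code.
sub count_by_key {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my ($a1, $b1) = split /,/, $line;
        $count{"$a1,$b1"}++;
    }
    return \%count;
}

# First four records of the sample input. The two 200906888 records
# are separated by the two 200906889 records.
my @sample = (
    '7900724655,200906888,20180416,20180522',
    '7900724655,200906889,20180601,20180720',
    '7900724655,200906889,20180724,20180728',
    '7900724655,200906888,20180730,20180830',
);

my $count = count_by_key(@sample);
print "$_ appears $count->{$_} time(s)\n" for sort keys %$count;
```

Both keys are counted twice, even though the two 200906888 records are not adjacent. A hash lookup has no notion of position in the file, which is how 20180416 and 20180830 end up on the same output line.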

Getting the results you want doesn't require a hash; comparing each record's key with the previous record's key is enough.

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<', 'file1.csv' or die $!;

    my ($actDt, $inactDt, $count) = ('', '', 0);
    my $previous_key = '';

    while (<$fh>) {
        chomp;
        my ($a1, $b1, $begin, $end) = split /,/;
        my $key = "$a1,$b1";
        if ($previous_key eq $key) {
            $inactDt = $end;
            $count++;
        }
        else {
            # either reached the end of a record or this is the first record
            print join(",", $previous_key, $actDt, $inactDt), "\n" if $count > 1;
            $actDt = $begin;
            $inactDt = $end;
            $count = 1;
        }
        $previous_key = $key;
    }
    print join(",", $previous_key, $actDt, $inactDt), "\n" if $count > 1;
This gives the desired results:
    7900724655,200906889,20180601,20180728
    7900724655,200906890,20180905,
    7900724666,200906869,20180601,20180728
    7900724666,200906890,20180905,
The input file is:
    7900724655,200906888,20180416,20180522
    7900724655,200906889,20180601,20180720
    7900724655,200906889,20180724,20180728
    7900724655,200906888,20180730,20180830
    7900724655,200906890,20180905,20180930
    7900724655,200906890,20181005,20181030
    7900724655,200906890,20181104,
    7900724666,200906868,20180416,20180522
    7900724666,200906869,20180601,20180720
    7900724666,200906869,20180724,20180728
    7900724666,200906868,20180730,20180830
    7900724666,200906890,20180905,20180930
    7900724666,200906890,20181005,20181030
    7900724666,200906890,20181104,
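If you want to try the contiguous-run idea without a file, the same logic can be sketched as a subroutine that takes lines and returns the merged ones. The name merge_runs and the in-memory @input are my own for illustration. One deliberate difference from the loop above: split is given a limit of -1 so that a missing inactive date comes back as an empty string rather than undef, which should avoid an "uninitialized value" warning when the line is joined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# merge_runs is a hypothetical name. It takes lines already in file order
# and returns one merged line per run of two or more adjacent records
# that share the same "a1,b1" key.
sub merge_runs {
    my @lines = @_;
    my @merged;
    my ($previous_key, $actDt, $inactDt, $count) = ('', '', '', 0);

    for my $line (@lines) {
        # A limit of -1 keeps a trailing empty field, so $end is ''
        # (not undef) when the inactive date is missing.
        my ($a1, $b1, $begin, $end) = split /,/, $line, -1;
        my $key = "$a1,$b1";

        if ($key eq $previous_key) {
            # Same run: keep the first start date and take the end date
            # of the most recent record.
            $inactDt = $end;
            $count++;
        }
        else {
            # A new run starts; emit the previous one if it had 2+ records.
            push @merged, join(',', $previous_key, $actDt, $inactDt)
                if $count > 1;
            ($actDt, $inactDt, $count) = ($begin, $end, 1);
        }
        $previous_key = $key;
    }

    # Emit the final run, which no later record would otherwise flush.
    push @merged, join(',', $previous_key, $actDt, $inactDt) if $count > 1;
    return @merged;
}

# First seven records of the input file above.
my @input = (
    '7900724655,200906888,20180416,20180522',
    '7900724655,200906889,20180601,20180720',
    '7900724655,200906889,20180724,20180728',
    '7900724655,200906888,20180730,20180830',
    '7900724655,200906890,20180905,20180930',
    '7900724655,200906890,20181005,20181030',
    '7900724655,200906890,20181104,',
);

print "$_\n" for merge_runs(@input);
```

For these seven records it prints the first two lines of the desired results. The single 200906888 records on either side of the 200906889 run are each a run of one, so they are skipped.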