PerlMonks  

Re^5: sorting and merging in perl

by Cristoforo (Curate)
on Jul 30, 2018 at 16:32 UTC ( #1219496=note )


in reply to Re^4: sorting and merging in perl
in thread sorting and merging in perl

roboticus did identify the error in the code you posted.

Here is the code you posted, with two minor changes:

Split each line into 4 named variables instead of @row (this makes the problem clearer).

Drop the quotes around hash keys that consist entirely of word characters (such keys don't need to be quoted).

#!/usr/bin/perl
use strict;
use warnings;

my %results;

open my $fh, '<', 'file1.csv' or die $!;
while ( <$fh> ) {
    chomp;
    my ($a1, $b1, $actDt, $inactDt) = split /,/;
    if ( exists $results{$a1,$b1} ) {
        if ( $actDt < $results{$a1,$b1}{ACTDATE} ) {
            $results{$a1,$b1}{ACTDATE} = $actDt;
        }
        if ( !$inactDt || !$results{$a1,$b1}{INACTDATE} ) {
            $results{$a1,$b1}{INACTDATE} = '';
        }
        elsif ( $inactDt > $results{$a1,$b1}{INACTDATE} ) {
            $results{$a1,$b1}{INACTDATE} = $inactDt;
        }
    }
    else {
        # Create new entry in hash
        $results{$a1,$b1} = {
            A1        => $a1,
            B1        => $b1,
            ACTDATE   => $actDt,
            INACTDATE => $inactDt,
        };
    }
}

foreach ( sort keys %results ) {
    my $a1      = $results{ $_ }{A1};
    my $b1      = $results{ $_ }{B1};
    my $actDt   = $results{ $_ }{ACTDATE};
    my $inactDt = $results{ $_ }{INACTDATE};
    print "$a1,$b1,$actDt,$inactDt\n";
}
This gives erroneous results because the hash merges non-contiguous records that share the same key. The output is:
7900724655,200906888,20180416,20180830
7900724655,200906889,20180601,20180728
7900724655,200906890,20180905,
7900724666,200906868,20180416,20180830
7900724666,200906869,20180601,20180728
7900724666,200906890,20180905,

Getting the results you want doesn't require a hash.

#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '<', 'file1.csv' or die $!;

my ($actDt, $inactDt, $count) = ('', '', 0);
my $previous_key = '';

while (<$fh>) {
    chomp;
    my ($a1, $b1, $begin, $end) = split /,/;
    # split drops a trailing empty field, so an open-ended row leaves $end undefined
    $end = '' unless defined $end;
    my $key = "$a1,$b1";
    if ($previous_key eq $key) {
        $inactDt = $end;
        $count++;
    }
    else {
        # either reached the end of a record or this is the first record
        print join(",", $previous_key, $actDt, $inactDt), "\n" if $count > 1;
        $actDt   = $begin;
        $inactDt = $end;
        $count   = 1;
    }
    $previous_key = $key;
}
# flush the last group
print join(",", $previous_key, $actDt, $inactDt), "\n" if $count > 1;
This gives the desired results:
7900724655,200906889,20180601,20180728
7900724655,200906890,20180905,
7900724666,200906869,20180601,20180728
7900724666,200906890,20180905,
The input file is:
7900724655,200906888,20180416,20180522
7900724655,200906889,20180601,20180720
7900724655,200906889,20180724,20180728
7900724655,200906888,20180730,20180830
7900724655,200906890,20180905,20180930
7900724655,200906890,20181005,20181030
7900724655,200906890,20181104,
7900724666,200906868,20180416,20180522
7900724666,200906869,20180601,20180720
7900724666,200906869,20180724,20180728
7900724666,200906868,20180730,20180830
7900724666,200906890,20180905,20180930
7900724666,200906890,20181005,20181030
7900724666,200906890,20181104,

Re^6: sorting and merging in perl
by Sekhar Reddy (Acolyte) on Aug 08, 2018 at 14:48 UTC

    Hi Athanasius,

    First of all, thank you very much for your help. I tried your code, but it still gives some incorrect results. Details are below, FYI.

    The input data I used:

    7900724666,200906888,20180416,20180522
    7900724666,200906888,20180601,20180720
    7900724666,200906888,20180406,20180411
    7900724677,200906872,20180301,20180330
    7900724677,200906871,20180101,20180228
    7900724677,200906873,20180401,20180420
    7900724688,200906881,20180101,20180228
    7900724688,200906881,20180303,20180330
    7900724688,200906882,20180404,20180430
    7900724688,200906883,20180508,20180620
    7900724699,200906891,20180101,20180228
    7900724699,200906891,20180303,20180330
    7900724699,200906892,20180404,20180430
    7900724699,200906893,20180508,
    7900724611,200906888,20180416,20180522
    7900724611,200906889,20180724,20180728
    7900724611,200906889,20180601,20180720
    7900724611,200906888,20180730,20180830
    7900724611,200906890,20180905,20180930
    7900724611,200906890,20181005,20181030
    7900724611,200906890,20181104,
    7900724622,200906868,20180416,20180522
    7900724622,200906869,20180601,20180720
    7900724622,200906869,20180724,20180728
    7900724622,200906868,20180730,20180830
    7900724622,200906890,20180905,20180930
    7900724622,200906890,20181005,20181030
    7900724622,200906890,20181104,

    The output I got:

    7900724666,200906888,20180416,20180411
    7900724688,200906881,20180101,20180330
    7900724699,200906891,20180101,20180330
    7900724611,200906889,20180724,20180720
    7900724611,200906890,20180905,
    7900724622,200906869,20180601,20180728
    7900724622,200906890,20180905,

    Here the 1st and 4th lines of the output are incorrect. For example:

    7900724611,200906889,20180724,20180720

    The expected result for the 4th line of the output is 7900724611,200906889,20180601,20180728
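
One possible way to cope with rows that arrive out of date order inside a run is to keep the smallest start date and the largest end date, rather than the first start and the last end seen. The sketch below is only an illustration under assumptions that this thread does not confirm: merge_runs is a hypothetical helper name, a run means adjacent rows sharing the same first two fields, runs of a single row are skipped as in the code above, and an empty end date is treated as open-ended. The sample rows are the 7900724611 rows from the input above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): merge each run of adjacent rows
# that share the same "a1,b1" key, keeping the earliest start date and the
# latest end date. Assumption: an empty end date means "open-ended" and,
# once seen, stays in effect for the rest of the run.
sub merge_runs {
    my @lines = @_;
    my @out;
    my ($prev_key, $start, $end, $count) = ('', '', '', 0);

    for my $line (@lines) {
        # LIMIT of -1 keeps a trailing empty field instead of dropping it
        my ($a1, $b1, $begin, $finish) = split /,/, $line, -1;
        $finish = '' unless defined $finish;
        my $key = "$a1,$b1";

        if ($key eq $prev_key) {
            $start = $begin if $begin < $start;
            if ($end ne '') {
                # take an open end, or a later end than the one held so far
                $end = $finish if $finish eq '' || $finish > $end;
            }
            $count++;
        }
        else {
            # a new run starts; emit the previous one if it had several rows
            push @out, join(',', $prev_key, $start, $end) if $count > 1;
            ($prev_key, $start, $end, $count) = ($key, $begin, $finish, 1);
        }
    }
    push @out, join(',', $prev_key, $start, $end) if $count > 1;
    return @out;
}

my @rows = (
    '7900724611,200906888,20180416,20180522',
    '7900724611,200906889,20180724,20180728',
    '7900724611,200906889,20180601,20180720',
    '7900724611,200906888,20180730,20180830',
    '7900724611,200906890,20180905,20180930',
    '7900724611,200906890,20181005,20181030',
    '7900724611,200906890,20181104,',
);
print "$_\n" for merge_runs(@rows);
```

For these rows the sketch prints 7900724611,200906889,20180601,20180728 followed by 7900724611,200906890,20180905, which agrees with the 4th line expected above. Whether the same min/max rule is what is wanted for the 1st line (the 7900724666 rows) is not stated in the thread, so treat that part as a guess.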
