Combining Files using Hash

chaney123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

These are the files I have. File1 has 400+ lines, while File2 has only 200+ lines. I store column 1 of File2 as the hash key and the remaining columns as the values. I want the output to look like the example below: every line whose Item is present in both File2 and File1, with the Time column from File1 appended.


File1:

Item,Time,Pattern ,Attributes,From
an_en11,200.00,{0 133},{ }
br_gh13,140.09,{0 59},{ }
ce_oy74,300.05,{0 230},{int_43}
dt_pp50,200.11,{0 122},{ }
er_tk02,305.47,{0 220},{ }
ef_yb41,200.05,{0 233},{ }

File2:

Item,Sink,Buffer,Cell,Slew,Path,Violation,Area 
dt_pp50,0,0,2,0.000,0.000,0,0.000
er_tk02,0,2,3,0.002,0.004,0,0.001
ef_yb41,0,1,5,0.000,0.000,0,0.000


Output :

Item,Sink,Buffer,Cell,Slew,Path,Violation,Area,Time
dt_pp50,0,0,2,0.000,0.000,0,0.000,200.11
er_tk02,0,2,3,0.002,0.004,0,0.001,305.47
ef_yb41,0,1,5,0.000,0.000,0,0.000,200.05

This is the code I wrote, but when I run it, it warns that $key, $value1, $value2 and $value3 are not initialized. Does anyone know how to fix this?

 
my %file1Hash;
my $value4;
    
open my $file1, "<","design.rpt.csv" or die $!; 
open my $file2, "<","summary.rpt.csv"or die $!; 
open my $outfile_1, ">", "combined.rpt.csv" or die $!;
    
while(<$file1>){
        
    my($line) = $_; 
    chomp $line; 
    my($key,$value1,$value2,$value3) = $line =~ /(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/g; 
    $value4 = "$value1,$value2,$value3,";
    push @{$file1Hash{$key}}, $value4; 
        } 
        
while(<$file2>){ 
        
    my ($line) = $_; 
    chomp $line; 
    my($key,$value1,$value2,$value3,$value4,$value5,$value6,$value7) = $line =~ /(\w+|\S+),(\d+),(\d+),(\d+),(\d+.\d+),(\d+.\d+),(\d+),(\d+.\d+)/g ; 
        
    if (exists $file1Hash{$key}) {
        
        print $file1Hash{$key}.",".$line."\n";
            
            } 
    else {
        #  print $line."\n";
            } 
} 

close $file1;
close $file2;
close $outfile_1;
    
exit 0;

I don't know what the problem is or how to solve it. Please help.

Re: Combining Files using Hash by kcott (Archbishop) on Sep 05, 2017 at 06:18 UTC
G'day chaney123,

The reason you're getting a warning is that the regex isn't matching. You're sort of on the right track with the construct you used:

    $ perl -wE 'my ($x, $y) = "AB" =~ /(.)(.)/; say "$x $y"'
    A B

However, if the regex doesn't match, you won't populate those variables:

    $ perl -wE 'my ($x, $y) = "AB" =~ /(.)(.)./; say "$x $y"'
    Use of uninitialized value $x in concatenation (.) or string at -e line 1.
    Use of uninitialized value $y in concatenation (.) or string at -e line 1.

There are some other issues; however, the main one is that a regex is not the right tool for this job: Text::CSV is what you should be using when working with CSV data (if you also have Text::CSV_XS installed, it will run faster). Here's a quick example I knocked up to show you how much simpler it is when you use the correct tool for the job.

    #!/usr/bin/env perl -l
    use strict;
    use warnings;

    use Text::CSV;
    use Inline::Files;

    my %data;
    my $csv = Text::CSV::->new;

    while (my $row = $csv->getline(\FILE2)) {
        $data{$row->[0]} = [ @$row[1..$#$row] ];
    }

    while (my $row = $csv->getline(\FILE1)) {
        next unless exists $data{$row->[0]};
        push @{$data{$row->[0]}}, $row->[1];
    }

    $csv->print(\STDOUT, [$_, @{$data{$_}}]) for sort keys %data;

    __FILE1__
    an_en11,200.00,{0 133},{ }
    br_gh13,140.09,{0 59},{ }
    ce_oy74,300.05,{0 230},{int_43}
    dt_pp50,200.11,{0 122},{ }
    er_tk02,305.47,{0 220},{ }
    ef_yb41,200.05,{0 233},{ }
    __FILE2__
    dt_pp50,0,0,2,0.000,0.000,0,0.000
    er_tk02,0,2,3,0.002,0.004,0,0.001
    ef_yb41,0,1,5,0.000,0.000,0,0.000

Output:

    dt_pp50,0,0,2,0.000,0.000,0,0.000,200.11
    ef_yb41,0,1,5,0.000,0.000,0,0.000,200.05
    er_tk02,0,2,3,0.002,0.004,0,0.001,305.47

For demonstration purposes, I've used Inline::Files. You'll want to replace `\FILE1`, `\FILE2` and `\STDOUT` with your two input and one output filehandles.

You're using open correctly, but I would recommend that the names for the filehandles reflect what they really are: `$file1`, for instance, suggests an actual file (i.e. it holds a filename); `$in1_fh` is more descriptive (although the choice of name is entirely yours). I'd also recommend you either: improve the `die` messages (which are pretty crappy as currently shown), which is more work for you and prone to error; or you remove them and get Perl to do the work for you with the autodie pragma (which would be my choice).

When you do actually have a reason to use a regex, I'd suggest you first read "perlretut - Perl regular expressions tutorial". The fact that you've used the '`g`' modifier twice, when it wasn't appropriate, suggests that you're copying code from elsewhere without understanding what it does; even if that's not the case, you still need to understand every piece of code you write. We're always happy to help if you can't work something out for yourself.

-- Ken

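
Putting the suggestions above together (Text::CSV, the autodie pragma, and filehandle names ending in `_fh`), a minimal sketch for real files might look like the following. It is not Ken's code: the sub name `combine_reports`, the `%time_for` hash, and the choice to keep the row order of the summary file instead of sorting are assumptions made for illustration, and it needs Text::CSV installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;      # open and close die with a descriptive message on failure
use Text::CSV;

# Hypothetical helper: append the Time column of $design_file to every
# matching row of $summary_file, keeping the row order of $summary_file.
sub combine_reports {
    my ($design_file, $summary_file, $combined_file) = @_;

    my $csv_in  = Text::CSV->new({ binary => 1, auto_diag => 1 });
    my $csv_out = Text::CSV->new({ binary => 1, auto_diag => 1,
                                   eol => "\n", quote_space => 0 });

    # Map Item => Time from the design report (the header row is included).
    my %time_for;
    open my $design_fh, '<', $design_file;
    while (my $row = $csv_in->getline($design_fh)) {
        next if @$row < 2;                      # skip blank lines
        $time_for{ $row->[0] } = $row->[1];
    }
    close $design_fh;

    open my $summary_fh,  '<', $summary_file;
    open my $combined_fh, '>', $combined_file;
    while (my $row = $csv_in->getline($summary_fh)) {
        next if @$row < 2;                      # skip blank lines
        next unless exists $time_for{ $row->[0] };
        $csv_out->print($combined_fh, [ @$row, $time_for{ $row->[0] } ]);
    }
    close $summary_fh;
    close $combined_fh;
    return;
}

# Run only when three filenames are given on the command line.
combine_reports(@ARGV) if @ARGV == 3;
```

It could be run as `perl combine.pl design.rpt.csv summary.rpt.csv combined.rpt.csv` (the script name is made up). Because both header rows start with `Item`, the header line is merged like any other row, which yields the extra `Time` column heading.
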
Re^2: Combining Files using Hash by chaney123 (Acolyte) on Sep 05, 2017 at 07:04 UTC
Hi Ken, I tried your approach to produce the output. However, it shows "Can't call method "getline" on an undefined value at reformat_rpt.pl". What does this mean? Thanks
Re^3: Combining Files using Hash by kcott (Archbishop) on Sep 05, 2017 at 07:20 UTC
Please take a look at the "How do I post a question effectively?" guidelines. Posting the error message, without showing the code that generated it, is completely pointless. All anyone can do is guess! My guess is that you wrote code like this:

    $ perl -e 'use Text::CSV; my $csv; $csv->getline'
    Can't call method "getline" on an undefined value at -e line 1.

Possibly a very poor guess, but at least I've replicated your reported error message and shown the code that caused it.

-- Ken

Re: Combining Files using Hash by vinoth.ree (Monsignor) on Sep 05, 2017 at 05:20 UTC
Hi, please add `use strict; use warnings;` for better error messages.

*$value3 are not initialized. Anyone know how to fix this?*

    my($key,$value1,$value2,$value3) = $line =~ /(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/g;

Your regular expression has only three capture groups, but you assign the result to four variables, so $value3 will be undef.

Update: Here is the fixed code:

    use strict;
    use warnings;

    my %file1Hash;
    my $value4;

    open my $file1, "<","file1.csv" or die $!;
    open my $file2, "<","file2.csv"or die $!;
    open my $outfile_1, ">", "combined.rpt.csv" or die $!;

    while( my $line = <$file1>){
        chomp $line;
        my($key,$value1) = (split /,/, $line);
        $file1Hash{$key} = $value1;
    }

    while(my $line1 = <$file2>){
        chomp $line1;
        my($key1) = (split /,/, $line1);
        if (exists $file1Hash{$key1}) {
            print $line1.",".$file1Hash{$key1}."\n";
        }
        else {
            # print $line1."\n";
        }
    }

    close $file1;
    close $file2;
    close $outfile_1;

    exit 0;

*All is well. I learn by answering your questions...*

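
The point about three capture groups and four variables can be checked with a small sketch like the one below. It uses one sample row from File1 and the first regex from the question (without the `g` modifier); the array names are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'dt_pp50,200.11,{0 122},{ }';

# Three capture groups, so at most three values come back.
my @captured = $line =~ /(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/;

# split returns one value per comma-separated field.
my @fields = split /,/, $line;

printf "regex returned %d values, split returned %d fields\n",
    scalar @captured, scalar @fields;

# Assigning three values to four variables leaves the last one undef.
my ($key, $value1, $value2, $value3) = @captured;
print "\$value3 is ", (defined $value3 ? "'$value3'" : 'undef'), "\n";
```

For this row the regex yields three values while `split` yields all four fields, which is why the `split` version above no longer triggers the warning for `$value3`.
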
Re^2: Combining Files using Hash by chaney123 (Acolyte) on Sep 05, 2017 at 07:07 UTC
Hi, I tried this, but "Use of uninitialized value $key in exists at reformat_rpt.pl" and "Use of uninitialized value $key1 in exists at reformat__rpt.pl" are shown. Why does this happen? Thanks
Re^3: Combining Files using Hash by poj (Abbot) on Sep 05, 2017 at 07:42 UTC
*Why does this happen?*

Maybe you have some blank lines at the end of the file. Add a check like this:

    while (my $line1 = <$file2>){
        next unless ($line1 =~ /\S/); # skip empty lines
        chomp $line1;

poj

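
To see why a blank line produces an undefined key, and how the check above avoids it, here is a self-contained sketch that reads from an in-memory file. The sub name `first_fields` and the sample text are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first comma-separated field of every
# non-blank line in $text.
sub first_fields {
    my ($text) = @_;
    open my $in_fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @keys;
    while (my $line = <$in_fh>) {
        next unless $line =~ /\S/;    # skip empty or whitespace-only lines
        chomp $line;
        my ($key) = split /,/, $line; # on an empty string this would be undef
        push @keys, $key;
    }
    close $in_fh;
    return @keys;
}

my @keys = first_fields("dt_pp50,0,0\n\ner_tk02,0,2\n   \n");
print "keys: @keys\n";
```

Without the `next unless` line, `split` on the empty string returns an empty list, so `$key` stays undefined and any later `exists $hash{$key}` warns.
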
Re: Combining Files using Hash by dasgar (Priest) on Sep 05, 2017 at 05:47 UTC
I haven't tested your code or looked closely at your regexes. But here are a few thoughts.

I believe that the two lines where you are trying to parse a line from a file need to have the regex within parentheses. In other words, change these original lines:

    my($key,$value1,$value2,$value3) = $line =~ /(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/g;
    my($key,$value1,$value2,$value3,$value4,$value5,$value6,$value7) = $line =~ /(\w+|\S+),(\d+),(\d+),(\d+),(\d+.\d+),(\d+.\d+),(\d+),(\d+.\d+)/g ;

to make them look like the following:

    my($key,$value1,$value2,$value3) = ($line =~ /(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/g);
    my($key,$value1,$value2,$value3,$value4,$value5,$value6,$value7) = ($line =~ /(\w+|\S+),(\d+),(\d+),(\d+),(\d+.\d+),(\d+.\d+),(\d+),(\d+.\d+)/g);

An easy debugging step is to print out the variables to verify that they are indeed holding what you think they are. Based on your description of the error message, it sounds like one of your regex lines isn't matching for all lines of the file (such as the header line or blank lines).

In looking at your data, it looks like the files are in CSV format. Although it might be tempting to use split to parse those lines, I think it probably would be better to use something like Text::CSV to parse those files.

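
The debugging step described above could be sketched as follows: run the first regex from the question over some sample rows from File1 and report the ones it fails to match. The helper name `unmatched_lines` is hypothetical. The last few lines also compare the parenthesized and unparenthesized forms; as far as I know, a list assignment already supplies list context, so both should capture the same values.

```perl
use strict;
use warnings;

# The first regex from the question, without the g modifier.
my $file1_re = qr/(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/;

# Hypothetical helper: return the lines that the regex fails to match.
sub unmatched_lines {
    my ($re, @lines) = @_;
    return grep { $_ !~ $re } @lines;
}

my @sample = (
    'Item,Time,Pattern ,Attributes,From',
    'an_en11,200.00,{0 133},{ }',
    'br_gh13,140.09,{0 59},{ }',
    'dt_pp50,200.11,{0 122},{ }',
);

print "no match: $_\n" for unmatched_lines($file1_re, @sample);

# Compare the two ways of writing the list assignment on a matching row.
my @plain  =  $sample[3] =~ $file1_re;
my @parens = ($sample[3] =~ $file1_re);
print "same captures\n" if "@plain" eq "@parens";
```

With these sample rows it flags the header row and the `{0 59}` row: the third group needs at least two digits after the space plus one more character before the closing brace, so two-digit patterns slip through unmatched.
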