Combining Files using Hash

by chaney123 (Acolyte)
on Sep 05, 2017 at 04:34 UTC (id://1198675)

chaney123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

These are the files that I have. File 1 has 400+ lines while file 2 has only 200+ lines. I store column 1 of file 2 as the key and the rest as values. I want to produce the output shown below by copying the lines whose item is present in both file 2 and file 1.

File1:

    Item,Time,Pattern ,Attributes,From
    an_en11,200.00,{0 133},{ }
    br_gh13,140.09,{0 59},{ }
    ce_oy74,300.05,{0 230},{int_43}
    dt_pp50,200.11,{0 122},{ }
    er_tk02,305.47,{0 220},{ }
    ef_yb41,200.05,{0 233},{ }

File2:

    Item,Sink,Buffer,Cell,Slew,Path,Violation,Area
    dt_pp50,0,0,2,0.000,0.000,0,0.000
    er_tk02,0,2,3,0.002,0.004,0,0.001
    ef_yb41,0,1,5,0.000,0.000,0,0.000

Output:

    Item,Sink,Buffer,Cell,Slew,Path,Violation,Area,Time
    dt_pp50,0,0,2,0.000,0.000,0,0.000,200.11
    er_tk02,0,2,3,0.002,0.004,0,0.001,305.47
    ef_yb41,0,1,5,0.000,0.000,0,0.000,200.05

This is the code that I wrote, but when I run it, it says that $key, $value1, $value2 and $value3 are not initialized. Does anyone know how to fix this?

    my %file1Hash;
    my $value4;

    open my $file1, "<","design.rpt.csv" or die $!;
    open my $file2, "<","summary.rpt.csv"or die $!;
    open my $outfile_1, ">", "combined.rpt.csv" or die $!;

    while(<$file1>){
        my($line) = $_;
        chomp $line;
        my($key,$value1,$value2,$value3) = $line =~ /(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/g;
        $value4 = "$value1,$value2,$value3,";
        push @{$file1Hash{$key}}, $value4;
    }

    while(<$file2>){
        my ($line) = $_;
        chomp $line;
        my($key,$value1,$value2,$value3,$value4,$value5,$value6,$value7) = $line =~ /(\w+|\S+),(\d+),(\d+),(\d+),(\d+.\d+),(\d+.\d+),(\d+),(\d+.\d+)/g ;
        if (exists $file1Hash{$key}) {
            print $file1Hash{$key}.",".$line."\n";
        }
        else {
            # print $line."\n";
        }
    }

    close $file1;
    close $file2;
    close $outfile_1;
    exit 0;

I don't know what the problem is or how to solve it. Please help.

Replies are listed 'Best First'.
Re: Combining Files using Hash
by kcott (Archbishop) on Sep 05, 2017 at 06:18 UTC

    G'day chaney123,

    The reason you're getting a warning is because the regex isn't matching. You're sort of on the right track with the construct you used:

        $ perl -wE 'my ($x, $y) = "AB" =~ /(.)(.)/; say "$x $y"'
        A B

    However, if the regex doesn't match, you won't populate those variables:

        $ perl -wE 'my ($x, $y) = "AB" =~ /(.)(.)./; say "$x $y"'
        Use of uninitialized value $x in concatenation (.) or string at -e line 1.
        Use of uninitialized value $y in concatenation (.) or string at -e line 1.
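
    As a minimal sketch of guarding against a failed match, the captures can be tested before they are used. The helper name parse_item_time and its regex are assumptions made for illustration; they are not from the original post.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the original post): returns the item
    # name and the time in list context, or an empty list if the line
    # does not match.
    sub parse_item_time {
        my ($line) = @_;
        return $line =~ /\A(\w+),(\d+\.\d+),/;
    }

    for my $line ('Item,Time,Pattern ,Attributes,From',
                  'dt_pp50,200.11,{0 122},{ }') {
        # A list assignment in boolean context is true only when the
        # right-hand side produced at least one element.
        if (my ($item, $time) = parse_item_time($line)) {
            print "item=$item time=$time\n";
        }
        else {
            warn "Skipping unparsed line: $line\n";
        }
    }
    ```

    With this shape the header line is reported and skipped instead of leaving $item and $time undefined.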

    There are some other issues; however, the main one is that a regex is not the right tool for this job — Text::CSV is what you should be using when working with CSV data (if you also have Text::CSV_XS installed, it will run faster).

    Here's a quick example I knocked up to show you how much simpler it is when you use the correct tool for the job.

        #!/usr/bin/env perl -l

        use strict;
        use warnings;

        use Text::CSV;
        use Inline::Files;

        my %data;
        my $csv = Text::CSV::->new;

        while (my $row = $csv->getline(\*FILE2)) {
            $data{$row->[0]} = [ @$row[1..$#$row] ];
        }

        while (my $row = $csv->getline(\*FILE1)) {
            next unless exists $data{$row->[0]};
            push @{$data{$row->[0]}}, $row->[1];
        }

        $csv->print(\*STDOUT, [$_, @{$data{$_}}]) for sort keys %data;

        __FILE1__
        an_en11,200.00,{0 133},{ }
        br_gh13,140.09,{0 59},{ }
        ce_oy74,300.05,{0 230},{int_43}
        dt_pp50,200.11,{0 122},{ }
        er_tk02,305.47,{0 220},{ }
        ef_yb41,200.05,{0 233},{ }
        __FILE2__
        dt_pp50,0,0,2,0.000,0.000,0,0.000
        er_tk02,0,2,3,0.002,0.004,0,0.001
        ef_yb41,0,1,5,0.000,0.000,0,0.000

    Output:

        dt_pp50,0,0,2,0.000,0.000,0,0.000,200.11
        ef_yb41,0,1,5,0.000,0.000,0,0.000,200.05
        er_tk02,0,2,3,0.002,0.004,0,0.001,305.47

    For demonstration purposes, I've used Inline::Files. You'll want to replace \*FILE1, \*FILE2 and \*STDOUT with your two input and one output filehandles. You're using open correctly, but I would recommend that the names for the filehandles reflect what they really are: $file1, for instance, suggests an actual file (i.e. it holds a filename); $in1_fh is more descriptive (although, the choice of name is entirely yours). I'd also recommend you either: improve the die messages (which are pretty crappy as currently shown), which is more work for you and prone to error; or you remove them and get Perl to do the work for you with the autodie pragma (which would be my choice).
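
    The filehandle replacement described above might look roughly like the following sketch. The sub name combine and the handle names $in1_fh, $in2_fh and $out_fh are hypothetical; the file names in the commented usage are the ones from the question. It writes rows in file 2 order instead of sorting the keys, which is a swapped-in choice so that the header row stays first and picks up the Time column.

    ```perl
    use strict;
    use warnings;
    use autodie;    # open and close now die with a descriptive message
    use Text::CSV;

    # Hypothetical wrapper (names are illustrative, not from the post):
    # appends column 2 of file 1 to every row of file 2 with the same item.
    sub combine {
        my ($in1_fh, $in2_fh, $out_fh) = @_;
        my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

        my (%data, @order);
        while (my $row = $csv->getline($in2_fh)) {
            my ($item, @rest) = @$row;
            push @order, $item;          # remember file 2 order
            $data{$item} = \@rest;
        }

        while (my $row = $csv->getline($in1_fh)) {
            next unless exists $data{ $row->[0] };
            push @{ $data{ $row->[0] } }, $row->[1];
        }

        for my $item (@order) {
            $csv->print($out_fh, [ $item, @{ $data{$item} } ]);
            print {$out_fh} "\n";
        }
        return;
    }

    # Usage with the files from the question (commented out so the
    # sketch runs without them):
    # open my $in1_fh, '<', 'design.rpt.csv';
    # open my $in2_fh, '<', 'summary.rpt.csv';
    # open my $out_fh, '>', 'combined.rpt.csv';
    # combine($in1_fh, $in2_fh, $out_fh);
    # close $_ for $in1_fh, $in2_fh, $out_fh;
    ```

    Because autodie is in effect, the open calls need no "or die" clause.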

    When you do actually have a reason to use a regex, I'd suggest you first read "perlretut - Perl regular expressions tutorial". The fact that you've used the 'g' modifier twice, when it wasn't appropriate, suggests that you're copying code from elsewhere without understanding what it does; even if that's not the case, you still need to understand every piece of code you write — we're always happy to help if you can't work something out for yourself.

    — Ken

      Hi Ken,

      I tried using your way to output the result. However, it shows "Can't call method "getline" on an undefined value at reformat_rpt.pl". What does this mean?

      Thanks

        Please take a look at the "How do I post a question effectively?" guidelines.

        Posting the error message, without showing the code that generated it, is completely pointless. All anyone can do is guess! My guess is that you wrote code like this:

            $ perl -e 'use Text::CSV; my $csv; $csv->getline'
            Can't call method "getline" on an undefined value at -e line 1.

        Possibly a very poor guess, but at least I've replicated your reported error message and shown the code that caused it.

        — Ken

Re: Combining Files using Hash
by vinoth.ree (Monsignor) on Sep 05, 2017 at 05:20 UTC
    Hi

    Please add

        use strict;
        use warnings;

    to get better error messages.

    $value3 are not initialized. Anyone know how to fix this?

        my($key,$value1,$value2,$value3) = $line =~ /(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/g;

    Your regular expression has only three capture groups, but you assign the result to four variables, so $value3 will be undef.

    Update:

    Here is the fixed code:

        use strict;
        use warnings;

        my %file1Hash;
        my $value4;

        open my $file1, "<","file1.csv" or die $!;
        open my $file2, "<","file2.csv"or die $!;
        open my $outfile_1, ">", "combined.rpt.csv" or die $!;

        # Map each item in file 1 (column 1) to its time (column 2).
        while( my $line = <$file1>){
            chomp $line;
            my($key,$value1) = (split /,/, $line);
            $file1Hash{$key} = $value1;
        }

        # Append the time to every file 2 line whose item is in file 1.
        while(my $line1 = <$file2>){
            chomp $line1;
            my($key1) = (split /,/, $line1);
            if (exists $file1Hash{$key1}) {
                # Write to the output file opened above, not to STDOUT.
                print {$outfile_1} $line1.",".$file1Hash{$key1}."\n";
            }
            else {
                # print $line1."\n";
            }
        }

        close $file1;
        close $file2;
        close $outfile_1;
        exit 0;

    All is well. I learn by answering your questions...
      Hi,

      I tried this, and the warnings "Use of uninitialized value $key in exists at reformat_rpt.pl" and "Use of uninitialized value $key1 in exists at reformat_rpt.pl" are shown. Why does this happen?

      Thanks
        Why does this happen?

        Maybe you have some blank lines at the end of the file. Add a check like this:

            while (my $line1 = <$file2>){
                next unless ($line1 =~ /\S/); # skip empty lines
                chomp $line1;
        poj
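
        For context, split returns an empty list for an empty line, so the key ends up undefined and exists warns about it. A minimal sketch of the check applied to the first loop, using an in-memory sample instead of the real files; the helper name read_times and the sample data layout are assumptions made for illustration.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: maps each item (column 1) to its time
        # (column 2), skipping lines that contain only whitespace.
        sub read_times {
            my ($fh) = @_;
            my %time_for;
            while (my $line = <$fh>) {
                next unless $line =~ /\S/;    # skip empty lines
                chomp $line;
                my ($key, $time) = split /,/, $line;
                $time_for{$key} = $time;
            }
            return \%time_for;
        }

        # In-memory sample with a blank line in the middle and at the end.
        my $sample = join "\n",
            'dt_pp50,200.11,{0 122},{ }', '',
            'er_tk02,305.47,{0 220},{ }', '', '';
        open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
        my $time_for = read_times($fh);
        close $fh;

        print "$_ => $time_for->{$_}\n" for sort keys %$time_for;
        ```

        The same check belongs in the second loop, since either file may contain blank lines.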
Re: Combining Files using Hash
by dasgar (Priest) on Sep 05, 2017 at 05:47 UTC

    I haven't tested your code or looked closely at your regexes. But here are a few thoughts.

    I believe that the two lines where you are trying to parse a line from a file need to have the regex within parentheses. In other words, change these original lines:

        my($key,$value1,$value2,$value3) = $line =~ /(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/g;
        my($key,$value1,$value2,$value3,$value4,$value5,$value6,$value7) = $line =~ /(\w+|\S+),(\d+),(\d+),(\d+),(\d+.\d+),(\d+.\d+),(\d+),(\d+.\d+)/g ;

    to make them look like the following:

        my($key,$value1,$value2,$value3) = ($line =~ /(\w+),(\d+.\d+),(.\d+\s+\d+.\d+.)/g);
        my($key,$value1,$value2,$value3,$value4,$value5,$value6,$value7) = ($line =~ /(\w+|\S+),(\d+),(\d+),(\d+),(\d+.\d+),(\d+.\d+),(\d+),(\d+.\d+)/g);

    An easy debugging step is to print out the variables to verify that they are indeed holding what you think they are. Based on your description of the error message, it sounds like one of your regex lines isn't matching for all lines of the file (such as the header line or blank lines).

    In looking at your data, it looks like the files are in CSV format. Although it might be tempting to use split to parse those lines, I think it probably would be better to use something like Text::CSV to parse those files.
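
    A small sketch of why split can be risky on CSV data: a quoted field that contains a comma is cut in two, while Text::CSV keeps it whole. The sample line is made up for illustration and does not come from the files in the question.

    ```perl
    use strict;
    use warnings;
    use Text::CSV;

    # Made-up line with a comma inside a quoted field.
    my $line = 'dt_pp50,"0 122, 0 133",200.11';

    # Naive approach: every comma is treated as a separator.
    my @naive = split /,/, $line;

    # Text::CSV honours the quotes around the second field.
    my $csv = Text::CSV->new({ binary => 1 });
    $csv->parse($line) or die "Cannot parse line: " . $csv->error_diag;
    my @fields = $csv->fields;

    printf "split: %d fields, Text::CSV: %d fields\n",
        scalar @naive, scalar @fields;
    ```

    For the sample files shown in the question, which contain no quoted fields, both approaches should give the same columns.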
