http://qs321.pair.com?node_id=494288

ch1 has asked for the wisdom of the Perl Monks concerning the following question:

I trying to join two files on a common field.
File one:
upc,qty
840197083284,2
840197083291,3
840197083307,21
840197083314,0
File two:
product_sku,size,color,upc
121,MD,Black,840197083277
121,LG,Black,840197083284
121,XL,Black,840197083291
121,2XL,Black,840197083307

I'm trying to join on upc.
I've read a number of post.... combining 2 files with a comon field which seems to be close but I can't make it work.

Update: I'm new to perl and to perlmonks... So, First thanks for all the help. Here is the code that worked for me.
open(I1,"Inventory.csv") or die "file1: $!"; $_ = <I1>; # read column headings while (<I1>) { # get data chomp; my ($upc,$qty) = split /,/; $data{$upc} = $qty; } open(I2,"Matrix.csv") or die "file2: $!"; $_ = <I2>; # read column headings while (<I2>) { # get data chomp; my ($sku,$siz,$clr,$upc) = split /,/; $data{$upc} .= join ',', '', $sku, $siz, $clr; } open(my $fh_out, '>', 'new.csv') or die("Can't open output file: $!\n"); for (sort keys %data) { my @content = "$data{$_},$_"; if (grep(/.*,.*,.*,.*$/, @content)){@column = grep(/.*,.*,.*,.*$/, + @content)}; #@column = grep(/.*,.*,.*,.*$/, @content); print $fh_out "@column\n"; }

Edit: g0n - Linkified PM link

Replies are listed 'Best First'.
Re: Joining two files on common field
by jZed (Prior) on Sep 22, 2005 at 20:38 UTC
    Well you didn't say what you want to do with the results, but assuming you want them in an Array of Arrays:
    use DBI; my $dbh=DBI->connect('dbi:CSV(RaiseError=1):'); my $results = $dbh->selectall_arrayref(" SELECT * FROM file1 NATURAL JOIN file2 ");
      Here is code but I'm getting an error.
      #!/usr/bin/perl -w use DBI; $dbh = DBI->connect("DBI:CSV:f_dir=/home/turn2sp/tmp") or die "Cannot connect: " . $DBI::errstr; $dbh->{'csv_tables'}->{'Matrix'} = { 'file' => 'Matrix.csv'}; $dbh->{'csv_tables'}->{'Inventory'} = { 'file' => 'Inventory.csv'}; $sth = $dbh->prepare("SELECT * from Matrix NATURAL JOIN Inventory"); #$sth = $dbh->prepare("SELECT * FROM Inventory"); $sth->execute(); while ( @row = $sth->fetchrow_array ) { print "@row\n"; }
      DBD::CSV::db prepare failed: Parse error near NATURAL JOIN Inventory at /usr/lib/perl5/site_perl/5.8.4/i686-linux/DBD/File.pm line 195. for Statement "SELECT * from Matrix NATURAL JOIN Inventory" at ./j line 7. Can't call method "execute" on an undefined value at ./j line 10.
        To get a better error message, change
        $sth = $dbh->prepare("SELECT * from Matrix NATURAL JOIN Inventory");
        to
        $sth = $dbh->prepare("SELECT * from Matrix NATURAL JOIN Inventory") or die "Cannot prepare: " . $DBI::errstr;
Re: Joining two files on common field
by ikegami (Patriarch) on Sep 22, 2005 at 20:40 UTC

    I believe you could use DBD::AnyData.

    Or, here's something based on bart's solution in the mentioned thread:

    use strict; use warnings; my %data; { open(my $fh_in, '<', 'file1.txt') or die("Can't open first input file: $!\n"); <$fh_in>; # Skip header. while (<$fh_in>) { chomp; my ($upc, @fields) = split /,/; push(@{$data{$key}}, @fields); } } { open(my $fh_in, '<', 'file2.txt') or die("Can't open second input file: $!\n"); <$fh_in>; # Skip header. while (<$fh_in>) { chomp; my @fields = split /,/; my $upc = pop(@fields); push(@{$data{$key}}, @fields); } } { open(my $fh_out, '>', 'file3.txt') or die("Can't open output file: $!\n"); foreach (sort keys %data) { print $fh_out (join(',', @$_), "\n"); } }
Re: Joining two files on common field
by davidrw (Prior) on Sep 22, 2005 at 21:59 UTC
    if it suits your needs, you can simply do this on the commandline with join as well:
    [davidrw@host davidrw]$ join -t , -1 1 -2 4 /tmp/f1 /tmp/f2 upc,qty,product_sku,size,color 840197083284,2,121,LG,Black 840197083291,3,121,XL,Black 840197083307,21,121,2XL,Black
    see "man join" for the various options to get the output the way you need it (espically if you want to show unmatched lines)
      I tried join first... well becasue it should have been easy but it returns no output.
        what were your attempts? I got nothing at first because i didn't specifiy the comma delimiter with -t (default is tab)... what i posted above worked and that was the actual output... worth a quick glance of the man page in case the params are slightly different for your system?
Re: Joining two files on common field
by graff (Chancellor) on Sep 22, 2005 at 22:57 UTC
    Yet another way -- just a plain hash:
    use strict; my %data; open(I1,"file1") or die "file1: $!" $_ = <I1>; # read column headings while (<I1>) { # get data chomp; my ($upc,$qty) = split /,/; $data{$upc} = $qty; } open(I2,"file2") or die "file2: $!" $_ = <I2>; # read column headings while (<I2>) { # get data chomp; my ($sku,$siz,$clr,$upc) = split /,/; $data{$upc} .= join ',', '', $sku, $siz, $clr; # or: $data{$upc} = join ',', $data{$upc},$sku,$siz,$clr; } for (sort keys %data) { print "$_,$data{$_}\n"; }
    That's pretty brittle in a number of ways: it won't handle properly quoted fields in a CSV file (e.g. '121,MD,"Green, Kelly",87654321'); it depends rigidly on a particular ordering and quantity of columns in each file; if file2 has a upc value not found in file1, it'll print ",$sku,$siz,$clr"; if file1 has a upc value not found in file2, it'll print "$upc,$qty".

    But it's simple and should do the job as you described it.

      You said: That's pretty brittle in a number of ways: ... it depends rigidly on a particular ordering and quantity of columns in each file

      So let's fix that.

      $_ = <I1>;   # read column headings
      becomes
      $_ = <I1>; # read column headings my @columns= split /,/;


      And when reading the columns, instead of
      my ($upc,$qty) = split /,/;
      and
      my ($sku,$siz,$clr,$upc) = split /,/;
      we put
      @line{@columns}= split /,/;
      (provided you have a my %line; defined)

      So in full:

      $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print
Re: Joining two files on common field
by graff (Chancellor) on Sep 24, 2005 at 14:44 UTC
    Regarding your update, where you show a modified version of the code I suggested, I should point out a minor (harmless) misconception in your final "for" loop to print the results. Here's your version of the for loop:
    for (sort keys %data) { my @content = "$data{$_},$_"; if (grep(/.*,.*,.*,.*$/, @content)){ @column = grep(/.*,.*,.*,.*$/, @content) }; #@column = grep(/.*,.*,.*,.*$/, @content); print $fh_out "@column\n"; }
    The second line assigns a single string to an array (@content). This is bad form, but perl will "do the right thing" and create the array with a single element, which is the string you supplied: "$data{$_},$_" .

    Next, the "if" condition returns true if the single array element happens to match your regex, where you appear to be testing whether a string contains at least three commas. If it matches, you use grep on the single element in @content to store the string in a new array "@column". (Looks like you have removed "use strict;" because @column appears to be working as an undeclared global -- bad idea.) Anyway, if the match fails, @column remains undefined. UPDATE: rather, @column remains unchanged -- it still holds whatever may have been assigned to it previously, if anything. That could cause a problem in your output.

    Regardless of what happens with that if condition, you print @column followed by a line-feed (even if it's undefined --update: or contains a string from a previous iteration).

    All of that would be better stated as follows:

    for ( sort keys %data ) { print "$data{$_},$_\n" if ( $data{$_} =~ /,.*?,/ ); }
    (that is, if the original value of "$data{$_}" contains two commas, add ",$_\n" at the end and print it; otherwise skip it)
Re: Joining two files on common field
by samtregar (Abbot) on Sep 22, 2005 at 20:33 UTC
    Should we try to guess what you might be doing wrong or do you want to show us your code?

    -sam