Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Joining two files on common field

by ch1 (Novice)
on Sep 22, 2005 at 20:29 UTC ( #494288=perlquestion: print w/replies, xml ) Need Help??

ch1 has asked for the wisdom of the Perl Monks concerning the following question:

I trying to join two files on a common field.
File one:
upc,qty
840197083284,2
840197083291,3
840197083307,21
840197083314,0
File two:
product_sku,size,color,upc
121,MD,Black,840197083277
121,LG,Black,840197083284
121,XL,Black,840197083291
121,2XL,Black,840197083307

I'm trying to join on upc.
I've read a number of post.... combining 2 files with a comon field which seems to be close but I can't make it work.

Update: I'm new to perl and to perlmonks... So, First thanks for all the help. Here is the code that worked for me.
open(I1,"Inventory.csv") or die "file1: $!"; $_ = <I1>; # read column headings while (<I1>) { # get data chomp; my ($upc,$qty) = split /,/; $data{$upc} = $qty; } open(I2,"Matrix.csv") or die "file2: $!"; $_ = <I2>; # read column headings while (<I2>) { # get data chomp; my ($sku,$siz,$clr,$upc) = split /,/; $data{$upc} .= join ',', '', $sku, $siz, $clr; } open(my $fh_out, '>', 'new.csv') or die("Can't open output file: $!\n"); for (sort keys %data) { my @content = "$data{$_},$_"; if (grep(/.*,.*,.*,.*$/, @content)){@column = grep(/.*,.*,.*,.*$/, + @content)}; #@column = grep(/.*,.*,.*,.*$/, @content); print $fh_out "@column\n"; }

Edit: g0n - Linkified PM link

Replies are listed 'Best First'.
Re: Joining two files on common field
by jZed (Prior) on Sep 22, 2005 at 20:38 UTC
    Well you didn't say what you want to do with the results, but assuming you want them in an Array of Arrays:
    use DBI; my $dbh=DBI->connect('dbi:CSV(RaiseError=1):'); my $results = $dbh->selectall_arrayref(" SELECT * FROM file1 NATURAL JOIN file2 ");
      Here is code but I'm getting an error.
      #!/usr/bin/perl -w use DBI; $dbh = DBI->connect("DBI:CSV:f_dir=/home/turn2sp/tmp") or die "Cannot connect: " . $DBI::errstr; $dbh->{'csv_tables'}->{'Matrix'} = { 'file' => 'Matrix.csv'}; $dbh->{'csv_tables'}->{'Inventory'} = { 'file' => 'Inventory.csv'}; $sth = $dbh->prepare("SELECT * from Matrix NATURAL JOIN Inventory"); #$sth = $dbh->prepare("SELECT * FROM Inventory"); $sth->execute(); while ( @row = $sth->fetchrow_array ) { print "@row\n"; }
      DBD::CSV::db prepare failed: Parse error near NATURAL JOIN Inventory at /usr/lib/perl5/site_perl/5.8.4/i686-linux/DBD/File.pm line 195. for Statement "SELECT * from Matrix NATURAL JOIN Inventory" at ./j line 7. Can't call method "execute" on an undefined value at ./j line 10.
        To get a better error message, change
        $sth = $dbh->prepare("SELECT * from Matrix NATURAL JOIN Inventory");
        to
        $sth = $dbh->prepare("SELECT * from Matrix NATURAL JOIN Inventory") or die "Cannot prepare: " . $DBI::errstr;
Re: Joining two files on common field
by ikegami (Patriarch) on Sep 22, 2005 at 20:40 UTC

    I believe you could use DBD::AnyData.

    Or, here's something based on bart's solution in the mentioned thread:

    use strict; use warnings; my %data; { open(my $fh_in, '<', 'file1.txt') or die("Can't open first input file: $!\n"); <$fh_in>; # Skip header. while (<$fh_in>) { chomp; my ($upc, @fields) = split /,/; push(@{$data{$key}}, @fields); } } { open(my $fh_in, '<', 'file2.txt') or die("Can't open second input file: $!\n"); <$fh_in>; # Skip header. while (<$fh_in>) { chomp; my @fields = split /,/; my $upc = pop(@fields); push(@{$data{$key}}, @fields); } } { open(my $fh_out, '>', 'file3.txt') or die("Can't open output file: $!\n"); foreach (sort keys %data) { print $fh_out (join(',', @$_), "\n"); } }
Re: Joining two files on common field
by davidrw (Prior) on Sep 22, 2005 at 21:59 UTC
    if it suits your needs, you can simply do this on the commandline with join as well:
    [davidrw@host davidrw]$ join -t , -1 1 -2 4 /tmp/f1 /tmp/f2 upc,qty,product_sku,size,color 840197083284,2,121,LG,Black 840197083291,3,121,XL,Black 840197083307,21,121,2XL,Black
    see "man join" for the various options to get the output the way you need it (espically if you want to show unmatched lines)
      I tried join first... well becasue it should have been easy but it returns no output.
        what were your attempts? I got nothing at first because i didn't specifiy the comma delimiter with -t (default is tab)... what i posted above worked and that was the actual output... worth a quick glance of the man page in case the params are slightly different for your system?
Re: Joining two files on common field
by graff (Chancellor) on Sep 22, 2005 at 22:57 UTC
    Yet another way -- just a plain hash:
    use strict; my %data; open(I1,"file1") or die "file1: $!" $_ = <I1>; # read column headings while (<I1>) { # get data chomp; my ($upc,$qty) = split /,/; $data{$upc} = $qty; } open(I2,"file2") or die "file2: $!" $_ = <I2>; # read column headings while (<I2>) { # get data chomp; my ($sku,$siz,$clr,$upc) = split /,/; $data{$upc} .= join ',', '', $sku, $siz, $clr; # or: $data{$upc} = join ',', $data{$upc},$sku,$siz,$clr; } for (sort keys %data) { print "$_,$data{$_}\n"; }
    That's pretty brittle in a number of ways: it won't handle properly quoted fields in a CSV file (e.g. '121,MD,"Green, Kelly",87654321'); it depends rigidly on a particular ordering and quantity of columns in each file; if file2 has a upc value not found in file1, it'll print ",$sku,$siz,$clr"; if file1 has a upc value not found in file2, it'll print "$upc,$qty".

    But it's simple and should do the job as you described it.

      You said: That's pretty brittle in a number of ways: ... it depends rigidly on a particular ordering and quantity of columns in each file

      So let's fix that.

      $_ = <I1>;   # read column headings
      becomes
      $_ = <I1>; # read column headings my @columns= split /,/;


      And when reading the columns, instead of
      my ($upc,$qty) = split /,/;
      and
      my ($sku,$siz,$clr,$upc) = split /,/;
      we put
      @line{@columns}= split /,/;
      (provided you have a my %line; defined)

      So in full:

      $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print
Re: Joining two files on common field
by samtregar (Abbot) on Sep 22, 2005 at 20:33 UTC
    Should we try to guess what you might be doing wrong or do you want to show us your code?

    -sam

Re: Joining two files on common field
by graff (Chancellor) on Sep 24, 2005 at 14:44 UTC
    Regarding your update, where you show a modified version of the code I suggested, I should point out a minor (harmless) misconception in your final "for" loop to print the results. Here's your version of the for loop:
    for (sort keys %data) { my @content = "$data{$_},$_"; if (grep(/.*,.*,.*,.*$/, @content)){ @column = grep(/.*,.*,.*,.*$/, @content) }; #@column = grep(/.*,.*,.*,.*$/, @content); print $fh_out "@column\n"; }
    The second line assigns a single string to an array (@content). This is bad form, but perl will "do the right thing" and create the array with a single element, which is the string you supplied: "$data{$_},$_" .

    Next, the "if" condition returns true if the single array element happens to match your regex, where you appear to be testing whether a string contains at least three commas. If it matches, you use grep on the single element in @content to store the string in a new array "@column". (Looks like you have removed "use strict;" because @column appears to be working as an undeclared global -- bad idea.) Anyway, if the match fails, @column remains undefined. UPDATE: rather, @column remains unchanged -- it still holds whatever may have been assigned to it previously, if anything. That could cause a problem in your output.

    Regardless of what happens with that if condition, you print @column followed by a line-feed (even if it's undefined --update: or contains a string from a previous iteration).

    All of that would be better stated as follows:

    for ( sort keys %data ) { print "$data{$_},$_\n" if ( $data{$_} =~ /,.*?,/ ); }
    (that is, if the original value of "$data{$_}" contains two commas, add ",$_\n" at the end and print it; otherwise skip it)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://494288]
Approved by samtregar
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (2)
As of 2023-01-28 23:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?