Re: compare data between two files using Perl

by radiantmatrix (Parson)
on Jun 16, 2008 at 20:40 UTC (#692362)

in reply to compare data between two files using Perl

The solution you really want is a database. You can get a very lightweight one, with no separate database server to install, via the DBD::SQLite module. You'll also need DBI, the generic database interface that DBD::SQLite plugs into.

You'll want to read your files in and store them in a database. I see that you have tab-separated files -- you would probably save yourself a lot of work by using Text::CSV_XS (which accepts a tab as its separator) to parse them instead of doing it yourself.

Then, a simple query to the database will find mismatches.

Here's a general (not debugged) example:

use strict;
use warnings;
use DBI;
use DBD::SQLite;
use IO::File;
use Text::CSV_XS;

my $db_file = 'ref_compare.db';
my $csv     = Text::CSV_XS->new({ sep_char => "\t", binary => 1 });

## remove the db file if it exists
unlink $db_file if -f $db_file;

## RaiseError makes any failed DBI call die with a message
my $dbh = DBI->connect( "dbi:SQLite:dbname=$db_file", '', '', { RaiseError => 1 } );

## create two tables.
## 1: For brd_sym_pn
$dbh->do(q'
    CREATE TABLE brd_sym_pn (
        refdes  TEXT,
        pnum    TEXT,
        pkgtype TEXT
    )
');

## 2: For sym_text_latest
$dbh->do(q'
    CREATE TABLE sym_text_latest (
        logpnpkg   TEXT,
        logpnum    TEXT,
        logpkgtype TEXT
    )
');

## ok, now load brd_sym_pn
my $sth = $dbh->prepare(q'
    INSERT INTO brd_sym_pn (refdes,pnum,pkgtype) VALUES (?,?,?)
');
my $brd_sym_pn_io = IO::File->new( 'brd_sym_pn.txt', 'r' )
    or die "Cannot open brd_sym_pn.txt: $!";
## use $brd_sym_pn_io->getline to skip any "header" rows
until ( $brd_sym_pn_io->eof ) {
    my $values = $csv->getline( $brd_sym_pn_io )     # parse data line
        or die "Parse error in brd_sym_pn.txt: " . $csv->error_diag;
    for ( @$values ) { s/^\s+|\s+$//g }              # trim lead/trail whitespace
    $sth->execute( @$values );                       # inserts row into DB table
}

## ok, now load sym_text_latest
$sth = $dbh->prepare(q'
    INSERT INTO sym_text_latest (logpnpkg,logpnum,logpkgtype) VALUES (?,?,?)
');
my $sym_text_latest = IO::File->new( 'sym_text_latest.txt', 'r' )
    or die "Cannot open sym_text_latest.txt: $!";
## use $sym_text_latest->getline to skip any "header" rows
until ( $sym_text_latest->eof ) {
    my $values = $csv->getline( $sym_text_latest )   # parse data line
        or die "Parse error in sym_text_latest.txt: " . $csv->error_diag;
    for ( @$values ) { s/^\s+|\s+$//g }              # trim lead/trail whitespace
    $sth->execute( @$values );                       # inserts row into DB table
}

## now you can use any query you want, even in other scripts
## let's find everything where pnums match, but pkgtypes don't:
$sth = $dbh->prepare(q'
    SELECT refdes, pnum, pkgtype, logpnum, logpkgtype
    FROM brd_sym_pn, sym_text_latest
    WHERE brd_sym_pn.pnum = sym_text_latest.logpnum
      AND brd_sym_pn.pkgtype != sym_text_latest.logpkgtype
');
$sth->execute();

# print the results out, one tab-separated row per line.
print join( "\t", qw/refdes pnum pkgtype logpnum logpkgtype/ ), "\n";
while ( my @row = $sth->fetchrow_array ) {
    print join( "\t", @row ), "\n";
}

$dbh->disconnect;

Of course, you could also simply store your first file in a hash, using part numbers as keys -- that's just less flexible in terms of answering other questions about your data.
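If you go the hash route instead, a minimal sketch might look like the following. It assumes the same column layout as the database example (refdes, pnum, pkgtype in the first file; logpnpkg, logpnum, logpkgtype in the second); the sub names compare_by_hash and trim and the sample data are hypothetical. For brevity it splits on tabs with split rather than using Text::CSV_XS, so it assumes no field contains an embedded tab or quote. Because several reference designators can share one part number, each hash entry holds a list of rows rather than a single value.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading/trailing whitespace from one field.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical helper: compare_by_hash($brd_fh, $sym_fh)
# $brd_fh: tab-separated rows of refdes, pnum, pkgtype
# $sym_fh: tab-separated rows of logpnpkg, logpnum, logpkgtype
# Returns [refdes, pnum, pkgtype, logpnum, logpkgtype] for every pair of rows
# whose part numbers match but whose package types differ.
sub compare_by_hash {
    my ( $brd_fh, $sym_fh ) = @_;

    # Pass 1: index the first file by part number (a list per key,
    # since several refdes values can share one part number).
    my %by_pnum;
    while ( my $line = <$brd_fh> ) {
        chomp $line;
        my @f = map { trim($_) } split /\t/, $line;
        next unless @f >= 3;                  # skip blank or short lines
        my ( $refdes, $pnum, $pkgtype ) = @f;
        push @{ $by_pnum{$pnum} }, [ $refdes, $pkgtype ];
    }

    # Pass 2: stream the second file and look each part number up.
    my @mismatches;
    while ( my $line = <$sym_fh> ) {
        chomp $line;
        my @f = map { trim($_) } split /\t/, $line;
        next unless @f >= 3;
        my ( undef, $logpnum, $logpkgtype ) = @f;
        for my $brd ( @{ $by_pnum{$logpnum} || [] } ) {
            my ( $refdes, $pkgtype ) = @$brd;
            push @mismatches, [ $refdes, $logpnum, $pkgtype, $logpnum, $logpkgtype ]
                if $pkgtype ne $logpkgtype;
        }
    }
    return @mismatches;
}

# Usage with hypothetical sample data held in in-memory filehandles.
my $brd_data = "R1\tP100\t0402\nR2\tP100\t0402\nU1\tP200\tSOIC8\n";
my $sym_data = "x\tP100\t0603\ny\tP200\tSOIC8\n";
open my $brd_fh, '<', \$brd_data or die "Cannot open brd data: $!";
open my $sym_fh, '<', \$sym_data or die "Cannot open sym data: $!";

# prints two rows, R1 and R2 (P100 is 0402 on the board but 0603 in the library)
print join( "\t", @$_ ), "\n" for compare_by_hash( $brd_fh, $sym_fh );
```

To run it against the real files, open them with open my $fh, '<', 'brd_sym_pn.txt' or die $!; (and likewise for sym_text_latest.txt) and pass those handles instead. Like the SQL join above, it reports one row per matching pair, but any new question about the data means writing new loop code rather than a new query.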

That should give you a fair number of ideas.
