PerlMonks  

compare data between two files using Perl

by steveb94553 (Initiate)
on Jun 16, 2008 at 17:14 UTC ( [id://692315]=perlquestion )

steveb94553 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to compare some data between two files.
First I open a file (brd_sym_pn.txt) with data extracted from a database and pull the reference designator ($RefDes), part number ($Pnum) and package type ($Pkg_Type).
Then I open another file (sym_text_latest.txt) and extract/assign the part number ($LogPnum) and package type ($LogPkg_Type).
I am trying to compare the $Pnum and $Pkg_Type for each $RefDes in brd_sym_pn.txt against sym_text_latest.txt, reporting whether the $Pnum/$Pkg_Type does or does not match the $LogPnum/$LogPkg_Type assignment in sym_text_latest.txt.
It seems that I have too many loops and should be cycling through each reference designator or line of brd_sym_pn.txt.
I have tried to make this work for about 3-4 days and desperately need some help.

brd_sym_pn.txt
Begin
J2 12-0259-01 HDR-1X28-100-FLK-VT
P1 12-0258-01 HDR-2X12-118-VT end
J1 12-0259-01 HDR-1X2-100-FLK-VT
P2 12-0257-01 HDR-2X7-118-VT
MTG4 MTG_250_H115 MTG_250_H115
MTG5 MTG_250_H115 MTG_250_H115
P3 12-0255-01 RECPT-2X7-118-VT
MTG3 MTG_250_H115 MTG_250_H115
P4 12-0255-01 RECPT-2X7-118-VT
MTG2 MTG_250_H115 MTG_250_H115
MTG1 MTG_250_H115 MTG_250_H115
END
sym_text_latest.txt
Begin
Part number type mfg. P/n description pkg_type height mm > mil notes
12-0255-01 CON 44769-1403 "CONN,REC,14 PIN,THOLE,0.118 SPACE,VERT,RECPT-2X7-118-VT" RECPT-2X7-118-VT 13.6 535.432 new part number release
12-0256-01 CON 44769-1203 "CONN,REC,12 PIN,THOLE,0.118 SPACE,VERT,RECPT-2X6-118-VT" RECPT-2X6-118-VT 13.6 535.432 new part number release
12-0257-01 CON 43045-1414 "CONN,HDR,14 PIN,THOLE,0.118 SPACE,VERT,HDR-2X7-118-VT" HDR-2X7-118-VT 9.9 389.763 new part number release
12-0258-01 CON 43045-1214 "CONN,HDR,12 PIN,THOLE,0.118 SPACE,VERT,HDR-2X6-118-VT" HDR-2X6-118-VT 9.9 389.763 new part number release
12-0259-01 CON 22-29-2021 "CONN,HDR,2 PIN,THOLE,0.100 SPACE,VERT,HDR-1X2-100-FLK-VT" HDR-1X2-100-FLK-VT 11.7 460.629 new part number release
END
#Delete current temp_sym_pn.txt if exists.
`del temp_ref_list.txt`;
{
    open(brdpartlog, "brd_sym_pn.txt") || die("failed to open brd_sym_pn.txt");
    print "\n";
    while($line = <brdpartlog>) {
        @fields = split(/\t/,$line);
        # if($fields[1] eq $Pnum)
        {
            our $RdesPnPkg = "$fields[0]\t$fields[1]\t$fields[2]";
            our $RdesPn = "\"$fields[0]\"\t\=>\t\"$fields[1]\",";
            our $BrdPnPkg = "$fields[1]\t$fields[2]";
            our $RefDes = "$fields[0]";
            our $Pnum = "$fields[1]";
            our $Pkg_Type = "$fields[2]";
        }
        %hash_ref_pn = ($RefDes, $Pnum);
        my @k = keys %hash_ref_pn;
        my @v = values %hash_ref_pn;
        print $k[0] ,"\t", $v[0], "\n";
        # I thought this would assign $k to key in %hash_ref_pn
        # and would print each Key (reference designator)
        $k[0] = $Rf_Ds;
        print $Rf_Ds;
        # I thought this would assign $v to key in %hash_ref_pn
        # and would print each Key (Part_Number)
        $v = $Part_Number;
        print $Part_Number;
        #prints value (12-0259-01) of key J2
        #print ($hash_ref_pn{J2});
        #counts the number of keys included in %hash_ref_pn
        my $count = keys %hash_ref_pn;
        #print $count;
        #print "\n";
        #creates a string that resembles an associative hash
        #of reference designators
        #example: "J2" => "12-0259-01", "P1" => "12-0258-01",
        #Prints each reference designators on one line.
        #@RdesList=();
        #@RdesList=($RefDes);
        open(Ref_List, ">>temp_ref_hash.txt");
        print Ref_List "$RdesPn\t";
        close(Ref_List);
    }
    open(Ref_List, "temp_ref_hash.txt");
    @lines = <Ref_List>;
    $lines = $Pnum;
    close(Ref_List);
    #print @lines;
    my %hash_ref_pn = <Ref_List>;
    print ($hash_ref_pn{"J1"});
    print "\n";
    #print ("xxxxxxxxxx\n" x 3);
    print "$Pnum was the last assigned \$Pnum";
    #Type out temp_ref_list.txt to screen, DOES NOT WORK???
    #`Type c:\\work\\academy_x\\log_pn_check\\temp_ref_list.txt`;
    #exit;
    print "\n";
    open(brdpartlog, "brd_sym_pn.txt") || die("failed to open brd_sym_pn.txt");
    print "\n";
    while($line = <brdpartlog>) {
        @fields = split(/\t/,$line);
        if($fields[1] eq $Pnum) {
            our $RdesPnPkg = "$fields[0]\t$fields[1]\t$fields[2]";
            our $BrdPnPkg = "$fields[1]\t$fields[2]";
            our $RefDes = "$fields[0]";
            our $Pnum = "$fields[1]";
            if($BrdPnPkg eq $LogPnPkg) {
                print("$RefDes\t$BrdPnPkg is the correct Allegro footprint.\n");
            } else {
                print("$RefDes\t$BrdPnPkg should be using $LogPkg_Type\n");
            }
        }
    }
    open(partlog, "sym_text_latest.txt") || die("failed to open sym_text_latest.txt");
    while($line = <partlog>) {
        @fields = split(/\t/,$line);
        if($fields[0] eq $Pnum) {
            our $LogPnPkg = "$fields[0]\t$fields[4]";
            our $LogPnum = "$fields[0]";
            our $LogPkg_Type = "$fields[4]";
        }
    }
}

Replies are listed 'Best First'.
Re: compare data between two files using Perl
by jettero (Monsignor) on Jun 16, 2008 at 17:17 UTC
    Take a look at Algorithm::Diff. It does a rather brilliant job at comparing text files and the best part is: it's written for you.

    -Paul

Re: compare data between two files using Perl
by pc88mxer (Vicar) on Jun 16, 2008 at 17:42 UTC
    You should only have to read each file once. From the way you describe your algorithm, you want to find things that are in one file but not in the other. The basic idea is this:
    my %seen;
    open(B, "brd_sym_pn.txt") or die "...";
    while (<B>) {
        my ($RefDes, $Pnumm, $Pkg_Type) = ...parse these from the line...
        $seen{$RefDes, $Pnumm, $Pkg_Type} = 1;
    }
    close(B);
    open(S, "sym_text_latest.txt") or die "...";
    while (<S>) {
        my ($RefDes, $Pnumm, $Pkg_Type) = ...parse these from the line...
        $seen{$RefDes, $Pnumm, $Pkg_Type} += 2;
    }
    close(S);
    while (my ($key, $val) = each %seen) {
        if ($val == 1) {
            # $key is in first file but not second
        } elsif ($val == 2) {
            # $key is in second file but not first
        } else {
            # key is in both files
        }
    }
    For info on how to interpret $key, have a look at the documentation for the $; variable in perldoc perlvar. For special cases you can optimize this code.
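    To make the $key handling concrete, here is a minimal runnable sketch of the same idea. The sub name compare_triples and the in-memory rows are assumptions made for illustration, not part of the post above, and it uses |= instead of = 1 / += 2 so that a triple repeated within one file is not counted twice.

```perl
use strict;
use warnings;

# Sketch only: compare_triples() is a hypothetical helper. Each row is an
# array ref of [RefDes, Pnum, Pkg_Type], standing in for one parsed line.
sub compare_triples {
    my ($first, $second) = @_;
    my %seen;

    # $seen{$r, $p, $t} joins the three values with $; to form one key.
    for my $row (@$first)  { my ($r, $p, $t) = @$row; $seen{$r, $p, $t} |= 1; }
    for my $row (@$second) { my ($r, $p, $t) = @$row; $seen{$r, $p, $t} |= 2; }

    my %report;
    for my $key (keys %seen) {
        my $where = $seen{$key} == 1 ? 'first only'
                  : $seen{$key} == 2 ? 'second only'
                  :                    'both';
        # Split the composite key back into its parts on $;.
        my ($r, $p, $t) = split /$;/, $key;
        $report{"$r $p $t"} = $where;
    }
    return \%report;
}

# Hypothetical sample rows.
my $report = compare_triples(
    [ [qw(J1 12-0259-01 HDR-1X2-100-FLK-VT)], [qw(P2 12-0257-01 HDR-2X7-118-VT)] ],
    [ [qw(J1 12-0259-01 HDR-1X2-100-FLK-VT)], [qw(P3 12-0255-01 RECPT-2X7-118-VT)] ],
);
printf "%-12s %s\n", $report->{$_}, $_ for sort keys %$report;
```

    The values 1, 2 and 3 act as bit flags: bit 1 means "seen in the first file", bit 2 means "seen in the second".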
      Hi pc88mxer,
      Actually, my goal is to take some data from the brd_sym_pn.txt file, specifically the $RefDes, $Pnum and $Pkg_Type.
      Then check that the "$Pnum and $Pkg_Type" for each "$RefDes" matches the assigned "$LogPnum" and "$LogPkg_Type" from sym_text_latest.txt.
      If a match is found it reports back whether the reference designator is using the correct $Pnum and $Pkg_Type.
      I have purposely made some $Pkg_Type values incorrect to check my script.
      #This program will return the refdes, part number and package type
      #tab delimited for each instance from brd_sym_pn.txt (extracted from
      #a layout database) and part number and package type from sym_text_latest.txt
      #generated from sym_text_mmddyy.xls when part number is entered at <STDIN> prompt
      #Compares the pkg_type used on board with sym_text log files and reports
      #whether the pkg_type is correct or reports what the correct pkg_type should be.
      $Pnum = "12-0259-01";
      print $Pnum;
      open(partlog, "sym_text_latest.txt") || die("failed to open sym_text_latest.txt");
      while($line = <partlog>) {
          @fields = split(/\t/,$line);
          if($fields[0] eq $Pnum) {
              our $LogPnPkg = "$fields[0]\t$fields[4]";
              our $LogPnum = "$fields[0]";
              our $LogPkg_Type = "$fields[4]";
          }
      }
      open(brdpartlog, "brd_sym_pn.txt") || die("failed to open brd_sym_pn.txt");
      print "\n";
      open(brdpartlog, "brd_sym_pn.txt") || die("failed to open brd_sym_pn.txt");
      print "\n";
      while($line = <brdpartlog>) {
          @fields = split(/\t/,$line);
          if($fields[1] eq $Pnum) {
              our $RdesPnPkg = "$fields[0]\t$fields[1]\t$fields[2]";
              our $BrdPnPkg = "$fields[1]\t$fields[2]";
              our $RefDes = "$fields[0]";
              if($BrdPnPkg eq $LogPnPkg) {
                  print("$RefDes\t$BrdPnPkg is the correct Allegro footprint.\n");
              } else {
                  print("$RefDes\t$BrdPnPkg should be using $LogPkg_Type\n");
              }
          }
      }
      prints results:
      J2 12-0259-01 HDR-1X28-100-FLK-VT should be using HDR-1X2-100-FLK-VT
      J1 12-0259-01 HDR-1X2-100-FLK-VT is the correct Allegro footprint.
        I think I understand what you are trying to do, and the approach I gave is a good start for conducting the analysis you want to perform.

        Suppose that the first file contains the following triples:

        RefDes  Package  PType
        R1      P1       T1
        R2      P2       T2
        ...
        and the second file contains
        RefDes  Package  PType
        R1      P1       NOT-T1
        R2      P2       T2
        R2      P2       ANOTHER-T2
        ...
        The above algorithm will report that the triple R1,P1,T1 appears in the first file but not the second, and that the triple R1,P1,NOT-T1 appears in the second file but not the first. The interpretation of this is that NOT-T1 in the second file is a mistake and should be T1. We can modify the code to actually produce this message, but I just wanted to demonstrate how this situation is picked up by the algorithm.

        To take another example, consider the triples in each file that begin with R2,P2. The above algorithm will report that R2,P2,T2 appears in both files and that R2,P2,ANOTHER-T2 is in the second file but not the first. You have to decide how to interpret this situation. Perhaps it means that the second file is malformed because it contains two triples that begin with R2,P2.

        Again, you should only need to read your files once.
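        One possible way to turn those observations into a message is sketched below. The sub name report_mismatches, the row format and the wording of the messages are all assumptions for illustration. It groups the rows by RefDes and Package, then flags any PType in the second file that the first file does not list for that pair.

```perl
use strict;
use warnings;

# Sketch only: report_mismatches() is a hypothetical helper. Each row is
# [RefDes, Package, PType], matching the example triples above.
sub report_mismatches {
    my ($first, $second) = @_;

    # "RefDes Package" => { first => { PType => 1 }, second => { PType => 1 } }
    my %types;
    $types{"$_->[0] $_->[1]"}{first}{ $_->[2] }  = 1 for @$first;
    $types{"$_->[0] $_->[1]"}{second}{ $_->[2] } = 1 for @$second;

    my @messages;
    for my $part (sort keys %types) {
        my @want = sort keys %{ $types{$part}{first}  || {} };
        my @got  = sort keys %{ $types{$part}{second} || {} };
        my $expected = @want ? "@want" : 'no entry';
        for my $type (@got) {
            next if grep { $_ eq $type } @want;    # this PType agrees
            push @messages,
                "$part: second file has $type, first file has $expected";
        }
    }
    return @messages;
}

my @first  = ( [qw(R1 P1 T1)], [qw(R2 P2 T2)] );
my @second = ( [qw(R1 P1 NOT-T1)], [qw(R2 P2 T2)], [qw(R2 P2 ANOTHER-T2)] );
print "$_\n" for report_mismatches( \@first, \@second );
```

        Whether the extra R2,P2,ANOTHER-T2 row is a mismatch or a malformed file is still a judgment call. The sketch simply reports it.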

      Hey, I wanted to know more about how to interpret this line: while (my ($key, $val) = each %seen) {. I looked at perlvar but did not get much of an idea from it. Thanks, Hashmat
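        The documentation for each() will help you with your question. In short: every call to each() returns the next key/value pair of the hash as a two-element list, and an empty list once all pairs have been returned, which ends the while loop. It is memory-efficient because it never builds the full list of keys.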
        Regards,
        svenXY
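        A small sketch of that loop in isolation. The contents of %seen and the %label lookup table are made-up sample data, not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data: 1 = first file only, 2 = second only, 3 = both.
my %seen  = ( J1 => 3, J2 => 1, P1 => 2 );
my %label = ( 1 => 'first file only', 2 => 'second file only', 3 => 'both files' );

# each() hands back one (key, value) pair per call, in no particular
# order, and an empty list when the hash is exhausted.
my @lines;
while ( my ($key, $val) = each %seen ) {
    push @lines, "$key: $label{$val}";
}

# Sort afterwards, because each() does not promise any ordering.
print "$_\n" for sort @lines;
```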
Re: compare data between two files using Perl
by radiantmatrix (Parson) on Jun 16, 2008 at 20:40 UTC

    The solution you really want is a database. You can get a very lightweight one via the DBD::SQLite module (you'll also want DBI if you do anything with a database).

    You'll want to read your file in and store it in a database. I see that you have tab-separated files -- you probably would save yourself a lot of work by using Text::CSV_XS to parse those instead of doing it yourself.

    Then, a simple query to the database will find mismatches.

    Here's a general (not debugged) example:
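    What follows is a stand-in sketch rather than radiantmatrix's original code. It assumes DBI and DBD::SQLite are installed, uses an in-memory database, and invents the table and column names (board, partlog, refdes, pnum, pkg_type) for illustration. Rows are inserted by hand here; in practice they would come from parsing the two files.

```perl
use strict;
use warnings;
use DBI;    # needs DBD::SQLite installed as the driver

# Sketch only: an in-memory SQLite database with made-up table names.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

$dbh->do('CREATE TABLE board   (refdes TEXT, pnum TEXT, pkg_type TEXT)');
$dbh->do('CREATE TABLE partlog (pnum TEXT PRIMARY KEY, pkg_type TEXT)');

# Two board rows and one log row copied from the sample data in the question.
my $ins_board = $dbh->prepare('INSERT INTO board VALUES (?, ?, ?)');
$ins_board->execute(@$_) for
    [ 'J2', '12-0259-01', 'HDR-1X28-100-FLK-VT' ],
    [ 'J1', '12-0259-01', 'HDR-1X2-100-FLK-VT' ];

my $ins_log = $dbh->prepare('INSERT INTO partlog VALUES (?, ?)');
$ins_log->execute( '12-0259-01', 'HDR-1X2-100-FLK-VT' );

# One join finds every board entry whose package type disagrees with the log.
my $mismatches = $dbh->selectall_arrayref(<<'SQL');
SELECT b.refdes, b.pnum, b.pkg_type, l.pkg_type
  FROM board b
  JOIN partlog l ON l.pnum = b.pnum
 WHERE b.pkg_type <> l.pkg_type
 ORDER BY b.refdes
SQL

printf "%s\t%s\t%s should be using %s\n", @$_ for @$mismatches;
```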

    Of course, you could also simply store your first file in a hash, using partnums as keys -- that's just less flexible in terms of answering other questions about your data.
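    A minimal sketch of the hash route, under some assumptions: check_board is a hypothetical helper, the lines are tab-separated, pkg_type is the fifth column of the log as in the question's own split, and the log (which is keyed by part number) is the file loaded into the hash. In-memory strings stand in for sym_text_latest.txt and brd_sym_pn.txt so that each "file" is read exactly once.

```perl
use strict;
use warnings;

# Sketch only: check_board() is a hypothetical helper taking two open
# filehandles, the part log first and the board extract second.
sub check_board {
    my ($log_fh, $brd_fh) = @_;

    # Pass 1: part number => expected package type.
    my %pkg_for;
    while ( my $line = <$log_fh> ) {
        chomp $line;
        my @f = split /\t/, $line;
        next unless @f >= 5;               # skip Begin/END and short lines
        $pkg_for{ $f[0] } = $f[4];
    }

    # Pass 2: check every reference designator on the board.
    my @report;
    while ( my $line = <$brd_fh> ) {
        chomp $line;
        my ($refdes, $pnum, $pkg) = split /\t/, $line;
        next unless defined $pkg;          # skip Begin/END and short lines
        if ( !exists $pkg_for{$pnum} ) {
            push @report, "$refdes\t$pnum\t$pkg is not in the log";
        }
        elsif ( $pkg_for{$pnum} eq $pkg ) {
            push @report, "$refdes\t$pnum\t$pkg is the correct Allegro footprint.";
        }
        else {
            push @report, "$refdes\t$pnum\t$pkg should be using $pkg_for{$pnum}";
        }
    }
    return @report;
}

# In-memory stand-ins for the two files (abbreviated sample rows).
my $log = join( "\t", '12-0259-01', 'CON', '22-29-2021', '"CONN,HDR,2 PIN"',
    'HDR-1X2-100-FLK-VT', '11.7' ) . "\n";
my $brd = "J2\t12-0259-01\tHDR-1X28-100-FLK-VT\n"
        . "J1\t12-0259-01\tHDR-1X2-100-FLK-VT\n";

open my $log_fh, '<', \$log or die "cannot open in-memory log: $!";
open my $brd_fh, '<', \$brd or die "cannot open in-memory board data: $!";
print "$_\n" for check_board( $log_fh, $brd_fh );
```

    With real files, the two open calls would take the file names instead of the scalar references.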

    That should give you a fair number of ideas.

    <radiant.matrix>
    Ramblings and references
    “A positive attitude may not solve all your problems, but it will annoy enough people to make it worth the effort.” — Herm Albright
    I haven't found a problem yet that can't be solved by a well-placed trebuchet

Node Type: perlquestion [id://692315]
Approved by jettero