Re: Comparing two files line by line and exporting the differences from the first file

by Cristoforo (Curate)
on Jul 23, 2018 at 02:58 UTC


in reply to Comparing two files line by line and exporting the differences from the first file

jzelkowsz

To get the data into what I believe is its original layout, I removed every newline that immediately followed a comma, so that each employee record sits on one line. I did not parse the file with Text::CSV, as I probably should have. Your data is as follows:

HR data

samaccountname,givenname,sn,initials,employeenumber,symphonyemployeetype,mail,title,department,company,l,physicaldeliveryoffice,streetaddress,st,postalcode,telephonenumber,manager
barsu991,Uttiam,Barski,K,20114598,IKP,Uttiam.Barski@pulse.org,Director of Cooks,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
walkl003,Lreblemet,Walker,J,20178941,IKP,Lreblemet.Walker@pulse.org,Head Cook,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5551,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
karss001,Sovyetk,Karsten,Y,20146598,IKP,Sovyetk.Karsten@pulse.org,Dishwasher,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
zingk072,Kovon,Zingerman,K,20113578,IKP,Kovon.Zingerman@pulse.org,Baker,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
peizs194,Synthia,Smite,B,20134743,IKP,Synthia.Peizer@pulse.org,Broiler Man,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
hutcy231,Yello,Hutchinson,W,20145712,IKP,Yello Hutchinson,@pulse.org,Bottle Washer,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
haserz221,Zebediah,Haserkrilk,L,20125471,IKP,Zebediah.Haserkrilk@kit.org,Purchaser,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"

AD data

samaccountname,givenname,sn,initials,employeenumber,symphonyemployeetype,mail,title,department,company,l,physicaldeliveryoffice,streetaddress,st,postalcode,telephonenumber,manager
barsu991,Uttiam,Barski,K,20114598,IKP,William.Barski@pulse.org,Chief of Cooks,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
walkl003,Larry,Walker,J,,IKP,Larry.Walker@pulse.org,Cook,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5551,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
karss001,Steven,Karsten,Y,20146598,IKP,Steven.Karsten@pulse.org,Dishw,Day Kitchen,MILIFO,Alpena,Sully's Kitchen,48720 Belcard,IL,34567,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
zingk072,Kevin,Zingerman,K,,,Kevin.Zingerman@pulse.org,Baker,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,peizs194,Samantha,Smith,B,20134743,IKP,Samantha.Smith@pulse.org,"Man, Broiler",Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
hutcy231,Yaren,Hutchinson,W,20145712,IKP,Yaren Hutchinson,@pulse.org,Bottle Washer,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
haserz221,Zebediah,Hasermann,L,,IKP,Zebediah.Haserman@pulse.org,Purchaser,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"

This solution assumes 'samaccountname' is correct in both files, and it checks that the headers are the same in each file. I also noticed there were some entries in the AD file that weren't in the HR file. I didn't try to compare those, since they weren't in both files, and I didn't know how you would want to handle that situation.

#!/usr/bin/perl
use strict;
use warnings;

my $hr_file = 'HR.txt';
open my $fh, '<', $hr_file or die $!;

my (undef, @hdr_hr) = split /,/, <$fh>;
chomp @hdr_hr;

my %hr_data;
while (<$fh>) {
    chomp;
    my ($id, @rest) = split /,/;
    @{ $hr_data{$id} }{@hdr_hr} = @rest;
}
close $fh or die $!;

my $ad_file = 'AD.txt';
open $fh, '<', $ad_file or die $!;

my (undef, @hdr_ad) = split /,/, <$fh>;
chomp @hdr_ad;

@hdr_ad ~~ @hdr_hr or die "Uncompatible headers between HR and AD files\n";

my %ad_data;
while (<$fh>) {
    chomp;
    my ($id, @rest) = split /,/;
    @{ $ad_data{$id} }{@hdr_ad} = @rest;
}
close $fh or die $!;

for my $id (sort keys %hr_data) {
    next unless exists $ad_data{$id};
    for my $hdr (@hdr_hr) {
        my $description_hr = $hr_data{$id}{$hdr};
        my $description_ad = $ad_data{$id}{$hdr};
        print "$id,$hdr,$description_hr\n" unless $description_hr eq $description_ad;
    }
}

Output I got is:

barsu991,mail,Uttiam.Barski@pulse.org
barsu991,title,Director of Cooks
haserz221,sn,Haserkrilk
haserz221,employeenumber,20125471
haserz221,mail,Zebediah.Haserkrilk@kit.org
haserz221,telephonenumber,
hutcy231,givenname,Yello
hutcy231,mail,Yello Hutchinson
karss001,givenname,Sovyetk
karss001,mail,Sovyetk.Karsten@pulse.org
karss001,title,Dishwasher
karss001,physicaldeliveryoffice,Kitchen of the World
karss001,streetaddress,205 Willy B. Temple
karss001,st,WI
karss001,postalcode,50987
walkl003,givenname,Lreblemet
walkl003,employeenumber,20178941
walkl003,mail,Lreblemet.Walker@pulse.org
walkl003,title,Head Cook
zingk072,givenname,Kovon
zingk072,employeenumber,20113578
zingk072,symphonyemployeetype,IKP
zingk072,mail,Kovon.Zingerman@pulse.org
zingk072,manager,"cn=manager1
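
The last line of that output stops at "cn=manager1 because split /,/ cuts the quoted manager value at its first comma. Here is a minimal sketch of the Text::CSV route instead, assuming the CPAN module Text::CSV is installed. The shortened sample record and the variable names are illustrative only, not part of the script above.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 permits non-ASCII bytes; auto_diag => 1 reports parse errors.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Shortened sample record in the shape of the HR data (illustrative only).
my $line = 'barsu991,Uttiam,Barski,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"';

$csv->parse($line) or die "Cannot parse line: $line\n";
my ($id, @rest) = $csv->fields;

# The quoted value arrives as a single field: commas kept, quotes removed.
print "$id manager: $rest[-1]\n";
```

When reading from a filehandle, $csv->getline($fh) returns an array reference of fields for each record, so it could stand in for the chomp and split pair in both while loops.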

Re^2: Comparing two files line by line and exporting the differences from the first file (updated)
by AnomalousMonk (Bishop) on Jul 23, 2018 at 03:18 UTC
    @hdr_ad ~~ @hdr_hr or die "Uncompatible headers between HR and AD files\n";

    The ~~ smartmatch operator is not encouraged for use in production code because it is "experimental." See Terminology in perlpolicy.
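
One way to make the same header check without smartmatch is an explicit element-by-element comparison. This is only a sketch: same_list is a hypothetical helper name, and the short header lists stand in for the real ones read from the files.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): true when both
# array refs hold the same strings in the same order.
sub same_list {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;          # lengths must agree
    for my $i (0 .. $#$x) {
        return 0 unless $x->[$i] eq $y->[$i];
    }
    return 1;
}

# Stand-in header lists; the real ones come from the first line of each file.
my @hdr_hr = qw(givenname sn mail);
my @hdr_ad = qw(givenname sn mail);

same_list(\@hdr_ad, \@hdr_hr)
    or die "Uncompatible headers between HR and AD files\n";
print "headers match\n";
```

Unlike ~~, this compares with eq only, so it raises no "experimental" warning.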


    Give a man a fish:  <%-{-{-{-<
