jzelkowsz has asked for the wisdom of the Perl Monks concerning the following question:
I have two files. One is an HR record of each user's values; the other is a network (AD) export of their attributes. I am trying to compare the two files and find the differences attribute by attribute. The sole reliable key is the samaccountname, which is present and consistent in every record.
I am trying to produce a file like this:
barsu991,title,Director of Cooks
zingk072,symphonyemployeetype,IKP
zingk072,employeenumber,zingk072
zingk072,manager,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
Each line of the produced file holds the samaccountname, the attribute that is incorrect, and the correct value of that attribute from the HR record. One mistake per line.
I have tried to do this with nested loops like the one below, but the best I get is a comparison with the last line, not with all of them.
open(HR, "<hr.txt") || die "can't open hr";
open(AD, "<ad.txt") || die "can't open ad";
open(COMMAND, ">com.txt") || die "can't open com.txt";
while(<HR>)
{
    ($samaccountnameHR,$givennameHR,$snHR,$initialsHR,
     $employeenumberHR,$symphonyemployeetypeHR,$mailHR,
     $titleHR,$departmentHR,$companyHR,$lHR,
     $physicaldeliveryofficeHR,$streetaddressHR,$stHR,
     $postalcodeHR,$telephonenumberHR,$managerHR)=split(/,$/);
    while(<AD>)
    {
        ($samaccountnameAD,$givennameAD,$snAD,$initialsAD,$employeenumberAD,
         $symphonyemployeetypeAD,$mailAD, $titleAD,$departmentAD,$companyAD,
         $lAD,$physicaldeliveryofficeAD,$streetaddressAD,$stAD,$postalcodeAD,
         $telephonenumberAD,$managerAD)=split(/,$/);
        if ($employeenumberHR != $employeenumberAD)
        {
            print "$samaccountnameHR $samaccountnameAD\n";
        }
    }
}
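Part of the problem with the loop above is structural: the inner `while(<AD>)` reads the AD filehandle to end-of-file while the first HR record is being processed, so on every later pass through the outer loop `<AD>` immediately returns undef and nothing more is compared. The sketch below reproduces this with in-memory filehandles; the sub name `count_nested_comparisons` and the three-record sample strings are hypothetical placeholders. Rewinding with `seek` would also work, but reading one file into a hash keyed by samaccountname, as the replies do, avoids rereading the file for every record.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many (HR, AD) line pairs a nested
# read loop like the one above actually visits.
sub count_nested_comparisons {
    my ($hr_text, $ad_text) = @_;
    open my $hr_fh, '<', \$hr_text or die "Can't open HR string: $!";
    open my $ad_fh, '<', \$ad_text or die "Can't open AD string: $!";
    my $pairs = 0;
    while (my $hr_line = <$hr_fh>) {
        # After the first HR line, $ad_fh is already at EOF,
        # so this inner loop body never runs again.
        while (my $ad_line = <$ad_fh>) {
            $pairs++;
        }
    }
    return $pairs;
}

# Placeholder data: three records in each "file".
my $hr = "a,1\nb,2\nc,3\n";
my $ad = "a,1\nb,9\nc,3\n";
printf "pairs compared: %d (a full cross product would be %d)\n",
    count_nested_comparisons($hr, $ad), 3 * 3;
```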
HR Data:
samaccountname,givenname,sn,initials,employeenumber,
symphonyemployeetype,mail,title,department,company,l,
physicaldeliveryoffice,streetaddress,st,postalcode,
telephonenumber,manager
barsu991,Uttiam,Barski,K,20114598,IKP,
Uttiam.Barski@pulse.org,Director of Cooks,Day Kitchen,
MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,
555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
walkl003,Lreblemet,Walker,J,20178941,IKP,
Lreblemet.Walker@pulse.org,Head Cook,Day Kitchen,MILIFO,Alpena,
Kitchen of the World,400 Baker,WI,50987,555-555-5551,
"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
karss001,Sovyetk,Karsten,Y,20146598,IKP,Sovyetk.Karsten@pulse.org,
Dishwasher,Day Kitchen,MILIFO,Alpena,Kitchen of the World,
205 Willy B. Temple,WI,50987,,
"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
zingk072,Kovon,Zingerman,K,20113578,IKP,Kovon.Zingerman@pulse.org,
Baker,Day Kitchen,MILIFO,Alpena,Kitchen of the World,
205 Willy B. Temple,WI,50987,,
"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
peizs194,Synthia,Smite,B,20134743,IKP,Synthia.Peizer@pulse.org,
Broiler Man,Day Kitchen,MILIFO,Alpena,
Kitchen of the World,205 Willy B. Temple,
WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
hutcy231,Yello,Hutchinson,W,20145712,IKP,
Yello Hutchinson,@pulse.org,
Bottle Washer,Day Kitchen,MILIFO,Alpena,
Kitchen of the World,400 Baker,WI,50987,
,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
haserz221,Zebediah,Haserkrilk,L,20125471,IKP,
Zebediah.Haserkrilk@kit.org,
Purchaser,Day Kitchen,MILIFO,Alpena,
Kitchen of the World,400 Baker,WI,50987,
,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
AD data:
samaccountname,givenname,sn,initials,employeenumber,
symphonyemployeetype,mail,title,department,company,l,
physicaldeliveryoffice,streetaddress,st,postalcode,
telephonenumber,manager
barsu991,Uttiam,Barski,K,20114598,IKP,
William.Barski@pulse.org,Chief of Cooks,Day Kitchen,
MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,
555-555-5555,
"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
walkl003,Larry,Walker,J,,IKP,Larry.Walker@pulse.org,
Cook,Day Kitchen,MILIFO,Alpena,Kitchen of the World,
400 Baker,WI,50987,555-555-5551,
"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
karss001,Steven,Karsten,Y,20146598,IKP,
Steven.Karsten@pulse.org,Dishw,Day Kitchen,MILIFO,
Alpena,Sully's Kitchen,48720 Belcard,IL,34567,,
"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
zingk072,Kevin,Zingerman,K,,,Kevin.Zingerman@pulse.org,
Baker,Day Kitchen,MILIFO,Alpena,Kitchen of the World,
205 Willy B. Temple,WI,50987,,
peizs194,Samantha,Smith,B,20134743,IKP,
Samantha.Smith@pulse.org,"Man, Broiler",Day Kitchen,
MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,
WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
hutcy231,Yaren,Hutchinson,W,20145712,IKP,
Yaren Hutchinson,@pulse.org,Bottle Washer,Day Kitchen,MILIFO,
Alpena,Kitchen of the World,400 Baker,WI,50987,,
"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
haserz221,Zebediah,Hasermann,L,,IKP,
Zebediah.Haserman@pulse.org,Purchaser,Day Kitchen,MILIFO,
Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,
"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
Re: Comparing two files line by line and exporting the differences from the first file
by Tux (Canon) on Jul 23, 2018 at 12:04 UTC
Using the more recent functionality of Text::CSV_XS, you get quite readable code, IMHO:
use strict;
use warnings;
use feature "say";
use Text::CSV_XS "csv";

my $hr = csv (
    in           => "hr.txt",
    key          => "samaccountname",   # hash of records keyed on this column
    keep_headers => \my @keys,          # column names, in file order
    );

my $aoh = csv (in => "ad.txt", bom => 1, on_in => sub {
    my $sam = $_[1]{samaccountname} or die "No name in AD";
    my $ahr = $hr->{$sam};
    unless ($ahr) {
        warn "I got AD data for $sam, not in HR\n";
        return;    # leave the callback; "next" would exit the sub via a loop jump
        }
    my @diff = map  { [ $_, $ahr->{$_}, $_[1]{$_} ] }
               grep { $ahr->{$_} ne $_[1]{$_} } @keys;
    @diff or return;
    say "Changes for samaccount $sam";
    printf " %-22s %-27.27s -> %s\n", @$_ for @diff;
    });
With the two data files you provided:
$ perl test.pl
Changes for samaccount barsu991
 mail                   Uttiam.Barski@pulse.org     -> William.Barski@pulse.org
 title                  Director of Cooks           -> Chief of Cooks
Changes for samaccount walkl003
 givenname              Lreblemet                   -> Larry
 employeenumber         20178941                    ->
 mail                   Lreblemet.Walker@pulse.org  -> Larry.Walker@pulse.org
 title                  Head Cook                   -> Cook
Changes for samaccount karss001
 givenname              Sovyetk                     -> Steven
 mail                   Sovyetk.Karsten@pulse.org   -> Steven.Karsten@pulse.org
 title                  Dishwasher                  -> Dishw
 physicaldeliveryoffice Kitchen of the World        -> Sully's Kitchen
 streetaddress          205 Willy B. Temple         -> 48720 Belcard
 st                     WI                          -> IL
 postalcode             50987                       -> 34567
Changes for samaccount zingk072
 givenname              Kovon                       -> Kevin
 employeenumber         20113578                    ->
 symphonyemployeetype   IKP                         ->
 mail                   Kovon.Zingerman@pulse.org   -> Kevin.Zingerman@pulse.org
 manager                cn=manager1,ou=users,ou=Kit ->
Changes for samaccount peizs194
 givenname              Synthia                     -> Samantha
 sn                     Smite                       -> Smith
 mail                   Synthia.Peizer@pulse.org    -> Samantha.Smith@pulse.org
 title                  Broiler Man                 -> Man, Broiler
Changes for samaccount hutcy231
 givenname              Yello                       -> Yaren
 mail                   Yello Hutchinson            -> Yaren Hutchinson
Changes for samaccount haserz221
 sn                     Haserkrilk                  -> Hasermann
 employeenumber         20125471                    ->
 mail                   Zebediah.Haserkrilk@kit.org -> Zebediah.Haserman@pulse.org
 telephonenumber                                    -> 555-555-5555
It is up to you to mold that into a report of your liking.
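As one way to mold it, here is a core-Perl sketch (no Text::CSV_XS needed) that produces the `samaccountname,attribute,HR value` lines the question asks for. It assumes both files have already been parsed into hashes of hashes keyed by samaccountname; `diff_lines`, `csv_field`, and the trimmed sample records are hypothetical. Values containing a comma or a double quote are quoted CSV-style, so the manager DN stays a single field.

```perl
use strict;
use warnings;

# Quote a field CSV-style only when it needs it (comma, double quote, or newline).
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    return $value unless $value =~ /[",\n]/;
    (my $escaped = $value) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Return one "sam,attribute,HR value" line per attribute that differs.
sub diff_lines {
    my ($hr, $ad, $keys) = @_;
    my @lines;
    for my $sam (sort keys %$ad) {
        my $hr_rec = $hr->{$sam} or next;    # AD-only accounts are skipped here
        for my $attr (@$keys) {
            my $want = $hr_rec->{$attr}     // '';
            my $have = $ad->{$sam}{$attr}   // '';
            next if $want eq $have;
            push @lines, join ',', map { csv_field($_) } $sam, $attr, $want;
        }
    }
    return @lines;
}

# Hypothetical sample records, trimmed to three attributes.
my @keys = qw(title employeenumber manager);
my %hr = (zingk072 => { title => 'Baker', employeenumber => '20113578',
                        manager => 'cn=manager1,ou=users' });
my %ad = (zingk072 => { title => 'Baker', employeenumber => '',
                        manager => '' });
print "$_\n" for diff_lines(\%hr, \%ad, \@keys);
```

Quoting only when needed keeps simple values readable while still producing lines that a CSV parser will split back into exactly three fields.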
Update: If you want to store the changes in a CSV file, change it like this:
my @diff;
my $aoh = csv (in => "ad.txt", bom => 1, on_in => sub {
    my $sam = $_[1]{samaccountname} or die "No name in AD";
    my $ahr = $hr->{$sam} or die "I got AD data for $sam, not in HR\n";
    push @diff, map  { [ $sam, $_, $ahr->{$_}, $_[1]{$_} ] }
                grep { $ahr->{$_} ne $_[1]{$_} } @keys;
    });
csv (in => \@diff, out => "diff.csv");
Enjoy, Have FUN! H.Merijn
Re: Comparing two files line by line and exporting the differences from the first file
by kcott (Archbishop) on Jul 23, 2018 at 10:58 UTC
G'day jzelkowsz,
Here's a solution using Text::CSV (if you have Text::CSV_XS installed, it will run faster) and in-memory files (see open). The input data I used is a verbatim copy of what you posted here.
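The two tricks this relies on can be shown in isolation. Opening a reference to a scalar (`open my $fh, '<', \$string`) gives an ordinary read filehandle over that string, and joining every physical line that ends in a comma onto the next one rebuilds the wrapped records. A minimal sketch, assuming records were only ever wrapped directly after a comma and use Unix newlines; `rejoin_wrapped` and the sample text are hypothetical:

```perl
use strict;
use warnings;

# Join physical lines that end in a comma onto the following line.
sub rejoin_wrapped {
    my ($text) = @_;
    $text =~ s/,\n/,/g;    # drop the newline only when it follows a comma
    return $text;
}

my $posted = "id,name,\ntitle\nab1,Ann,\nCook\n";
my $canon  = rejoin_wrapped($posted);

# Read the rebuilt text through a filehandle opened on the scalar.
open my $fh, '<', \$canon or die "Can't open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    print "[$line]\n";
}
close $fh;
```

One caveat is visible in the posted AD data: a record whose last field is empty also ends in a comma, so it gets joined to the next record. That is what happens to the AD records for zingk072 and peizs194.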
#!/usr/bin/env perl

use strict;
use warnings;

use Text::CSV;

my ($hr_file, $ad_file, $com_file) = qw{hr.txt ad.txt com.txt};
my (@col_index, %hr_record_for);

my $csv = Text::CSV::->new({quote_space => 0})
    or die "Can't instantiate a Text::CSV object: ",
        Text::CSV::->error_diag();

{
    open my $mem_fh, '<', canonicalise_file_in_memory($hr_file)
        or die "Can't read in-memory file: $!";
    # The first row holds the column names.
    @col_index = @{$csv->getline($mem_fh)};
    # Index every HR record by its samaccountname (column 0).
    while (my $row = $csv->getline($mem_fh)) {
        $hr_record_for{$row->[0]} = $row;
    }
}

{
    open my $mem_fh, '<', canonicalise_file_in_memory($ad_file)
        or die "Can't read in-memory file: $!";
    open my $out_fh, '>', $com_file
        or die "Can't write '$com_file': $!";
    (undef) = $csv->getline($mem_fh);    # skip the AD header row
    while (my $row = $csv->getline($mem_fh)) {
        # Skip AD accounts that have no HR record.
        my $hr_row = $hr_record_for{$row->[0]} or next;
        for my $i (1 .. $#col_index) {
            if ($hr_row->[$i] ne $row->[$i]) {
                $csv->say($out_fh, [
                    $row->[0], $col_index[$i],
                    $hr_row->[$i]
                ]);
            }
        }
    }
}

# Rejoin records that were wrapped after a comma and return a
# reference to the result, suitable for opening as an in-memory file.
sub canonicalise_file_in_memory {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't read '$file': $!";
    my $canon;
    while (<$fh>) {
        chomp if /,$/;
        $canon .= $_;
    }
    return \$canon;
}
Output:
$ cat com.txt
barsu991,mail,Uttiam.Barski@pulse.org
barsu991,title,Director of Cooks
walkl003,givenname,Lreblemet
walkl003,employeenumber,20178941
walkl003,mail,Lreblemet.Walker@pulse.org
walkl003,title,Head Cook
karss001,givenname,Sovyetk
karss001,mail,Sovyetk.Karsten@pulse.org
karss001,title,Dishwasher
karss001,physicaldeliveryoffice,Kitchen of the World
karss001,streetaddress,205 Willy B. Temple
karss001,st,WI
karss001,postalcode,50987
zingk072,givenname,Kovon
zingk072,employeenumber,20113578
zingk072,symphonyemployeetype,IKP
zingk072,mail,Kovon.Zingerman@pulse.org
zingk072,manager,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
hutcy231,givenname,Yello
hutcy231,mail,Yello Hutchinson
haserz221,sn,Haserkrilk
haserz221,employeenumber,20125471
haserz221,mail,Zebediah.Haserkrilk@kit.org
haserz221,telephonenumber,
Re: Comparing two files line by line and exporting the differences from the first file
by AnomalousMonk (Archbishop) on Jul 23, 2018 at 16:06 UTC
Other monks have posted solutions based on Text::CSV to which I think you should pay close attention. This post is not about the OPed code per se, but about a general approach to debugging code.
Are you using warnings and strict with your code? I suspect not. If not, do so (see example code below), then fix the problems these thinking-aids reveal. These modules are useful for all Perl programmers, but especially for novice Perlers.
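As a quick illustration of what `strict` catches, the sketch below compiles a snippet containing a misspelled variable inside a string `eval`, so the error can be inspected instead of aborting the program. The variable names are made up for the example.

```perl
use strict;
use warnings;

# A snippet with a typo: $employeenumbr was never declared with "my".
my $snippet = 'use strict; my $employeenumber = 1; $employeenumbr++; 1';

# String eval compiles the snippet at run time, so the strict error lands in $@.
my $ok      = eval $snippet;
my $message = $ok ? "compiled cleanly\n" : "strict complained: $@";
print $message;
```

Without `use strict`, the typo would silently create a new global variable and the program would carry on with wrong data.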
After being sure warnings and strict are enabled, the next thing to do is to be sure you are getting the data you think you're getting.
The statement
($samaccountnameAD,$givennameAD,...,$managerAD)=split(/,$/);
splits a string on a comma that is at the end of the string. The $ in the /,$/ regex is an end-of-string anchor; see perlre, perlretut, and perlrequick. You cannot get more than two fields from this split, but you're trying to get quite a few fields.
c:\@Work\Perl\monks>perl -wMstrict -le
"use warnings;
use strict;
;;
use Data::Dumper;
;;
$_ = 'vv,WWWW,xxx,YY,zzzz,';
;;
my ($v, $w, $x, $y, $z) = split(/,$/);
print Dumper($v, $w, $x, $y, $z);
;;
print qq{'$v' '$w' '$x' '$y' '$z'};
"
$VAR1 = 'vv,WWWW,xxx,YY,zzzz';
$VAR2 = '';
$VAR3 = undef;
$VAR4 = undef;
$VAR5 = undef;
Use of uninitialized value in concatenation (.) or string at -e line 1.
Use of uninitialized value in concatenation (.) or string at -e line 1.
Use of uninitialized value in concatenation (.) or string at -e line 1.
'vv,WWWW,xxx,YY,zzzz' '' '' '' ''
You see in this example the exact output of the split operation; probably not what you wanted and expected. Try this example again with a string that does not end in a comma character; there is a tiny but significant difference. Try it with an empty string as input.
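Here is a sketch of those experiments side by side, plus the split that was probably intended. One subtlety: when `split` is assigned to a list of scalars, Perl supplies an implicit LIMIT of one more than the number of scalars, which is why the example above keeps the trailing empty field as `$VAR2 = ''`. Assigning to an array supplies no such limit, so trailing empty fields are dropped unless you pass a negative LIMIT yourself. The sample strings are the same made-up values as above.

```perl
use strict;
use warnings;

my $with_trailing    = 'vv,WWWW,xxx,YY,zzzz,';
my $without_trailing = 'vv,WWWW,xxx,YY,zzzz';

# /,$/ only matches a comma at the end of the string, so at most one split happens.
my @anchored = split /,$/, $with_trailing;       # 1 field: the trailing '' is dropped
my @no_match = split /,$/, $without_trailing;    # 1 field: no match, whole string
my @empty    = split /,$/, '';                   # 0 fields: splitting '' yields ()

# /,/ splits on every comma; trailing empty fields are dropped by default...
my @every    = split /,/, $with_trailing;        # 5 fields
# ...unless a negative LIMIT is given, which keeps them.
my @keep_all = split /,/, $with_trailing, -1;    # 6 fields, the last one ''

for my $case ([anchored => \@anchored], [no_match => \@no_match],
              [empty    => \@empty],    [every    => \@every],
              [keep_all => \@keep_all]) {
    printf "%-9s %d field(s)\n", $case->[0], scalar @{ $case->[1] };
}
```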
The example above uses Data::Dumper. This utility for visualizing data can be a sanity-saver. It is a core module (i.e., has been made a part of the standard Perl distribution; see corelist for getting info on core modules). I prefer Data::Dump, but it is not core.
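One Data::Dumper setting that is especially handy for this kind of bug is `$Data::Dumper::Useqq`: with it set, invisible characters such as a trailing newline left over from reading a line are shown as escapes instead of being printed literally. A small sketch; the sample string is made up:

```perl
use strict;
use warnings;
use Data::Dumper;

# A line as read from a file, before chomp: the newline is invisible when printed.
my $line = "zingk072,Kevin,\n";

my $dumped = do {
    local $Data::Dumper::Useqq = 1;    # render control characters as escapes
    local $Data::Dumper::Terse = 1;    # omit the leading "$VAR1 = "
    Dumper($line);
};
print $dumped;
```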
This post addresses just one, small aspect of debugging; there are many more. Good luck.
Give a man a fish: <%-{-{-{-<
Re: Comparing two files line by line and exporting the differences from the first file
by Cristoforo (Curate) on Jul 23, 2018 at 02:58 UTC
jzelkowsz
To get the data into what I believe is its real layout, I removed every newline that immediately follows a comma, so that each employee record is on one line. I did not parse the file with Text::CSV, as I probably should have. Your data then looks like this:
HR data
samaccountname,givenname,sn,initials,employeenumber,symphonyemployeetype,mail,title,department,company,l,physicaldeliveryoffice,streetaddress,st,postalcode,telephonenumber,manager
barsu991,Uttiam,Barski,K,20114598,IKP,Uttiam.Barski@pulse.org,Director of Cooks,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
walkl003,Lreblemet,Walker,J,20178941,IKP,Lreblemet.Walker@pulse.org,Head Cook,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5551,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
karss001,Sovyetk,Karsten,Y,20146598,IKP,Sovyetk.Karsten@pulse.org,Dishwasher,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
zingk072,Kovon,Zingerman,K,20113578,IKP,Kovon.Zingerman@pulse.org,Baker,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
peizs194,Synthia,Smite,B,20134743,IKP,Synthia.Peizer@pulse.org,Broiler Man,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
hutcy231,Yello,Hutchinson,W,20145712,IKP,Yello Hutchinson,@pulse.org,Bottle Washer,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
haserz221,Zebediah,Haserkrilk,L,20125471,IKP,Zebediah.Haserkrilk@kit.org,Purchaser,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
AD data
samaccountname,givenname,sn,initials,employeenumber,symphonyemployeetype,mail,title,department,company,l,physicaldeliveryoffice,streetaddress,st,postalcode,telephonenumber,manager
barsu991,Uttiam,Barski,K,20114598,IKP,William.Barski@pulse.org,Chief of Cooks,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
walkl003,Larry,Walker,J,,IKP,Larry.Walker@pulse.org,Cook,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5551,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
karss001,Steven,Karsten,Y,20146598,IKP,Steven.Karsten@pulse.org,Dishw,Day Kitchen,MILIFO,Alpena,Sully's Kitchen,48720 Belcard,IL,34567,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
zingk072,Kevin,Zingerman,K,,,Kevin.Zingerman@pulse.org,Baker,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,peizs194,Samantha,Smith,B,20134743,IKP,Samantha.Smith@pulse.org,"Man, Broiler",Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
hutcy231,Yaren,Hutchinson,W,20145712,IKP,Yaren Hutchinson,@pulse.org,Bottle Washer,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
haserz221,Zebediah,Hasermann,L,,IKP,Zebediah.Haserman@pulse.org,Purchaser,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
This solution assumes 'samaccountname' matches between the two files, and it checks that both files have the same headers. I also noticed that some entries in the HR file have no matching record in the AD file (after joining the lines, peizs194 ends up inside the zingk072 AD record). I didn't try to compare those, since there is nothing to compare them against, and I didn't know how you would want to handle that situation.
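If you would rather report unmatched accounts than silently skip them, comparing the key sets of the two hashes is enough. A minimal sketch, assuming `%hr_data` and `%ad_data` are keyed by samaccountname as in the code that follows; the `unmatched_ids` helper and the sample contents (including the AD-only id `newid001`) are hypothetical:

```perl
use strict;
use warnings;

# Return two sorted lists: ids only in the first hash, ids only in the second.
sub unmatched_ids {
    my ($left, $right) = @_;
    my @only_left  = grep { !exists $right->{$_} } sort keys %$left;
    my @only_right = grep { !exists $left->{$_}  } sort keys %$right;
    return (\@only_left, \@only_right);
}

# Placeholder data keyed by samaccountname.
my %hr_data = (barsu991 => {}, peizs194 => {}, zingk072 => {});
my %ad_data = (barsu991 => {}, zingk072 => {}, newid001 => {});

my ($hr_only, $ad_only) = unmatched_ids(\%hr_data, \%ad_data);
print "In HR but not AD: @$hr_only\n";
print "In AD but not HR: @$ad_only\n";
```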
#!/usr/bin/perl
use strict;
use warnings;

my $hr_file = 'HR.txt';
open my $fh, '<', $hr_file or die $!;
# Header row: drop the leading "samaccountname", keep the attribute names.
my (undef, @hdr_hr) = split /,/, <$fh>;
chomp @hdr_hr;

my %hr_data;
while (<$fh>) {
    chomp;
    my ($id, @rest) = split /,/;
    # Hash slice: $hr_data{$id}{attribute} = value
    @{ $hr_data{$id} }{@hdr_hr} = @rest;
}
close $fh or die $!;

my $ad_file = 'AD.txt';
open $fh, '<', $ad_file or die $!;
my (undef, @hdr_ad) = split /,/, <$fh>;
chomp @hdr_ad;
# Compare the header lists as joined strings; the ~~ smartmatch operator
# is experimental and produces warnings on newer Perls.
join(',', @hdr_ad) eq join(',', @hdr_hr)
    or die "Incompatible headers between HR and AD files\n";

my %ad_data;
while (<$fh>) {
    chomp;
    my ($id, @rest) = split /,/;
    @{ $ad_data{$id} }{@hdr_ad} = @rest;
}
close $fh or die $!;

for my $id (sort keys %hr_data) {
    next unless exists $ad_data{$id};    # HR account with no AD record
    for my $hdr (@hdr_hr) {
        my $description_hr = $hr_data{$id}{$hdr} // '';
        my $description_ad = $ad_data{$id}{$hdr} // '';
        print "$id,$hdr,$description_hr\n"
            unless $description_hr eq $description_ad;
    }
}
Output I got is:
barsu991,mail,Uttiam.Barski@pulse.org
barsu991,title,Director of Cooks
haserz221,sn,Haserkrilk
haserz221,employeenumber,20125471
haserz221,mail,Zebediah.Haserkrilk@kit.org
haserz221,telephonenumber,
hutcy231,givenname,Yello
hutcy231,mail,Yello Hutchinson
karss001,givenname,Sovyetk
karss001,mail,Sovyetk.Karsten@pulse.org
karss001,title,Dishwasher
karss001,physicaldeliveryoffice,Kitchen of the World
karss001,streetaddress,205 Willy B. Temple
karss001,st,WI
karss001,postalcode,50987
walkl003,givenname,Lreblemet
walkl003,employeenumber,20178941
walkl003,mail,Lreblemet.Walker@pulse.org
walkl003,title,Head Cook
zingk072,givenname,Kovon
zingk072,employeenumber,20113578
zingk072,symphonyemployeetype,IKP
zingk072,mail,Kovon.Zingerman@pulse.org
zingk072,manager,"cn=manager1
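The truncated last line comes from `split /,/` cutting the quoted manager DN at its first comma, which is the case Text::CSV would handle. If you want to stay with core Perl, a regex that treats a double-quoted field as a single unit is one option. A minimal sketch, assuming fields are quoted only with double quotes, embedded quotes are doubled (`""`), no quoted field spans lines, and the line has already been chomped; `split_csv_line` is a hypothetical helper, not a library function. In the code above it would replace `split /,/` as `my ($id, @rest) = split_csv_line($_);`.

```perl
use strict;
use warnings;

# Split one chomped CSV line, keeping double-quoted fields (which may
# contain commas) intact and turning "" inside them back into ".
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    # Each field is a quoted string or a run of non-commas, followed by
    # a comma or the end of the string.
    while ($line =~ /\G("(?:[^"]|"")*"|[^,]*)(,|\z)/g) {
        my ($field, $sep) = ($1, $2);
        if ($field =~ s/\A"(.*)"\z/$1/s) {
            $field =~ s/""/"/g;    # unescape doubled quotes
        }
        push @fields, $field;
        last if $sep eq '';        # reached the end of the line
    }
    return @fields;
}

my @fields = split_csv_line(
    'zingk072,Kevin,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"');
print scalar(@fields), " fields; manager = $fields[-1]\n";
```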