jzelkowsz

To get the data into what I believe is its original layout, I removed every newline that immediately followed a comma, so that each employee record is on one line. I did not parse the files with Text::CSV, as I probably should have. Your data, after that change, is as follows:
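The rejoining step described above can be sketched in a few lines of Perl. This is a minimal sketch, assuming that a line ending in a comma always means "this record continues on the next line"; `join_wrapped_lines` is a hypothetical helper name, and the sample records are shortened versions of the data below.

```perl
use strict;
use warnings;

# Hypothetical helper: join any line that ends in a comma onto the next line.
# Assumption: a trailing comma means the record was wrapped.
sub join_wrapped_lines {
    my ($text) = @_;
    $text =~ s/,\r?\n/,/g;    # drop the newline (LF or CRLF) after a comma
    return $text;
}

# Shortened sample records, with the first one wrapped after a comma
my $wrapped = "barsu991,Uttiam,Barski,\nK,20114598\n"
            . "walkl003,Lreblemet,Walker,J,20178941\n";

print join_wrapped_lines($wrapped);
# prints:
# barsu991,Uttiam,Barski,K,20114598
# walkl003,Lreblemet,Walker,J,20178941
```

The rule is only a heuristic: a record whose last field is empty also ends in a comma, so it gets glued onto the following record. That appears to be what happened to zingk072 and peizs194 in the AD data, which end up sharing one line.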

HR data

```
samaccountname,givenname,sn,initials,employeenumber,symphonyemployeetype,mail,title,department,company,l,physicaldeliveryoffice,streetaddress,st,postalcode,telephonenumber,manager
barsu991,Uttiam,Barski,K,20114598,IKP,Uttiam.Barski@pulse.org,Director of Cooks,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
walkl003,Lreblemet,Walker,J,20178941,IKP,Lreblemet.Walker@pulse.org,Head Cook,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5551,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
karss001,Sovyetk,Karsten,Y,20146598,IKP,Sovyetk.Karsten@pulse.org,Dishwasher,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
zingk072,Kovon,Zingerman,K,20113578,IKP,Kovon.Zingerman@pulse.org,Baker,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
peizs194,Synthia,Smite,B,20134743,IKP,Synthia.Peizer@pulse.org,Broiler Man,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
hutcy231,Yello,Hutchinson,W,20145712,IKP,Yello Hutchinson,@pulse.org,Bottle Washer,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
haserz221,Zebediah,Haserkrilk,L,20125471,IKP,Zebediah.Haserkrilk@kit.org,Purchaser,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
```

AD data

```
samaccountname,givenname,sn,initials,employeenumber,symphonyemployeetype,mail,title,department,company,l,physicaldeliveryoffice,streetaddress,st,postalcode,telephonenumber,manager
barsu991,Uttiam,Barski,K,20114598,IKP,William.Barski@pulse.org,Chief of Cooks,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
walkl003,Larry,Walker,J,,IKP,Larry.Walker@pulse.org,Cook,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5551,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
karss001,Steven,Karsten,Y,20146598,IKP,Steven.Karsten@pulse.org,Dishw,Day Kitchen,MILIFO,Alpena,Sully's Kitchen,48720 Belcard,IL,34567,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
zingk072,Kevin,Zingerman,K,,,Kevin.Zingerman@pulse.org,Baker,Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,peizs194,Samantha,Smith,B,20134743,IKP,Samantha.Smith@pulse.org,"Man, Broiler",Day Kitchen,MILIFO,Alpena,Kitchen of the World,205 Willy B. Temple,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
hutcy231,Yaren,Hutchinson,W,20145712,IKP,Yaren Hutchinson,@pulse.org,Bottle Washer,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
haserz221,Zebediah,Hasermann,L,,IKP,Zebediah.Haserman@pulse.org,Purchaser,Day Kitchen,MILIFO,Alpena,Kitchen of the World,400 Baker,WI,50987,555-555-5555,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
```

This solution assumes that 'samaccountname' is correct (i.e., identifies the same person) in both files, and it checks that both files have the same headers. I also noticed that some entries in the AD file weren't in the HR file. I didn't try to compare those, since each of them appears in only one file, and I didn't know how you would want to handle that situation.
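One way to at least surface the records that exist in only one file is to compare the key sets of the two hashes. This is a minimal sketch, assuming the data has already been loaded into hashes keyed by samaccountname, like `%hr_data` and `%ad_data` in the script that follows; `ids_missing_from` is a hypothetical helper, and `newid001` is made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: return, sorted, the ids (keys) of %$have
# that do not appear in %$other.
sub ids_missing_from {
    my ($have, $other) = @_;
    my @missing = grep { !exists $other->{$_} } keys %$have;
    return sort @missing;
}

# Made-up data shaped like %hr_data / %ad_data (id => { header => value })
my %hr = (barsu991 => {}, peizs194 => {});
my %ad = (barsu991 => {}, newid001 => {});

print "Only in HR: ", join(' ', ids_missing_from(\%hr, \%ad)), "\n";    # peizs194
print "Only in AD: ", join(' ', ids_missing_from(\%ad, \%hr)), "\n";    # newid001
```

What to do with those ids (report them, export them, or skip them) is still a policy decision for the person who owns the data.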

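```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the HR file into %hr_data: samaccountname => { header => value }
my $hr_file = 'HR.txt';
open my $fh, '<', $hr_file or die "Can't open $hr_file: $!";
my (undef, @hdr_hr) = split /,/, <$fh>;    # skip the samaccountname column
chomp @hdr_hr;

my %hr_data;
while (<$fh>) {
    chomp;
    my ($id, @rest) = split /,/;
    @{ $hr_data{$id} }{@hdr_hr} = @rest;    # hash slice: header => field
}
close $fh or die $!;

# Read the AD file the same way into %ad_data
my $ad_file = 'AD.txt';
open $fh, '<', $ad_file or die "Can't open $ad_file: $!";
my (undef, @hdr_ad) = split /,/, <$fh>;
chomp @hdr_ad;

# Both files must have the same columns in the same order.
# (Compare the joined headers instead of using the deprecated ~~ smartmatch.)
join("\0", @hdr_ad) eq join("\0", @hdr_hr)
    or die "Incompatible headers between HR and AD files\n";

my %ad_data;
while (<$fh>) {
    chomp;
    my ($id, @rest) = split /,/;
    @{ $ad_data{$id} }{@hdr_ad} = @rest;
}
close $fh or die $!;

# For each id present in both files, print the HR value of every field that differs
for my $id (sort keys %hr_data) {
    next unless exists $ad_data{$id};
    for my $hdr (@hdr_hr) {
        # '' for missing fields avoids "uninitialized value" warnings on short lines
        my $description_hr = $hr_data{$id}{$hdr} // '';
        my $description_ad = $ad_data{$id}{$hdr} // '';
        print "$id,$hdr,$description_hr\n"
            unless $description_hr eq $description_ad;
    }
}
```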
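Because the script splits on every comma, a quoted field such as the manager DN is cut into pieces, which is why the manager value in the last output line is only `"cn=manager1`. Text::CSV is the robust fix; as a dependency-free illustration only, here is a minimal quote-aware splitter. It is a sketch, assuming double-quoted fields with `""` as an escaped quote and no newlines inside fields; `split_csv_line` is a hypothetical name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line, keeping double-quoted fields
# (e.g. "cn=manager1,ou=users,...") intact. Inside quotes, "" means ".
# Sketch only: does not handle fields that contain newlines.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^,]*))(,|\z)/g) {
        my ($quoted, $plain, $sep) = ($1, $2, $3);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;    # unescape doubled quotes
            push @fields, $quoted;
        }
        else {
            push @fields, $plain;
        }
        last if $sep eq '';         # matched end of line, no more fields
    }
    return @fields;
}

# The HR record for zingk072 from the data above
my $line = 'zingk072,Kovon,Zingerman,K,20113578,IKP,Kovon.Zingerman@pulse.org,'
         . 'Baker,Day Kitchen,MILIFO,Alpena,Kitchen of the World,'
         . '205 Willy B. Temple,WI,50987,,'
         . '"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"';

my @split_fields = split /,/, $line;
my @csv_fields   = split_csv_line($line);
printf "split /,/  : %d fields, last = %s\n", scalar @split_fields, $split_fields[-1];
printf "quote-aware: %d fields, last = %s\n", scalar @csv_fields,   $csv_fields[-1];
```

In the script, `my ($id, @rest) = split_csv_line($_);` could stand in for `split /,/` in both read loops (and likewise for the header lines); with 17 fields per record the hash slice then lines up with the headers.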

The output I got is:

```
barsu991,mail,Uttiam.Barski@pulse.org
barsu991,title,Director of Cooks
haserz221,sn,Haserkrilk
haserz221,employeenumber,20125471
haserz221,mail,Zebediah.Haserkrilk@kit.org
haserz221,telephonenumber,
hutcy231,givenname,Yello
hutcy231,mail,Yello Hutchinson
karss001,givenname,Sovyetk
karss001,mail,Sovyetk.Karsten@pulse.org
karss001,title,Dishwasher
karss001,physicaldeliveryoffice,Kitchen of the World
karss001,streetaddress,205 Willy B. Temple
karss001,st,WI
karss001,postalcode,50987
walkl003,givenname,Lreblemet
walkl003,employeenumber,20178941
walkl003,mail,Lreblemet.Walker@pulse.org
walkl003,title,Head Cook
zingk072,givenname,Kovon
zingk072,employeenumber,20113578
zingk072,symphonyemployeetype,IKP
zingk072,mail,Kovon.Zingerman@pulse.org
zingk072,manager,"cn=manager1
```
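Since the goal is to export the differences, any value that itself contains a comma (like a full manager DN) needs quoting on output, or the exported file becomes ambiguous. A minimal sketch, assuming RFC 4180-style quoting is what the consumer of the file expects; `csv_quote` is a hypothetical helper, and Text::CSV can do the same job if it is installed.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a value for CSV output if it contains a comma,
# a double quote or a line break; embedded double quotes are doubled.
sub csv_quote {
    my ($v) = @_;
    $v = '' unless defined $v;
    return $v unless $v =~ /[",\r\n]/;
    $v =~ s/"/""/g;
    return qq{"$v"};
}

my @row = ('zingk072', 'manager', 'cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net');
print join(',', map { csv_quote($_) } @row), "\n";
# prints: zingk072,manager,"cn=manager1,ou=users,ou=Kitchen,dc=Kitchen,dc=net"
```

In the comparison loop, the `print` could then become `print join(',', map { csv_quote($_) } $id, $hdr, $description_hr), "\n"`, keeping one difference per line even when a value contains commas.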

In reply to Re: Comparing two files line by line and exporting the differences from the first file by Cristoforo
in thread Comparing two files line by line and exporting the differences from the first file by jzelkowsz
