TStanley has asked for the wisdom of the Perl Monks concerning the following question:
Ok monks, I have a little bit of a dilemma here. I was asked to write a short script to list the files that differ between corresponding directories.
To provide some background, we have a production environment and a development environment for our payroll application, and we want to see what source files are different between the two. There are four directories in the development environment, and each one has a corresponding directory in the production environment.
To make a long story short, my hacked-together script became very popular, and they now need me to add to it. My problem is maintaining the thing: it is now huge and in dire need of restructuring so that I can easily modify it. All I need is pointers in the right direction. Thanks.
#!/opt/perl5/bin/perl -w
use strict;
use File::Slurp;
# The production files will be listed in
# the source hashes
my (%iqs,%iqs_source,%prg,%prg_source);
my (%spg,%spg_source,%tpr,%tpr_source);
my $report="/home/mis/tstanley/SourceDiff.rpt";
## Read each directory and get the file names
my @iqs_src=read_dir('/dsmpayroll/iqs-source');
my @iqs=read_dir('/dsmmigrate/development/iqs');
my @prg_src=read_dir('/dsmpayroll/prg-source');
my @prg=read_dir('/dsmmigrate/development/prg');
my @spg_src=read_dir('/dsmpayroll/spg-source');
my @spg=read_dir('/dsmmigrate/development/spg');
my @tpr_src=read_dir('/dsmpayroll/tpr-source');
my @tpr=read_dir('/dsmmigrate/development/tpr');
## Change into each directory and go through the files
## and get the sum of each file and put into a hash
chdir('/dsmpayroll/iqs-source');
foreach(@iqs_src){
my $sum=`sum $_`;
my ($x,$y,$z)=split /\s+/,$sum;
$iqs_source{$_}=$x;
}
chdir('/dsmpayroll/prg-source');
foreach(@prg_src){
my $sum=`sum $_`;
my ($x,$y,$z)=split /\s+/,$sum;
$prg_source{$_}=$x;
}
chdir('/dsmpayroll/spg-source');
foreach(@spg_src){
my $sum=`sum $_`;
my ($x,$y,$z)=split /\s+/,$sum;
$spg_source{$_}=$x;
}
chdir('/dsmpayroll/tpr-source');
foreach(@tpr_src){
my $sum=`sum $_`;
my ($x,$y,$z)=split /\s+/,$sum;
$tpr_source{$_}=$x;
}
chdir('/dsmmigrate/development/iqs');
foreach(@iqs){
my $sum=`sum $_`;
my ($x,$y,$z)=split /\s+/,$sum;
$iqs{$_}=$x;
}
chdir('/dsmmigrate/development/prg');
foreach(@prg){
my $sum=`sum $_`;
my ($x,$y,$z)=split /\s+/,$sum;
$prg{$_}=$x;
}
chdir('/dsmmigrate/development/spg');
foreach(@spg){
my $sum=`sum $_`;
my ($x,$y,$z)=split /\s+/,$sum;
$spg{$_}=$x;
}
chdir('/dsmmigrate/development/tpr');
foreach(@tpr){
my $sum=`sum $_`;
my ($x,$y,$z)=split /\s+/,$sum;
$tpr{$_}=$x;
}
## Open the report file
open RPT,">$report" or die "Unable to open $report: $!\n";
my $date=`date`;
print RPT "$date\n\n";
## For each set of directories, we find the common file names in them, and
## then go through those files and compare the sums. If the Production file
## is different, we write it out to the report file.
print RPT "IQS Source Files\n";
print RPT "================\n";
my @iqs_common;
foreach(keys %iqs){
next if $_=~/cob/; ## This is an empty directory that exists in each source directory
push(@iqs_common,$_) if exists $iqs_source{$_};
}
foreach(@iqs_common){
next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
my $prod=$iqs_source{$_};
my $dev=$iqs{$_};
next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
if ($prod != $dev){
print RPT "/dsmpayroll/iqs-source/$_\n";
}
}
print RPT "\n\nPRG Source Files\n";
print RPT "================\n";
my @prg_common;
foreach(keys %prg){
next if $_=~/cob/; ## This is an empty directory that exists in each source directory
push(@prg_common,$_) if exists $prg_source{$_};
}
foreach(@prg_common){
next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
my $prod=$prg_source{$_};
my $dev=$prg{$_};
next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
if($prod != $dev){
print RPT "/dsmpayroll/prg-source/$_\n";
}
}
print RPT "\n\nSPG Source Files\n";
print RPT "================\n";
my @spg_common;
foreach(keys %spg){
next if $_=~/cob/; ## This is an empty directory that exists in each source directory
push(@spg_common,$_) if exists $spg_source{$_};
}
foreach(@spg_common){
next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
my $prod=$spg_source{$_};
my $dev=$spg{$_};
next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
if($prod != $dev){
print RPT "/dsmpayroll/spg-source/$_\n";
}
}
print RPT "\n\nTPR Source Files\n";
print RPT "================\n";
my @tpr_common;
foreach(keys %tpr){
next if $_=~/cob/; ## This is an empty directory that exists in each source directory
push(@tpr_common,$_) if exists $tpr_source{$_};
}
foreach(@tpr_common){
next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
my $prod=$tpr_source{$_};
my $dev=$tpr{$_};
next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
if($prod != $dev){
print RPT "/dsmpayroll/tpr-source/$_\n";
}
}
close RPT;
TStanley
--------
The only thing necessary for the triumph of evil is for good men to do nothing -- Edmund Burke
Re: File Differences
by ikegami (Patriarch) on May 31, 2005 at 16:39 UTC
Instead of having a hash for each directory, you could have a hash of hashes:
#!/opt/perl5/bin/perl -w
use strict;
use File::Slurp;
# The production files will be listed in
# the source hashes
my $report = "/home/mis/tstanley/SourceDiff.rpt";
my %input = (
iqs_source => '/dsmpayroll/iqs-source',
prg_source => '/dsmpayroll/prg-source',
spg_source => '/dsmpayroll/spg-source',
tpr_source => '/dsmpayroll/tpr-source',
iqs => '/dsmmigrate/development/iqs',
prg => '/dsmmigrate/development/prg',
spg => '/dsmmigrate/development/spg',
tpr => '/dsmmigrate/development/tpr',
);
my %data;
foreach my $key (keys(%input)) {
my $dir = $input{$key};
chdir($dir) or die "Unable to chdir to $dir: $!\n";
foreach my $file (read_dir('.')) {
my $sum = `sum $file`;
my ($x) = split(/\s+/, $sum);
$data{$key}{$file} = $x;
}
}
## Open the report file
...
Update: It seems there's no need to build a hash of hashes, since we only need to look at one pair of directories at a time. I eliminated the pesky chdir commands while I was at it. Here is a version with (just about) all redundancy eliminated, but without any change in functionality:
#!/opt/perl5/bin/perl -w
use strict;
# The production files will be listed in
# the source hashes
my $report = "/home/mis/tstanley/SourceDiff.rpt";
sub DEV () { 0 }
sub PROD () { 1 }
my %input = (
IQS => [ '/dsmmigrate/development/iqs', '/dsmpayroll/iqs-source' ],
PRG => [ '/dsmmigrate/development/prg', '/dsmpayroll/prg-source' ],
SPG => [ '/dsmmigrate/development/spg', '/dsmpayroll/spg-source' ],
TPR => [ '/dsmmigrate/development/tpr', '/dsmpayroll/tpr-source' ],
);
## Open the report file
open RPT, ">$report"
or die "Unable to open $report: $!\n";
my $date=`date`;
print RPT "$date\n\n";
## For each set of directories, we find the common file names in them,
## and then go through those files and compare the sums. If the
## Production file is different, we write it out to the report file.
my $print_separator; # False for first loop iteration.
foreach my $key (sort keys %input) {
    local *DH;
    my $dir;
    my $file;
    my %dev_crcs;
    my %prod_crcs;

    # Checksum every file of interest in the development directory.
    $dir = $input{$key}[DEV];
    opendir(DH, $dir)
        or die "Unable to open dir $dir: $!\n";
    while (defined($file = readdir(DH))) {
        next if $file eq '.';
        next if $file eq '..';
        next if $file eq 'cob';
        next if substr($file, -4) eq '.ffl';
        next if substr($file, -4) eq '.lst';
        # `sum` prints the checksum first; keep only that field.
        $dev_crcs{$file} = (split ' ', `sum $dir/$file`)[0];
    }
    closedir(DH);

    # Same again for the production directory.
    $dir = $input{$key}[PROD];
    opendir(DH, $dir)
        or die "Unable to open dir $dir: $!\n";
    while (defined($file = readdir(DH))) {
        next if $file eq '.';
        next if $file eq '..';
        next if substr($file, -4) eq '.ffl';
        next if substr($file, -4) eq '.lst';
        $prod_crcs{$file} = (split ' ', `sum $dir/$file`)[0];
    }
    closedir(DH);

    if ($print_separator) {
        print RPT "\n\n";
    } else {
        # True for all but first loop iteration.
        $print_separator = 1;
    }
    print RPT "$key Source Files\n";
    print RPT "=" x length($key), "=============\n";

    # $dir still holds the production path here, so that is what gets reported.
    foreach $file (keys(%dev_crcs)) {
        if (exists($prod_crcs{$file})
            && $prod_crcs{$file} != $dev_crcs{$file}) {
            print RPT "$dir/$file\n";
        }
    }
}
Keep in mind that files in one directory but not in the other are not printed, just like in your solution.
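If files that exist on only one side also matter, the two checksum hashes can be compared by key. What follows is a minimal sketch, assuming hashes shaped like %dev_crcs and %prod_crcs above; the helper name only_in and the sample file names and sums are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: names that are keys of %$left but not of %$right.
sub only_in {
    my ($left, $right) = @_;
    my @names = sort grep { !exists $right->{$_} } keys %$left;
    return @names;
}

# Made-up sample data standing in for the real checksum hashes.
my %dev_crcs  = ('pay.cbl' => 111, 'tax.cbl' => 222, 'new.cbl' => 333);
my %prod_crcs = ('pay.cbl' => 111, 'tax.cbl' => 999, 'old.cbl' => 444);

my @dev_only  = only_in(\%dev_crcs,  \%prod_crcs);
my @prod_only = only_in(\%prod_crcs, \%dev_crcs);

print "Only in development: @dev_only\n";
print "Only in production:  @prod_only\n";
```

Calling the helper once in each direction keeps it trivial, and the two lists could be written to the report under their own headings.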
Re: File Differences
by tlm (Prior) on May 31, 2005 at 16:43 UTC
This doesn't directly answer your question, but judging from your shebang line, I gather that this is a Unixoid system. The GNU utility diff (and maybe other diffs as well) can be used to compare directories. E.g.:
% diff --recursive dir1 dir2
I wouldn't give up on diff though. You can use the -q switch, which only reports whether the files differ. By using some of the other diff switches (like -B, which ignores changes that consist only of blank lines), you might be able to reduce the number of files that really differ.
lupey
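If the diff route is taken, its output can still be post-processed from Perl. The sketch below assumes GNU diff's English-locale messages for -q ("Files A and B differ" and "Only in DIR: NAME"); the helper name parse_diff_q and the sample lines are hypothetical, and in real use the lines would come from running diff -rq on the two directories.

```perl
use strict;
use warnings;

# Hypothetical helper: classify lines of `diff -rq` output.
# Assumes GNU diff's English-locale messages.
sub parse_diff_q {
    my (@lines) = @_;
    my (@differ, @only);
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^Files (.+) and (.+) differ$/) {
            push @differ, $1;          # path of the first file named
        }
        elsif ($line =~ /^Only in (.+): (.+)$/) {
            push @only, "$1/$2";       # directory plus file name
        }
    }
    return (\@differ, \@only);
}

# Made-up sample; in real use: my @out = `diff -rq $dir1 $dir2`;
my @out = (
    "Files /dsmpayroll/iqs-source/a.cbl and /dsmmigrate/development/iqs/a.cbl differ\n",
    "Only in /dsmmigrate/development/iqs: b.cbl\n",
);

my ($differ, $only) = parse_diff_q(@out);
print "differs: $_\n" for @$differ;
print "one side only: $_\n" for @$only;
```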
Re: File Differences
by davido (Cardinal) on May 31, 2005 at 16:35 UTC
Well, for one thing, you could factor all those chdir... segments down to their common denominator and call them as a subroutine within a foreach loop instead of repeatedly typing the same code again and again.
Other than that, it would help if we knew in what way you wished to change the code. Your question doesn't ask for anything specific.
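The factoring suggested here might look something like the sketch below: one subroutine does the per-directory work and is called once for each path. This is an illustration, not the original script. The name checksums_for is hypothetical, and it swaps the external sum command for the core Digest::MD5 module. The demonstration runs on a temporary directory rather than the real payroll paths.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempdir);

# Hypothetical helper: map each plain file in $dir to its MD5 hex digest.
# Uses Digest::MD5 in place of the external `sum` command.
sub checksums_for {
    my ($dir) = @_;
    my %sum;
    opendir(my $dh, $dir) or die "Unable to open dir $dir: $!\n";
    for my $file (readdir $dh) {
        my $path = "$dir/$file";
        next unless -f $path;    # skips '.', '..' and subdirectories
        open(my $fh, '<', $path) or die "Unable to open $path: $!\n";
        binmode $fh;
        $sum{$file} = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
    }
    closedir $dh;
    return \%sum;
}

# Demonstration on a throwaway directory.
my $dir = tempdir(CLEANUP => 1);
open(my $out, '>', "$dir/hello.txt") or die "Cannot write: $!\n";
print {$out} "hello\n";
close $out;

my $sums = checksums_for($dir);
print "$_ => $sums->{$_}\n" for sort keys %$sums;
```

Because the digests are hex strings, two of them should be compared with ne rather than !=.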
Re: File Differences
by TedPride (Priest) on May 31, 2005 at 17:26 UTC
Looks like you need something like the following:
use strict;
use warnings;
my @compare = (
['/dir1/contents','/dir2/contents','^cob$'],
['/dir1/contents','/dir2/contents','\.ffl$','\.lst$']
);
my ($dir1, $dir2, @r, $regex);
for (@compare) {
($dir1, $dir2, @r) = @$_;
# Load dir info, create hash, compare hash
# If hash doesn't match...
$regex = join '|', @r; print "$regex\n";
# Sort and run through files, ignoring ones that regex matches if regex exists.
# Print directory and file info for files that don't match between directories
}
The duplication in your code is in having to run the same pieces of code on many different directories and file names / endings. Create a structure instead that lets you easily store this info and loop through it, and your code can be reduced to a fraction of its previous size.
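To make the outline above concrete, here is one way the compare step could be filled in. It is a sketch under assumptions: the helper name changed_files is hypothetical, and the two hashes are made-up stand-ins for the per-directory checksum data. The ignore patterns are joined into a single regex, as in the outline.

```perl
use strict;
use warnings;

# Hypothetical helper: names present in both hashes whose values differ,
# skipping any name matched by one of the ignore patterns.
sub changed_files {
    my ($dev, $prod, @ignore) = @_;
    my $skip;
    if (@ignore) {
        my $alt = join '|', @ignore;
        $skip = qr/$alt/;
    }
    my @changed;
    for my $file (sort keys %$dev) {
        next if $skip && $file =~ $skip;
        next unless exists $prod->{$file};
        push @changed, $file if $dev->{$file} ne $prod->{$file};
    }
    return @changed;
}

# Made-up checksums standing in for one development/production pair.
my %dev  = ('a.cbl' => '1', 'b.cbl' => '2', 'c.lst' => '3', 'cob' => '0');
my %prod = ('a.cbl' => '1', 'b.cbl' => '9', 'c.lst' => '7', 'cob' => '5');

my @changed = changed_files(\%dev, \%prod, '^cob$', '\.ffl$', '\.lst$');
print "$_\n" for @changed;
```

Adding another directory pair or another ignored extension then only means adding data to the list, not another copy of the loop.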
Re: File Differences
by TStanley (Canon) on Jun 07, 2005 at 16:57 UTC