File Differences

TStanley has asked for the wisdom of the Perl Monks concerning the following question:

Ok monks, I have a little bit of a dilemma here. I was asked to write a short script to list the differences of files inside each directory.
To provide some background, we have a production environment and a development environment for our payroll application, and we want to see what source files are different between the two. There are four directories in the development environment, and each one has a corresponding directory in the production environment.
To make a long story short, my hacked-together script became very popular, and now they need me to add to it. My problem is maintaining the thing: it is now huge and in dire need of some help so that I can modify it easily. All I need are pointers in the right direction. Thanks.

#!/opt/perl5/bin/perl -w
use strict;
use File::Slurp;

# The production files will be listed in
# the source hashes
my (%iqs,%iqs_source,%prg,%prg_source);
my (%spg,%spg_source,%tpr,%tpr_source);
my $report="/home/mis/tstanley/SourceDiff.rpt";

## Read each directory and get the file names
my @iqs_src=read_dir('/dsmpayroll/iqs-source');
my @iqs=read_dir('/dsmmigrate/development/iqs');

my @prg_src=read_dir('/dsmpayroll/prg-source');
my @prg=read_dir('/dsmmigrate/development/prg');

my @spg_src=read_dir('/dsmpayroll/spg-source');
my @spg=read_dir('/dsmmigrate/development/spg');

my @tpr_src=read_dir('/dsmpayroll/tpr-source');
my @tpr=read_dir('/dsmmigrate/development/tpr');

## Change into each directory and go through the files
## and get the sum of each file and put into a hash

chdir('/dsmpayroll/iqs-source');
foreach(@iqs_src){
  my $sum=`sum $_`;
  my ($x,$y,$z)=split /\s+/,$sum;
  $iqs_source{$_}=$x;
}

chdir('/dsmpayroll/prg-source');
foreach(@prg_src){
  my $sum=`sum $_`;
  my ($x,$y,$z)=split /\s+/,$sum;
  $prg_source{$_}=$x;
}

chdir('/dsmpayroll/spg-source');
foreach(@spg_src){
  my $sum=`sum $_`;
  my ($x,$y,$z)=split /\s+/,$sum;
  $spg_source{$_}=$x;
}

chdir('/dsmpayroll/tpr-source');
foreach(@tpr_src){
  my $sum=`sum $_`;
  my ($x,$y,$z)=split /\s+/,$sum;
  $tpr_source{$_}=$x;
}

chdir('/dsmmigrate/development/iqs');
foreach(@iqs){
  my $sum=`sum $_`;
  my ($x,$y,$z)=split /\s+/,$sum;
  $iqs{$_}=$x;
}

chdir('/dsmmigrate/development/prg');
foreach(@prg){
  my $sum=`sum $_`;
  my ($x,$y,$z)=split /\s+/,$sum;
  $prg{$_}=$x;
}

chdir('/dsmmigrate/development/spg');
foreach(@spg){
  my $sum=`sum $_`;
  my ($x,$y,$z)=split /\s+/,$sum;
  $spg{$_}=$x;
}

chdir('/dsmmigrate/development/tpr');
foreach(@tpr){
  my $sum=`sum $_`;
  my ($x,$y,$z)=split /\s+/,$sum;
  $tpr{$_}=$x;
}

## Open the report file
open RPT,">$report" or die "Unable to open $report: $!\n";
my $date=`date`;
print RPT "$date\n\n";

## For each set of directories, we find the common file names in them, and
## then go through those files and compare the sums. If the Production file
## is different, we write it out to the report file.

print RPT "IQS Source Files\n";
print RPT "================\n";
my @iqs_common;
foreach(keys %iqs){
  next if $_=~/cob/; ## This is an empty directory that exists in each source directory
  push(@iqs_common,$_) if exists $iqs_source{$_};
}

foreach(@iqs_common){
  next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
  my $prod=$iqs_source{$_};
  my $dev=$iqs{$_};
  next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
  if ($prod != $dev){
    print RPT "/dsmpayroll/iqs-source/$_\n";
  }
}

print RPT "\n\nPRG Source Files\n";
print RPT "================\n";
my @prg_common;
foreach(keys %prg){
  next if $_=~/cob/; ## This is an empty directory that exists in each source directory
  push(@prg_common,$_) if exists $prg_source{$_};
}

foreach(@prg_common){
  next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
  my $prod=$prg_source{$_};
  my $dev=$prg{$_};
  next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
  if($prod != $dev){
    print RPT "/dsmpayroll/prg-source/$_\n";
  }
}

print RPT "\n\nSPG Source Files\n";
print RPT "================\n";
my @spg_common;
foreach(keys %spg){
  next if $_=~/cob/; ## This is an empty directory that exists in each source directory
  push(@spg_common,$_) if exists $spg_source{$_};
}

foreach(@spg_common){
  next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
  my $prod=$spg_source{$_};
  my $dev=$spg{$_};
  next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
  if($prod != $dev){
    print RPT "/dsmpayroll/spg-source/$_\n";
  }
}

print RPT "\n\nTPR Source Files\n";
print RPT "================\n";
my @tpr_common;
foreach(keys %tpr){
  next if $_=~/cob/; ## This is an empty directory that exists in each source directory
  push(@tpr_common,$_) if exists $tpr_source{$_};
}

foreach(@tpr_common){
  next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
  my $prod=$tpr_source{$_};
  my $dev=$tpr{$_};
  next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
  if($prod != $dev){
    print RPT "/dsmpayroll/tpr-source/$_\n";
  }
}

close RPT;

TStanley
--------
The only thing necessary for the triumph of evil is for good men to do nothing -- Edmund Burke


Replies are listed 'Best First'.
Re: File Differences by ikegami (Patriarch) on May 31, 2005 at 16:39 UTC
Instead of having a hash for each directory, you could have a hash of hashes:

#!/opt/perl5/bin/perl -w
use strict;
use File::Slurp;

# The production files will be listed in
# the source hashes
my $report = "/home/mis/tstanley/SourceDiff.rpt";

my %input = (
   iqs_source => '/dsmpayroll/iqs-source',
   prg_source => '/dsmpayroll/prg-source',
   spg_source => '/dsmpayroll/spg-source',
   tpr_source => '/dsmpayroll/tpr-source',
   iqs        => '/dsmmigrate/development/iqs',
   prg        => '/dsmmigrate/development/prg',
   spg        => '/dsmmigrate/development/spg',
   tpr        => '/dsmmigrate/development/tpr',
);

my %data;

foreach my $key (keys(%input)) {
   my $dir = $input{$key};
   chdir($dir)
      or die "Unable to chdir to $dir: $!\n";
   foreach my $file (read_dir('.')) {
      my $sum = `sum $file`;
      my ($x) = split(/\s+/, $sum);
      $data{$key}{$file} = $x;
   }
}

## Open the report file
...

Update: It seems there's no need to build a hash of hashes, since we only need to look at a pair of directories at a time. I eliminated the pesky `chdir` commands while I was at it. Here is a version with (just about) all redundancy eliminated, but without any change in functionality:

#!/opt/perl5/bin/perl -w
use strict;

# The production files will be listed in
# the source hashes
my $report = "/home/mis/tstanley/SourceDiff.rpt";

sub DEV  () { 0 }
sub PROD () { 1 }

my %input = (
   IQS => [ '/dsmmigrate/development/iqs', '/dsmpayroll/iqs-source' ],
   PRG => [ '/dsmmigrate/development/prg', '/dsmpayroll/prg-source' ],
   SPG => [ '/dsmmigrate/development/spg', '/dsmpayroll/spg-source' ],
   TPR => [ '/dsmmigrate/development/tpr', '/dsmpayroll/tpr-source' ],
);

## Open the report file
open RPT, ">$report"
   or die "Unable to open $report: $!\n";
my $date=`date`;
print RPT "$date\n\n";

## For each set of directories, we find the common file names in them,
## and then go through those files and compare the sums. If the
## Production file is different, we write it out to the report file.

my $print_seperator;  # False for first loop iteration.

foreach my $key (sort keys %input) {
   local *DH;
   my $dir;
   my $file;
   my %dev_crcs;
   my %prod_crcs;

   $dir = $input{$key}[DEV];
   opendir(DH, $dir)
      or die "Unable to open dir $dir: $!\n";
   while (defined($file = readdir(DH))) {
      next if $file eq '.';
      next if $file eq '..';
      next if $file eq 'cob';
      next if substr($file, -4) eq '.ffl';
      next if substr($file, -4) eq '.lst';
      # 0+ keeps only the leading checksum number from the output of sum
      $dev_crcs{$file} = 0+`sum $dir/$file`;
   }
   closedir(DH);

   $dir = $input{$key}[PROD];
   opendir(DH, $dir)
      or die "Unable to open dir $dir: $!\n";
   while (defined($file = readdir(DH))) {
      next if $file eq '.';
      next if $file eq '..';
      next if substr($file, -4) eq '.ffl';
      next if substr($file, -4) eq '.lst';
      $prod_crcs{$file} = 0+`sum $dir/$file`;
   }
   closedir(DH);

   if ($print_seperator) {
      print RPT "\n\n";
   } else {
      # True for all but first loop iteration.
      $print_seperator = 1;
   }

   print RPT "$key Source Files\n";
   # Underline the heading with one "=" per character
   print RPT "=" x length("$key Source Files"), "\n";

   # $dir still holds the production directory at this point
   foreach $file (keys(%dev_crcs)) {
      if (exists($prod_crcs{$file}) &&
          $prod_crcs{$file} != $dev_crcs{$file}) {
         print RPT "$dir/$file\n";
      }
   }
}

close RPT;

Keep in mind that files in one directory but not in the other are not printed, just like in your solution.
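If the files that exist in only one of the two directories ever need to be reported as well, something along these lines might work. This is only a minimal sketch, assuming the two checksum hashes have already been filled as in the loop above; the helper name `only_in` and the sample data are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted names that are keys of %$left
# but not keys of %$right.
sub only_in {
    my ($left, $right) = @_;
    my @names = grep { !exists $right->{$_} } keys %$left;
    return sort @names;
}

# Made-up sample data standing in for %dev_crcs and %prod_crcs.
my %dev_crcs  = ( 'pay.cbl' => 111, 'tax.cbl' => 222, 'new.cbl' => 333 );
my %prod_crcs = ( 'pay.cbl' => 111, 'tax.cbl' => 999, 'old.cbl' => 444 );

my @dev_only  = only_in(\%dev_crcs,  \%prod_crcs);
my @prod_only = only_in(\%prod_crcs, \%dev_crcs);

print "Only in development: @dev_only\n";
print "Only in production:  @prod_only\n";
```

The same helper is called twice with the arguments swapped, so neither direction needs its own loop.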
Re: File Differences by tlm (Prior) on May 31, 2005 at 16:43 UTC
This doesn't directly answer your question, but judging from your shebang line, I gather that this is a Unixoid system. The GNU utility `diff` (and maybe other `diff`s as well) can be used to compare directories. E.g.:

% diff --recursive dir1 dir2

the lowliest monk
Re^2: File Differences by TStanley (Canon) on May 31, 2005 at 17:07 UTC
The person who asked for my help had tried this. Some of the files are over 1000 lines long, and trying to wade through the diff files was nearly impossible.
Re^3: File Differences by lupey (Monk) on May 31, 2005 at 18:43 UTC
I wouldn't give up on `diff` though. You can use the `-q` switch, which only reports whether the files differ. By using some of the other `diff` switches (like `-B`, which ignores changes whose lines are all blank), you might be able to reduce the number of files that are reported as different.

lupey
Re: File Differences by davido (Cardinal) on May 31, 2005 at 16:35 UTC
Well, for one thing, you could factor all those `chdir...` segments down to their common denominator and call them as a subroutine within a foreach loop instead of repeatedly typing the same code again and again. Other than that, it would help if we knew in what way you wished to change the code. Your question doesn't ask for anything specific.

Dave
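To make the subroutine idea concrete, here is a rough sketch of what the factored-out code could look like. It is an illustration under assumptions, not the original author's code: the name `checksums_for` is hypothetical, the directories are throwaway temp directories rather than the real payroll paths, and instead of calling the external `sum` command it computes a simple additive checksum with Perl's built-in `unpack` (a weaker check that, for instance, does not notice reordered bytes).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: map each plain file in $dir to a simple checksum.
sub checksums_for {
    my ($dir) = @_;
    my %sums;
    opendir(my $dh, $dir) or die "Unable to open dir $dir: $!\n";
    while (defined(my $file = readdir($dh))) {
        my $path = "$dir/$file";
        next unless -f $path;    # skips '.', '..' and subdirectories
        open(my $fh, '<', $path) or die "Unable to open $path: $!\n";
        binmode($fh);
        my $data = do { local $/; <$fh> };   # slurp the whole file
        close($fh);
        $data = '' unless defined $data;
        # Swapped-in technique: additive checksum instead of `sum`.
        $sums{$file} = unpack('%32C*', $data) % 65535;
    }
    closedir($dh);
    return \%sums;
}

# Demo with two temporary directories standing in for dev and prod.
my $dev  = tempdir(CLEANUP => 1);
my $prod = tempdir(CLEANUP => 1);
for my $spec ([$dev, 'a.src', 'abc'], [$prod, 'a.src', 'abd']) {
    my ($dir, $name, $text) = @$spec;
    open(my $out, '>', "$dir/$name") or die "Unable to write $dir/$name: $!\n";
    print {$out} $text;
    close($out);
}

# One loop replaces the eight copied chdir/foreach segments.
my %sums_for;
foreach my $dir ($dev, $prod) {
    $sums_for{$dir} = checksums_for($dir);
}

print "dev a.src:  $sums_for{$dev}{'a.src'}\n";
print "prod a.src: $sums_for{$prod}{'a.src'}\n";
```

Because the helper builds full paths itself, no `chdir` is needed, and adding a ninth directory becomes a one-line change to the list in the loop.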
Re: File Differences by TedPride (Priest) on May 31, 2005 at 17:26 UTC
Looks like you need something like the following:

use strict;
use warnings;

my @compare = (
    ['/dir1/contents/','/dir2/contents','^cob$'],
    ['/dir1/contents/','/dir2/contents','\.ffl$','\.lst$']
);

my ($dir1, $dir2, @r, $regex);
for (@compare) {
    ($dir1, $dir2, @r) = @$_;
    # Load dir info, create hash, compare hash
    # If hash doesn't match...
    $regex = join '|', @r;
    print "$regex\n";
    # Sort and run through files, ignoring ones that regex matches if regex exists.
    # Print directory and file info for files that don't match between directories
}

The duplication in your code is in having to run the same pieces of code on many different directories and file names / endings. Create a structure instead that lets you easily store this info and loop through it, and your code can be reduced to a fraction of its previous size.
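One way the commented steps in that outline might be filled in is sketched below. It assumes the checksums for the two directories have already been loaded into hashes; the helper name `differing_files` and the sample data are hypothetical, chosen only to show the ignore patterns being joined into a single regex and applied.

```perl
use strict;
use warnings;

# Hypothetical helper: names present in both hashes whose checksums differ,
# skipping any name matched by one of the ignore patterns.
sub differing_files {
    my ($dev, $prod, @ignore) = @_;
    my $regex;
    if (@ignore) {
        my $alternation = join '|', @ignore;   # e.g. ^cob$|\.ffl$|\.lst$
        $regex = qr/$alternation/;
    }
    my @diff;
    for my $file (sort keys %$dev) {
        next if defined $regex && $file =~ $regex;
        next unless exists $prod->{$file};     # only compare common files
        push @diff, $file if $dev->{$file} != $prod->{$file};
    }
    return @diff;
}

# Made-up checksums standing in for one dev/prod directory pair.
my %dev  = ( cob => 0, 'a.cbl' => 10, 'b.cbl' => 20, 'c.lst' => 30 );
my %prod = ( cob => 0, 'a.cbl' => 10, 'b.cbl' => 21, 'c.lst' => 31 );

my @changed = differing_files(\%dev, \%prod, '^cob$', '\.ffl$', '\.lst$');
print "$_\n" for @changed;
```

Here `c.lst` differs between the two hashes but is skipped by the ignore list, so only `b.cbl` is reported.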
Re: File Differences by TStanley (Canon) on Jun 07, 2005 at 16:57 UTC
I finally was able to sit down last night at home and do a re-write of the script. I present to you a much cleaner and shorter version:

Read more... (6 kB)

