PerlMonks  

File Differences

by TStanley (Canon)
on May 31, 2005 at 16:30 UTC ( [id://462110]=perlquestion )

TStanley has asked for the wisdom of the Perl Monks concerning the following question:

Ok monks, I have a little bit of a dilemma here. I was asked to write a short script to list which files differ between corresponding directories.
To provide some background, we have a production environment and a development environment for our payroll application, and we want to see what source files are different between the two. There are four directories in the development environment, and each one has a corresponding directory in the production environment.
To make a long story short, my hacked-together script became very popular, and now they need me to add to it. My problem is maintaining the thing: it is now huge and in dire need of cleanup so that I can easily modify it. All I need is pointers in the right direction. Thanks.
    #!/opt/perl5/bin/perl -w
    use strict;
    use File::Slurp;

    # The production files will be listed in
    # the source hashes
    my (%iqs,%iqs_source,%prg,%prg_source);
    my (%spg,%spg_source,%tpr,%tpr_source);
    my $report="/home/mis/tstanley/SourceDiff.rpt";

    ## Read each directory and get the file names
    my @iqs_src=read_dir('/dsmpayroll/iqs-source');
    my @iqs=read_dir('/dsmmigrate/development/iqs');
    my @prg_src=read_dir('/dsmpayroll/prg-source');
    my @prg=read_dir('/dsmmigrate/development/prg');
    my @spg_src=read_dir('/dsmpayroll/spg-source');
    my @spg=read_dir('/dsmmigrate/development/spg');
    my @tpr_src=read_dir('/dsmpayroll/tpr-source');
    my @tpr=read_dir('/dsmmigrate/development/tpr');

    ## Change into each directory and go through the files
    ## and get the sum of each file and put into a hash
    chdir('/dsmpayroll/iqs-source');
    foreach(@iqs_src){
        my $sum=`sum $_`;
        my ($x,$y,$z)=split /\s+/,$sum;
        $iqs_source{$_}=$x;
    }
    chdir('/dsmpayroll/prg-source');
    foreach(@prg_src){
        my $sum=`sum $_`;
        my ($x,$y,$z)=split /\s+/,$sum;
        $prg_source{$_}=$x;
    }
    chdir('/dsmpayroll/spg-source');
    foreach(@spg_src){
        my $sum=`sum $_`;
        my ($x,$y,$z)=split /\s+/,$sum;
        $spg_source{$_}=$x;
    }
    chdir('/dsmpayroll/tpr-source');
    foreach(@tpr_src){
        my $sum=`sum $_`;
        my ($x,$y,$z)=split /\s+/,$sum;
        $tpr_source{$_}=$x;
    }
    chdir('/dsmmigrate/development/iqs');
    foreach(@iqs){
        my $sum=`sum $_`;
        my ($x,$y,$z)=split /\s+/,$sum;
        $iqs{$_}=$x;
    }
    chdir('/dsmmigrate/development/prg');
    foreach(@prg){
        my $sum=`sum $_`;
        my ($x,$y,$z)=split /\s+/,$sum;
        $prg{$_}=$x;
    }
    chdir('/dsmmigrate/development/spg');
    foreach(@spg){
        my $sum=`sum $_`;
        my ($x,$y,$z)=split /\s+/,$sum;
        $spg{$_}=$x;
    }
    chdir('/dsmmigrate/development/tpr');
    foreach(@tpr){
        my $sum=`sum $_`;
        my ($x,$y,$z)=split /\s+/,$sum;
        $tpr{$_}=$x;
    }

    ## Open the report file
    open RPT,">$report" or die "Unable to open $report: $!\n";
    my $date=`date`;
    print RPT "$date\n\n";

    ## For each set of directories, we find the common file names in them, and
    ## then go through those files and compare the sums. If the Production file
    ## is different, we write it out to the report file.
    print RPT "IQS Source Files\n";
    print RPT "================\n";
    my @iqs_common;
    foreach(keys %iqs){
        next if $_=~/cob/; ## This is an empty directory that exists in each source directory
        push(@iqs_common,$_) if exists $iqs_source{$_};
    }
    foreach(@iqs_common){
        next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
        my $prod=$iqs_source{$_};
        my $dev=$iqs{$_};
        next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
        if ($prod != $dev){
            print RPT "/dsmpayroll/iqs-source/$_\n";
        }
    }

    print RPT "\n\nPRG Source Files\n";
    print RPT "================\n";
    my @prg_common;
    foreach(keys %prg){
        next if $_=~/cob/; ## This is an empty directory that exists in each source directory
        push(@prg_common,$_) if exists $prg_source{$_};
    }
    foreach(@prg_common){
        next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
        my $prod=$prg_source{$_};
        my $dev=$prg{$_};
        next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
        if($prod != $dev){
            print RPT "/dsmpayroll/prg-source/$_\n";
        }
    }

    print RPT "\n\nSPG Source Files\n";
    print RPT "================\n";
    my @spg_common;
    foreach(keys %spg){
        next if $_=~/cob/; ## This is an empty directory that exists in each source directory
        push(@spg_common,$_) if exists $spg_source{$_};
    }
    foreach(@spg_common){
        next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
        my $prod=$spg_source{$_};
        my $dev=$spg{$_};
        next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
        if($prod != $dev){
            print RPT "/dsmpayroll/spg-source/$_\n";
        }
    }

    print RPT "\n\nTPR Source Files\n";
    print RPT "================\n";
    my @tpr_common;
    foreach(keys %tpr){
        next if $_=~/cob/; ## This is an empty directory that exists in each source directory
        push(@tpr_common,$_) if exists $tpr_source{$_};
    }
    foreach(@tpr_common){
        next if $_=~/\.ffl|\.lst/; ## If a file has these extensions, they can be ignored
        my $prod=$tpr_source{$_};
        my $dev=$tpr{$_};
        next if !defined $prod and !defined $dev; ## Error handling if all files match or can be ignored
        if($prod != $dev){
            print RPT "/dsmpayroll/tpr-source/$_\n";
        }
    }
    close RPT;

TStanley
--------
The only thing necessary for the triumph of evil is for good men to do nothing -- Edmund Burke

Re: File Differences
by ikegami (Patriarch) on May 31, 2005 at 16:39 UTC

    Instead of having a hash for each directory, you could have a hash of hashes:

    #!/opt/perl5/bin/perl -w
    use strict;
    use File::Slurp;

    # The production files will be listed in
    # the source hashes
    my $report = "/home/mis/tstanley/SourceDiff.rpt";

    my %input = (
        iqs_source => '/dsmpayroll/iqs-source',
        prg_source => '/dsmpayroll/prg-source',
        spg_source => '/dsmpayroll/spg-source',
        tpr_source => '/dsmpayroll/tpr-source',
        iqs        => '/dsmmigrate/development/iqs',
        prg        => '/dsmmigrate/development/prg',
        spg        => '/dsmmigrate/development/spg',
        tpr        => '/dsmmigrate/development/tpr',
    );

    my %data;
    foreach my $key (keys(%input)) {
        my $dir = $input{$key};
        chdir($dir) or die "Unable to chdir to $dir: $!\n";
        foreach my $file (read_dir('.')) {
            my $sum = `sum $file`;
            my ($x) = split(/\s+/, $sum);
            $data{$key}{$file} = $x;
        }
    }

    ## Open the report file
    ...

    Update: It seems there's no need to build a hash of hashes, since we only need to look at one pair of directories at a time. I eliminated the pesky chdir commands while I was at it. Here is a version with (just about) all redundancy eliminated, but without any change in functionality:

    #!/opt/perl5/bin/perl -w
    use strict;

    # The production files will be listed in
    # the source hashes
    my $report = "/home/mis/tstanley/SourceDiff.rpt";

    sub DEV  () { 0 }
    sub PROD () { 1 }

    my %input = (
        IQS => [ '/dsmmigrate/development/iqs', '/dsmpayroll/iqs-source' ],
        PRG => [ '/dsmmigrate/development/prg', '/dsmpayroll/prg-source' ],
        SPG => [ '/dsmmigrate/development/spg', '/dsmpayroll/spg-source' ],
        TPR => [ '/dsmmigrate/development/tpr', '/dsmpayroll/tpr-source' ],
    );

    ## Open the report file
    open RPT, ">$report" or die "Unable to open $report: $!\n";
    my $date=`date`;
    print RPT "$date\n\n";

    ## For each set of directories, we find the common file names in them,
    ## and then go through those files and compare the sums. If the
    ## Production file is different, we write it out to the report file.
    my $print_separator; # False for first loop iteration.
    foreach my $key (sort keys %input) {
        local *DH;
        my $dir;
        my $file;
        my %dev_crcs;
        my %prod_crcs;

        $dir = $input{$key}[DEV];
        opendir(DH, $dir) or die "Unable to open dir $dir: $!\n";
        while (defined($file = readdir(DH))) {
            next if $file eq '.';
            next if $file eq '..';
            next if $file eq 'cob';
            next if substr($file, -4) eq '.ffl';
            next if substr($file, -4) eq '.lst';
            # sum prints "checksum blocks name"; keep only the checksum
            $dev_crcs{$file} = (split ' ', `sum $dir/$file`)[0];
        }
        closedir(DH);

        $dir = $input{$key}[PROD];
        opendir(DH, $dir) or die "Unable to open dir $dir: $!\n";
        while (defined($file = readdir(DH))) {
            next if $file eq '.';
            next if $file eq '..';
            next if substr($file, -4) eq '.ffl';
            next if substr($file, -4) eq '.lst';
            $prod_crcs{$file} = (split ' ', `sum $dir/$file`)[0];
        }
        closedir(DH);

        if ($print_separator) {
            print RPT "\n\n";
        } else {
            # True for all but first loop iteration.
            $print_separator = 1;
        }

        print RPT "$key Source Files\n";
        print RPT "=" x length($key), "=============\n";
        foreach $file (keys(%dev_crcs)) {
            if (exists($prod_crcs{$file})
            &&  $prod_crcs{$file} != $dev_crcs{$file}) {
                print RPT "$dir/$file\n";
            }
        }
    }

    Keep in mind that files in one directory but not in the other are not printed, just like in your solution.
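If you also want the report to list files that exist on only one side, one possible extension (not part of the code above) is to compare the key sets of the two checksum hashes. This is a sketch assuming hashes shaped like `%dev_crcs` and `%prod_crcs` (file name => checksum); the helper name `classify_files` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given two hashrefs of file name => checksum,
# split the names into changed, dev-only and prod-only lists.
sub classify_files {
    my ($dev, $prod) = @_;
    my (@changed, @dev_only, @prod_only);

    foreach my $file (sort keys %$dev) {
        if (!exists $prod->{$file}) {
            push @dev_only, $file;            # only in development
        }
        elsif ($dev->{$file} ne $prod->{$file}) {
            push @changed, $file;             # in both, contents differ
        }
    }
    foreach my $file (sort keys %$prod) {
        push @prod_only, $file unless exists $dev->{$file};   # only in production
    }
    return (\@changed, \@dev_only, \@prod_only);
}

# Small in-memory demonstration with made-up checksums
my %dev_crcs  = (a => 1, b => 2, c => 3);
my %prod_crcs = (a => 1, b => 9, d => 4);
my ($changed, $dev_only, $prod_only) = classify_files(\%dev_crcs, \%prod_crcs);
print "changed:   @$changed\n";     # b
print "dev only:  @$dev_only\n";    # c
print "prod only: @$prod_only\n";   # d
```

Comparing with `ne` instead of `!=` works both for the numeric values that `sum` prints and for hex digest strings, so the helper does not care which checksum method fills the hashes.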

Re: File Differences
by tlm (Prior) on May 31, 2005 at 16:43 UTC

    This doesn't directly answer your question, but judging from your shebang line, I gather that this is a Unixoid system. The GNU utility diff (and maybe other diffs as well) can be used to compare directories. E.g.:

    % diff --recursive dir1 dir2

    the lowliest monk

      The person who asked for my help had tried this. Some of the files are over 1000 lines long, and trying to wade through the diff output was nearly impossible.

      TStanley
        I wouldn't give up on diff, though. You can use the -q switch, which only reports whether the files differ. By using some of the other diff switches (like -B, which ignores changes whose lines are all blank), you might be able to reduce the number of files reported as different.

        lupey
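Building on the -q suggestion, a Perl script could run `diff -rq` and keep only the list of differing files. This is a sketch assuming GNU diff's brief-mode messages (`Files X and Y differ` and `Only in DIR: NAME`); `parse_diff_q` is a hypothetical helper name, and file names that themselves contain " and " would confuse the simple regex.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the lines printed by GNU "diff -rq"
# into a list of differing file pairs and a list of one-sided files.
sub parse_diff_q {
    my @lines = @_;
    my (@differ, @only);
    foreach my $line (@lines) {
        chomp $line;
        if ($line =~ /^Files (.+) and (.+) differ$/) {
            push @differ, [$1, $2];    # in both trees, contents differ
        }
        elsif ($line =~ /^Only in (.+): (.+)$/) {
            push @only, "$1/$2";       # in just one tree
        }
    }
    return (\@differ, \@only);
}

# Usage sketch ($dev_dir and $prod_dir are placeholders):
#   open my $diff, '-|', 'diff', '-rq', $dev_dir, $prod_dir
#       or die "Cannot run diff: $!\n";
#   my ($differ, $only) = parse_diff_q(<$diff>);
#   close $diff;   # diff exits 1 when it finds differences, so close returns false
```

Parsing the brief output keeps the report to one line per file, which avoids wading through full diffs of 1000-line sources.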

Re: File Differences
by davido (Cardinal) on May 31, 2005 at 16:35 UTC

    Well, for one thing, you could factor all those chdir... segments down to their common denominator and call them as a subroutine within a foreach loop instead of typing the same code again and again.

    Other than that, it would help if we knew in what way you wished to change the code. Your question doesn't ask for anything specific.


    Dave
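One way to do the factoring described above is a single subroutine that checksums one directory and is then called for every directory in a loop. This sketch makes an assumption of its own: it swaps the external `sum` command for the Digest::MD5 module (Digest::MD5 and File::Spec both ship with Perl). The name `dir_checksums` is hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Spec;

# Hypothetical helper: return { file name => MD5 hex digest } for the
# plain files in $dir, skipping names that match $skip (a qr// or undef).
sub dir_checksums {
    my ($dir, $skip) = @_;
    my %sums;
    opendir(my $dh, $dir) or die "Unable to open dir $dir: $!\n";
    foreach my $file (readdir $dh) {
        my $path = File::Spec->catfile($dir, $file);
        next unless -f $path;    # skips ., .. and subdirectories such as cob
        next if defined $skip && $file =~ $skip;
        open(my $fh, '<', $path) or die "Unable to read $path: $!\n";
        binmode $fh;
        $sums{$file} = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
    }
    closedir $dh;
    return \%sums;
}

# Usage sketch with two of the directory pairs from the original post:
# my @pairs = (
#     [ '/dsmmigrate/development/iqs', '/dsmpayroll/iqs-source' ],
#     [ '/dsmmigrate/development/prg', '/dsmpayroll/prg-source' ],
# );
# foreach my $pair (@pairs) {
#     my ($dev_dir, $prod_dir) = @$pair;
#     my $dev  = dir_checksums($dev_dir,  qr/\.(?:ffl|lst)$/);
#     my $prod = dir_checksums($prod_dir, qr/\.(?:ffl|lst)$/);
#     foreach my $file (sort keys %$dev) {
#         print "$prod_dir/$file\n"
#             if exists $prod->{$file} && $prod->{$file} ne $dev->{$file};
#     }
# }
```

Reading the files in-process avoids one shell call per file, and an MD5 digest is far less likely to collide than the 16-bit value that `sum` reports by default.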

Re: File Differences
by TedPride (Priest) on May 31, 2005 at 17:26 UTC
    Looks like you need something like the following:
    use strict;
    use warnings;

    my @compare = (
        ['/dir1/contents/','/dir2/contents','^cob$'],
        ['/dir1/contents/','/dir2/contents','\.ffl$','\.lst$']
    );
    my ($dir1, $dir2, @r, $regex);
    for (@compare) {
        ($dir1, $dir2, @r) = @$_;
        # Load dir info, create hash, compare hash
        # If hash doesn't match...
        $regex = join '|', @r;
        print "$regex\n";
        # Sort and run through files, ignoring ones that regex matches if regex exists.
        # Print directory and file info for files that don't match between directories
    }
    The duplication in your code is in having to run the same pieces of code on many different directories and file names / endings. Create a structure instead that lets you easily store this info and loop through it, and your code can be reduced to a fraction of its previous size.
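The structure suggested above can be filled out along these lines. This is a sketch, not TedPride's code: it assumes one table row per directory pair (a label, the development directory, the production directory, then any ignore patterns), reuses the paths from the original post, and introduces two hypothetical helpers, `ignore_regex` and `wanted_files`. The file names in the demo loop are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: combine a list of pattern strings into one
# compiled regex. Each pattern is wrapped in (?:...) so an alternation
# inside one pattern cannot swallow its neighbours. Returns undef when
# there is nothing to ignore.
sub ignore_regex {
    my @patterns = @_;
    return undef unless @patterns;
    my $alternation = join '|', map { "(?:$_)" } @patterns;
    return qr/$alternation/;
}

# Hypothetical helper: keep only the names the ignore regex does not match.
sub wanted_files {
    my ($regex, @names) = @_;
    return @names unless defined $regex;
    return grep { $_ !~ $regex } @names;
}

# Directory pairs from the original post; the ignore patterns follow
# the example in the reply above.
my @skip = ('^cob$', '\.ffl$', '\.lst$');
my @compare = (
    [ 'IQS', '/dsmmigrate/development/iqs', '/dsmpayroll/iqs-source', @skip ],
    [ 'PRG', '/dsmmigrate/development/prg', '/dsmpayroll/prg-source', @skip ],
    [ 'SPG', '/dsmmigrate/development/spg', '/dsmpayroll/spg-source', @skip ],
    [ 'TPR', '/dsmmigrate/development/tpr', '/dsmpayroll/tpr-source', @skip ],
);

foreach my $entry (@compare) {
    my ($label, $dev_dir, $prod_dir, @ignore) = @$entry;
    my $regex = ignore_regex(@ignore);
    # A real script would read these names from $dev_dir; they are made up here.
    my @names = wanted_files($regex, 'pay01.cbl', 'pay01.lst', 'cob');
    print "$label: compare $dev_dir against $prod_dir for @names\n";
}
```

Adding a fifth directory pair, or a different ignore rule for one pair, then means editing one row of the table rather than copying another block of code.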
Re: File Differences
by TStanley (Canon) on Jun 07, 2005 at 16:57 UTC
    I was finally able to sit down at home last night and rewrite the script. I present to you a much cleaner and shorter version:

    TStanley

Approved by Paladin