Amphiaraus has asked for the wisdom of the Perl Monks concerning the following question:
Is there a Perl version of UNIX "cmp"?
The UNIX "cmp" command compares two files byte by byte, which makes it suitable for very large, non-human-readable (binary) files, and reports whether they are the same or different. Unlike UNIX "diff", UNIX "cmp" is "large-file-aware" and does not bungle the job when comparing two large files, as UNIX "diff" is prone to do.
Sample "cmp" operations:
TWO FILES ARE SAME:
> cmp engine_security.a@@/main/par_x_rush_r55.1/2 engine_security.a@@/main/par_x_rush_r55.1/2
> echo $?
0
TWO FILES ARE DIFFERENT:
> cmp engine_security.a@@/main/par_x_rush_r55.1/1 engine_security.a@@/main/par_x_rush_r55.1/2
engine_security.a@@/main/par_x_rush_r55.1/1 engine_security.a@@/main/par_x_rush_r55.1/2 differ: char 29, line 2
> echo $?
1
I work for Motorola, and our Perl coding practices say to avoid qx and system calls to non-Perl functions, in all cases in which a Perl equivalent to a UNIX etc. function can be found.
Re: Is there a Perl version of UNIX "cmp" ?
by zentara (Archbishop) on Feb 06, 2009 at 19:29 UTC
#!/usr/bin/perl
# Generally, use File::Compare (a core module, available in Perl 5.8).
use strict;
use warnings;
use constant BUF_SIZE => 4096;

print cmp_file(@ARGV) ? "equal\n" : "not equal\n";

#################
sub cmp_file {
    my ( $f1, $f2 ) = @_;
    open my $h1, '<', $f1 or die "gah $f1: $!";
    open my $h2, '<', $f2 or die "gah $f2: $!";
    binmode $h1;
    binmode $h2;
    my ( $buf1, $buf2 );
    my $equal = 1;    # start at 1 so that two empty files compare equal
    while ( read $h1, $buf1, BUF_SIZE ) {
        read $h2, $buf2, BUF_SIZE;
        last unless $equal = ( $buf1 eq $buf2 );
    }
    # Use '&&' here: with 'and', 'return $equal' would run before eof was tested.
    return $equal && eof($h2);
}
Re: Is there a Perl version of UNIX "cmp" ?
by zentara (Archbishop) on Feb 06, 2009 at 19:10 UTC
The unix cmp utility will probably be faster on large files, so why not run it thru system or backticks? It's commonly done with sort on large files.
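For anyone who does go the system() route, here is a minimal sketch of what that might look like. The sub name files_same_via_cmp is made up for illustration, and it assumes a POSIX-style cmp on the PATH that accepts -s (silent) and exits with 0 for identical files, 1 for different files, and something greater than 1 on trouble.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative): returns 1 if the files are
# identical, 0 if they differ, and dies if cmp could not do its job.
sub files_same_via_cmp {
    my ( $f1, $f2 ) = @_;

    # The list form of system() bypasses the shell, so odd characters in
    # file names are passed through untouched. '--' ends option parsing.
    system( 'cmp', '-s', '--', $f1, $f2 );

    die "could not run cmp: $!\n"   if $? == -1;
    die "cmp was killed by a signal\n" if $? & 127;

    my $status = $? >> 8;    # the exit status lives in the high byte of $?
    return 1 if $status == 0;
    return 0 if $status == 1;
    die "cmp reported trouble (exit status $status)\n";
}
```

Something like `print files_same_via_cmp(@ARGV) ? "equal\n" : "not equal\n";` would then behave like the pure-Perl version, at the cost of depending on an external program.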
Re: Is there a Perl version of UNIX "cmp" ?
by Anonymous Monk on Feb 06, 2009 at 19:36 UTC
|
| [reply] |
|
Does anyone know if File::Compare's compare() function is "large-file-aware"? i.e. does it reliably return a correct boolean value when comparing large non-human-readable files?
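For reference, a small sketch of calling File::Compare so that it yields cmp-style status codes. The wrapper name cmp_status is hypothetical; compare() itself is documented to return 0 for equal files, 1 for different files, and -1 on error. As far as I know it checks the sizes first and then reads both files in blocks rather than slurping them, so memory use should not grow with file size, but whether files over 2 GB work at all depends on how your perl was built (`perl -V:uselargefiles` will tell you), so treat that as something to verify rather than a promise.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical wrapper (name is illustrative) that maps compare()'s result
# onto cmp's exit codes: 0 = same, 1 = different, 2 = trouble.
sub cmp_status {
    my ( $f1, $f2 ) = @_;
    my $result = compare( $f1, $f2 );    # 0 equal, 1 different, -1 error
    return 2 if $result < 0;
    return $result;
}
```

A script could finish with `exit cmp_status(@ARGV);` to act as a rough drop-in for `cmp -s`.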
Re: Is there a Perl version of UNIX "cmp" ?
by samtregar (Abbot) on Feb 06, 2009 at 19:22 UTC
I agree that there's no compelling reason not to just call it from system() but if you did need it in Perl code it'd be a fun challenge. I think I'd do it by reading in chunks from each file and doing an MD5 on each chunk with Digest::MD5. Compare the MD5s and if they're different then you've got a difference. Oh, and start by comparing file sizes!
-sam
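The "start by comparing file sizes" step can be sketched on its own. The sub name same_size is made up for illustration, and this assumes both paths are regular files that stat() can see. If the sizes differ, the files cannot be identical and no reading is needed.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative): true if both files have the
# same size in bytes. Dies if either file cannot be stat()ed.
sub same_size {
    my ( $f1, $f2 ) = @_;
    my @st1 = stat $f1 or die "cannot stat $f1: $!\n";
    my @st2 = stat $f2 or die "cannot stat $f2: $!\n";
    return $st1[7] == $st2[7];    # element 7 of stat's list is the size
}
```

Equal sizes prove nothing by themselves, so a true result still has to be followed by a content comparison.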
| [reply] |
|
Uh, you start with block A and block B in memory. Not sure how you think computing the MD5 on each block (hitting every byte, doing math) is going to be any faster than just comparing the blocks themselves (comparing byte by byte, but stopping on first difference). Bizarre.
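To make the "just compare the blocks, stop at the first difference" approach concrete, here is a rough sketch that also reports where the files diverge, in the spirit of cmp's "char N" output. The name first_difference and the 64 KB buffer are my own choices, not anything from cmp or a module. It assumes regular files, where read() fills the whole buffer until end of file, so the blocks from the two files stay aligned.

```perl
use strict;
use warnings;
use constant BUF_SIZE => 64 * 1024;    # arbitrary block size for this sketch

# Hypothetical helper (name is illustrative): returns the 1-based offset of
# the first differing byte, or 0 if the files are identical. If one file is
# a prefix of the other, the offset just past the shorter file is returned.
sub first_difference {
    my ( $f1, $f2 ) = @_;
    open my $h1, '<:raw', $f1 or die "cannot open $f1: $!\n";
    open my $h2, '<:raw', $f2 or die "cannot open $f2: $!\n";

    my $offset = 0;    # bytes already known to be equal
    while (1) {
        my $n1 = read $h1, my $buf1, BUF_SIZE;
        my $n2 = read $h2, my $buf2, BUF_SIZE;
        die "read error: $!\n" unless defined $n1 && defined $n2;
        return 0 if $n1 == 0 && $n2 == 0;    # both at EOF, no difference seen

        if ( $buf1 ne $buf2 ) {
            my $common = $n1 < $n2 ? $n1 : $n2;

            # String XOR yields a NUL byte wherever the two blocks agree,
            # so the first non-NUL byte marks the first difference.
            my $xor = substr( $buf1, 0, $common ) ^ substr( $buf2, 0, $common );
            return $offset + $-[0] + 1 if $xor =~ /[^\0]/;
            return $offset + $common + 1;    # one block is a prefix of the other
        }
        $offset += $n1;
    }
}
```

The `eq`/`ne` test does the cheap whole-block check, and the XOR only runs once, on the single block that is known to differ.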
| [reply] |
|
MD5 is magic! Ok, good point.
-sam
Re: Is there a Perl version of UNIX "cmp" ?
by jdporter (Paladin) on Feb 07, 2009 at 12:37 UTC
The tag line of tye's Algorithm::Diff modules says that it computes "'intelligent' differences between two files / lists"...
but the doc doesn't explain how to use it on files, only on arrays. I suppose you could tie the two files to arrays using Tie::File,
but I'm not sure how efficient that would be...
Between the mind which plans and the hands which build, there must be a mediator... and this mediator must be the heart.