Is there a Perl version of UNIX "cmp" ?

Amphiaraus has asked for the wisdom of the Perl Monks concerning the following question:

Re: Is there a Perl version of UNIX "cmp" ? by zentara (Archbishop) on Feb 06, 2009 at 19:29 UTC
File::Compare

`#!/usr/bin/perl
# Generally, use File::Compare (available in Perl 5.8) instead.
use strict;
use warnings;

print cmp_file(@ARGV) ? "equal\n" : "not equal\n";

#################
use constant BUF_SIZE => 4096;

sub cmp_file {
    my ( $f1, $f2 ) = @_;
    open my $h1, '<', $f1 or die "gah $f1: $!";
    open my $h2, '<', $f2 or die "gah $f2: $!";
    binmode $h1;
    binmode $h2;
    my ( $buf1, $buf2 );
    my $equal = 1;    # start true so that two empty files compare equal
    while ( read $h1, $buf1, BUF_SIZE ) {
        read $h2, $buf2, BUF_SIZE;
        last unless $equal = ( $buf1 eq $buf2 );
    }
    # '&&' binds tighter than 'and', so the eof test is part of the returned value
    return $equal && eof($h2);
}`

I'm not really a human, but I play one on earth. Remember How Lucky You Are
Re: Is there a Perl version of UNIX "cmp" ? by zentara (Archbishop) on Feb 06, 2009 at 19:10 UTC
The Unix cmp utility will probably be faster on large files, so why not run it through system or backticks? That is commonly done with sort on large files.
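A minimal sketch of the system() route suggested above, assuming a POSIX cmp is on the PATH. The helper name `same_via_cmp` is made up for illustration. It relies on cmp's exit status (0 for identical, 1 for different, anything higher for trouble) and on the `-s` flag to keep cmp quiet.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the files are identical, 0 if they
# differ, and dies if cmp could not be run or reported an error.
sub same_via_cmp {
    my ( $f1, $f2 ) = @_;

    # The list form of system() bypasses the shell, so odd file names are safe.
    # '--' stops option parsing in case a name starts with a dash.
    system( 'cmp', '-s', '--', $f1, $f2 );

    die "could not run cmp: $!" if $? == -1;
    die sprintf( "cmp killed by signal %d", $? & 127 ) if $? & 127;

    my $status = $? >> 8;
    return 1 if $status == 0;    # identical
    return 0 if $status == 1;    # different
    die "cmp failed with exit status $status";
}
```

Usage would look like `print same_via_cmp( $file1, $file2 ) ? "equal\n" : "not equal\n";`.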
Re: Is there a Perl version of UNIX "cmp" ? by Anonymous Monk on Feb 06, 2009 at 19:36 UTC
Sure: cmp in Perl Power Tools, or File::Compare.
Re^2: Is there a Perl version of UNIX "cmp" ? by Amphiaraus (Beadle) on Feb 06, 2009 at 21:39 UTC
Does anyone know if File::Compare's compare() function is "large-file-aware"? That is, does it reliably return a correct boolean value when comparing large non-human-readable files?
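On the return value: compare() is not strictly boolean. It returns 0 when the files are equal, 1 when they differ, and -1 on error, so testing it for truth alone reads backwards. A small sketch follows; the wrapper name `describe_compare` is hypothetical.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical wrapper that turns compare()'s three-way result into a word.
sub describe_compare {
    my ( $f1, $f2 ) = @_;
    my $rc = compare( $f1, $f2 );
    return 'equal'     if $rc == 0;
    return 'different' if $rc == 1;
    return 'error';    # -1: for example, a file could not be opened
}
```

Usage would look like `print describe_compare( $file1, $file2 ), "\n";`.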
Re^3: Is there a Perl version of UNIX "cmp" ? by runrig (Abbot) on Feb 06, 2009 at 22:09 UTC
If perl was compiled with the USE_LARGE_FILES flag (which it likely was if your OS handles large files), it will handle large files (see "perl -V" for that info). Still, "cmp" is likely going to be much faster than File::Compare. The only way to tell is to try both on your large files. Coding practices are fine, but they should be guidelines, not absolutes. Update: quick benchmark on two identical 1GB files on HP-UX: 13.5 seconds (cmp) vs. 17.5 seconds (File::Compare). "Much faster" is relative, it seems :-)
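The same information that "perl -V" prints can be read from inside a script through the core Config module. This is only a sketch: the function name `large_file_support` is invented here, and it assumes the `uselargefiles` entry is set to 'define' on builds with large file support.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: returns 1 if this perl was built with large file
# support, 0 otherwise.
sub large_file_support {
    my $flag = $Config{uselargefiles};
    return ( defined $flag && $flag eq 'define' ) ? 1 : 0;
}

printf "large file support: %s\n", large_file_support() ? 'yes' : 'no';
```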
Re^3: Is there a Perl version of UNIX "cmp" ? by kwaping (Priest) on Feb 06, 2009 at 23:38 UTC
Looking at the source for File::Compare, there is an undocumented third input that is used as the read buffer size. A default is used if that third argument is not provided, which is the size of the first file (`-s FROM`). If that file - or the third argument - is larger than `1024 * 1024 * 2`, that number (2 MB) is used as the buffer size. Basically, it reads the file in chunks of up to 2 MB, so it should be able to handle files of virtually any size given enough time. --- It's all fine and dandy until someone has to look at the code.
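For illustration, passing that third argument might look like the sketch below. The wrapper name and the 64 KB default are assumptions made for the example, and since the argument is described above as undocumented, it is worth checking against the File::Compare version you actually have installed.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical wrapper: compare two files using an explicit read buffer size.
# Returns compare()'s own result: 0 equal, 1 different, -1 error.
sub compare_with_buffer {
    my ( $f1, $f2, $bytes ) = @_;
    $bytes = 64 * 1024 unless defined $bytes && $bytes > 0;    # assumed default
    return compare( $f1, $f2, $bytes );
}
```

A small buffer keeps memory use low at the cost of more read calls; the result should not depend on the size chosen.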
Re: Is there a Perl version of UNIX "cmp" ? by samtregar (Abbot) on Feb 06, 2009 at 19:22 UTC
I agree that there's no compelling reason not to just call it from system(), but if you did need it in Perl code it'd be a fun challenge. I think I'd do it by reading in chunks from each file and doing an MD5 on each chunk with Digest::MD5. Compare the MD5s and if they're different then you've got a difference. Oh, and start by comparing file sizes! -sam
Re^2: Is there a Perl version of UNIX "cmp" ? by merlyn (Sage) on Feb 06, 2009 at 23:11 UTC
Uh, you start with block A and block B in memory. Not sure how you think computing the MD5 on each block (hitting every byte, doing math) is going to be any faster than just comparing the blocks themselves (comparing byte by byte, but stopping on first difference). Bizarre. -- Randal L. Schwartz, Perl hacker
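A sketch of the direct block comparison described here, stopping at the first difference and reporting where it is, much as cmp does. The helper name `first_difference` and the 64 KB chunk size are assumptions for illustration. It returns 0 for identical files, otherwise the 1-based offset of the first differing byte; if one file is a prefix of the other, that is the length of the shorter file plus one.

```perl
use strict;
use warnings;

use constant CHUNK => 64 * 1024;    # assumed chunk size

# Hypothetical helper: 0 if identical, else 1-based offset of first difference.
sub first_difference {
    my ( $f1, $f2 ) = @_;
    open my $h1, '<:raw', $f1 or die "open $f1: $!";
    open my $h2, '<:raw', $f2 or die "open $f2: $!";

    my $offset = 0;    # bytes already known to match
    while (1) {
        my $n1 = read $h1, my $buf1, CHUNK;
        my $n2 = read $h2, my $buf2, CHUNK;
        die "read failed: $!" unless defined $n1 && defined $n2;
        return 0 if $n1 == 0 && $n2 == 0;    # both ended together

        if ( $buf1 ne $buf2 ) {
            # XOR the common prefix: the first non-NUL byte marks the difference.
            my $common = $n1 < $n2 ? $n1 : $n2;
            my $xor = substr( $buf1, 0, $common ) ^ substr( $buf2, 0, $common );
            return $offset + $-[0] + 1 if $xor =~ /[^\0]/;
            return $offset + $common + 1;    # one file ended early
        }
        $offset += $n1;
    }
}
```

When only a yes/no answer is needed, comparing `-s $f1` with `-s $f2` first, as suggested earlier in the thread, avoids reading anything when the sizes already differ.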
Re^3: Is there a Perl version of UNIX "cmp" ? by samtregar (Abbot) on Feb 07, 2009 at 05:13 UTC
MD5 is magic! Ok, good point. -sam
Re^4: Is there a Perl version of UNIX "cmp" ? by ikegami (Patriarch) on Feb 09, 2009 at 19:16 UTC
Re: Is there a Perl version of UNIX "cmp" ? by jdporter (Paladin) on Feb 07, 2009 at 12:37 UTC
The tag line of tye's Algorithm::Diff modules says that it computes "'intelligent' differences between two files / lists"... but the doc doesn't explain how to use it on files, only on arrays. I suppose you could tie the two files to arrays using Tie::File, but I'm not sure how efficient that would be... Between the mind which plans and the hands which build, there must be a mediator... and this mediator must be the heart.
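A sketch of the Tie::File half of that idea, using only core modules. It ties both files read-only and walks them line by line; the helper name `first_differing_line` is made up, and whether handing such tied arrays to Algorithm::Diff performs acceptably is not tested here.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;

# Hypothetical helper: returns 0 if every line matches, otherwise the
# 1-based number of the first line that differs or is missing in one file.
sub first_differing_line {
    my ( $f1, $f2 ) = @_;
    tie my @lines1, 'Tie::File', $f1, mode => O_RDONLY or die "tie $f1: $!";
    tie my @lines2, 'Tie::File', $f2, mode => O_RDONLY or die "tie $f2: $!";

    my $count1 = @lines1;
    my $count2 = @lines2;
    my $max    = $count1 > $count2 ? $count1 : $count2;

    my $found = 0;
    for my $i ( 0 .. $max - 1 ) {
        my ( $x, $y ) = ( $lines1[$i], $lines2[$i] );
        next if defined $x && defined $y && $x eq $y;
        $found = $i + 1;
        last;
    }

    untie @lines1;
    untie @lines2;
    return $found;
}
```

Tie::File fetches records on demand, so the files are not slurped into memory, though counting the lines still scans each file once.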

