File::Temp: 2 interfaces get different results with Digest::MD5 and File::Compare

by jkeenan1 (Deacon)
on Aug 29, 2021 at 16:55 UTC ( [id://11136181]=perlquestion )

jkeenan1 has asked for the wisdom of the Perl Monks concerning the following question:

I am observing subtle differences between tempfiles created by the two interfaces of File::Temp -- the functional and the object-oriented. These differences do not show up when you 'diff' or 'vimdiff' two files. They only show up when you take a digest of the two files (e.g., via Digest::MD5->hexdigest) or compare the two files with File::Compare.

First, File::Temp's functional interface. Suppose I write the same content to two different tempfiles, then compare the resulting tempfiles in two ways: (1) a wrapper around Digest::MD5->hexdigest; and (2) File::Compare::compare().

```perl
#!perl
use 5.14.0;
use warnings;
use Carp;
use Data::Dumper;
use Digest::MD5;
use File::Compare (qw| compare |);
use File::Temp qw( tempfile );
use Test::More;

my $basic = 'x' x 10**2;
my @digests;

my ($fh1, $t1) = tempfile();
for (1..100) { say $fh1 $basic }
close $fh1 or croak "Unable to close $t1 after writing";
push @digests, hexdigest_one_file($t1);

my ($fh2, $t2) = tempfile();
for (1..100) { say $fh2 $basic }
close $fh2 or croak "Unable to close $t2 after writing";
push @digests, hexdigest_one_file($t2);

say Dumper [ @digests ];

cmp_ok($digests[0], 'eq', $digests[1],
    "Same md5_hex for $t1 and $t2");

is(compare($t1, $t2), 0, "compare() indicates no differences between $t1 and $t2");

done_testing();

# Open the named file for reading and return its MD5 hex digest.
sub hexdigest_one_file {
    my $filename = shift;
    say "Filename: $filename";
    my $state = Digest::MD5->new();
    open my $FH, '<', $filename or croak "Unable to open $filename for reading";
    $state->addfile($FH);
    close $FH or croak "Unable to close $filename after reading";
    return $state->hexdigest;
}
```
Let's run the file:
```
$ prove -v hexdig1.t
hexdig1.t ..
Filename: /tmp/0qZ8en3x1Y
Filename: /tmp/GQuMULnyLM
$VAR1 = [
          'e395fd01f84d7d1006a99e2a6b8fb832',
          'e395fd01f84d7d1006a99e2a6b8fb832'
        ];
ok 1 - Same md5_hex for /tmp/0qZ8en3x1Y and /tmp/GQuMULnyLM
ok 2 - compare() indicates no differences between /tmp/0qZ8en3x1Y and /tmp/GQuMULnyLM
1..2
ok
All tests successful.
```
Second, let's use File::Temp's object-oriented interface. We'll create two objects and write the same content with each, then compare the files generated in the same ways as above.
```perl
#!perl
use 5.14.0;
use warnings;
use Carp;
use Data::Dumper;
use Digest::MD5;
use File::Compare (qw| compare |);
use File::Temp qw( tempfile );
use Test::More;

my $basic = 'x' x 10**2;
my @digests;

my $t3 = File::Temp->new( UNLINK => 0);
for (1..100) { say $t3 $basic }
push @digests, hexdigest_one_file($t3);

my $t4 = File::Temp->new( UNLINK => 0);
for (1..100) { say $t4 $basic }
push @digests, hexdigest_one_file($t4);

say Dumper [ @digests ];

cmp_ok($digests[0], 'eq', $digests[1],
    "Same md5_hex for $t3 and $t4");

is(compare($t3, $t4), 0, "compare() indicates no differences between $t3 and $t4");

done_testing();

# Open the named file for reading and return its MD5 hex digest.
sub hexdigest_one_file {
    my $filename = shift;
    say "Filename: $filename";
    my $state = Digest::MD5->new();
    open my $FH, '<', $filename or croak "Unable to open $filename for reading";
    $state->addfile($FH);
    close $FH or croak "Unable to close $filename after reading";
    return $state->hexdigest;
}
```
Let's run the file:
```
$ prove -v hexdig2.t
hexdig2.t ..
Filename: /tmp/vO_5WGnJ2V
Filename: /tmp/3nw1qtQRAm
$VAR1 = [
          '24676db37df646bab175feedec39259d',
          '24676db37df646bab175feedec39259d'
        ];
ok 1 - Same md5_hex for /tmp/vO_5WGnJ2V and /tmp/3nw1qtQRAm
ok 2 - compare() indicates no differences between /tmp/vO_5WGnJ2V and /tmp/3nw1qtQRAm
1..2
ok
All tests successful.
```
But third, let's use both File::Temp interfaces within the same file. Once again, we'll write the same content and use two ways of comparing the output.
```perl
#!perl
use 5.14.0;
use warnings;
use Carp;
use Data::Dumper;
use Digest::MD5;
use File::Compare (qw| compare |);
use File::Temp qw( tempfile );
use Test::More;

my $basic = 'x' x 10**2;
my @digests;

my ($fh1, $t1) = tempfile();
for (1..100) { say $fh1 $basic }
close $fh1 or croak "Unable to close $t1 after writing";
push @digests, hexdigest_one_file($t1);

my $t3 = File::Temp->new( UNLINK => 0);
for (1..100) { say $t3 $basic }
push @digests, hexdigest_one_file($t3);

say Dumper [ @digests ];

cmp_ok($digests[0], 'eq', $digests[1],
    "Same md5_hex for $t1 and $t3");

is(compare($t1, $t3), 0, "compare() indicates no differences between $t1 and $t3");

done_testing();

# Open the named file for reading and return its MD5 hex digest.
sub hexdigest_one_file {
    my $filename = shift;
    say "Filename: $filename";
    my $state = Digest::MD5->new();
    open my $FH, '<', $filename or croak "Unable to open $filename for reading";
    $state->addfile($FH);
    close $FH or croak "Unable to close $filename after reading";
    return $state->hexdigest;
}
```
Let's run the file:
```
$ prove -v hexdig3.t
hexdig3.t ..
Filename: /tmp/zgeqEkMDL8
Filename: /tmp/TrjkWZFjs2
$VAR1 = [
          'e395fd01f84d7d1006a99e2a6b8fb832',
          '24676db37df646bab175feedec39259d'
        ];
not ok 1 - Same md5_hex for /tmp/zgeqEkMDL8 and /tmp/TrjkWZFjs2
#   Failed test 'Same md5_hex for /tmp/zgeqEkMDL8 and /tmp/TrjkWZFjs2'
#   at hexdig3.t line 25.
#          got: 'e395fd01f84d7d1006a99e2a6b8fb832'
#     expected: '24676db37df646bab175feedec39259d'
not ok 2 - compare() indicates no differences between /tmp/zgeqEkMDL8 and /tmp/TrjkWZFjs2
#   Failed test 'compare() indicates no differences between /tmp/zgeqEkMDL8 and /tmp/TrjkWZFjs2'
#   at hexdig3.t line 28.
#          got: '1'
#     expected: '0'
1..2
# Looks like you failed 2 tests of 2.
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/2 subtests

Test Summary Report
-------------------
hexdig3.t (Wstat: 512 Tests: 2 Failed: 2)
  Failed tests:  1-2
  Non-zero exit status: 2
```
Both my wrapper around Digest::MD5->hexdigest() and File::Compare::compare() indicate that the files created by the two different File::Temp interfaces differ from each other. But if I examine those files with, say, 'diff' or 'od -c', I don't get any differences!
```
[tmp] $ ls -l /tmp/zgeqEkMDL8 /tmp/TrjkWZFjs2
-rw------- 1 jkeenan jkeenan 10100 Aug 29 12:46 /tmp/TrjkWZFjs2
-rw------- 1 jkeenan jkeenan 10100 Aug 29 12:46 /tmp/zgeqEkMDL8
[tmp] $ diff /tmp/zgeqEkMDL8 /tmp/TrjkWZFjs2
[tmp] $
```

Can anyone explain these anomalies? Thank you very much.

Jim Keenan

Replies are listed 'Best First'.
Re: File::Temp: 2 interfaces get different results with Digest::MD5 and File::Compare
by Corion (Patriarch) on Aug 29, 2021 at 17:17 UTC

    Without replicating your situation: did you look at whitespace differences? I see that you're neither setting :raw on the filehandles nor calling binmode on them, so that is the first place I would look.

      There is no whitespace in the strings being printed to the tempfiles.

      Which filehandles are you referring to? The File::Temp filehandles or the handle inside hexdigest_one_file()?

      (Note: I did try binmode $FH inside that subroutine's definition. It made no difference.)

      Just now I tried: open my $FH, "<:raw", $filename or croak "Unable to open $filename for reading"; That did not make any difference, either.

      Jim Keenan

        You're using say instead of print, so whitespace certainly is involved.

        You disabled the layers on reading the data back but did you disable the layers when writing the file? I think you're usually on Windows and there, Perl (and say) will usually output \r\n to files.

        Update: On further inspection, the file sizes of the two files are identical, so there is something else afoot. Sorry for the noise.

        I looked at replicating your situation using IO layers, but while I can provoke a difference by putting the :crlf layer on one filehandle, I don't get the digests you posted:

        ```perl
        #!perl
        use 5.14.0;
        use strict;
        use warnings;
        use Carp;
        use Data::Dumper;
        use Digest::MD5;
        use File::Compare (qw| compare |);
        use File::Temp qw( tempfile );
        use Test::More tests => 1;

        my $basic = 'x' x 10**2;
        my @digests;

        my ($fh1, $t1) = tempfile();
        binmode $fh1, ':raw';
        for (1..100) { say $fh1 $basic }
        close $fh1 or croak "Unable to close $t1 after writing";
        push @digests, hexdigest_one_file($t1);
        diag "$t1: $digests[0]";

        my $t3 = File::Temp->new( UNLINK => 0);
        binmode $t3, ':crlf';
        for (1..100) { say $t3 $basic }
        close $t3 or croak "Unable to close $t3 after writing";
        push @digests, hexdigest_one_file($t3);
        diag "$t3: $digests[1]";

        is $digests[0], $digests[1];

        sub hexdigest_one_file {
            my $filename = shift;
            say "Filename: $filename";
            #open my $FH, '<', $filename or croak "Unable to open $filename for reading";
            #print for <$FH>;
            #close $FH;
            my $state = Digest::MD5->new();
            open my $FH, '<:raw', $filename or croak "Unable to open $filename for reading";
            $state->addfile($FH);
            close $FH or croak "Unable to close $filename after reading";
            return $state->hexdigest;
        }
        ```
        ```
        1..1
        Filename: /tmp/MdfRQx3DVl
        # /tmp/MdfRQx3DVl: e395fd01f84d7d1006a99e2a6b8fb832
        Filename: /tmp/x589MI1yYB
        # /tmp/x589MI1yYB: 7651c6edc9ebdcfa617bcc99e1c8a6f2
        not ok 1
        #   Failed test at tmp.pl line 29.
        #          got: 'e395fd01f84d7d1006a99e2a6b8fb832'
        #     expected: '7651c6edc9ebdcfa617bcc99e1c8a6f2'
        # Looks like you failed 1 test of 1.
        ```
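
The file-size observation in the update above can be checked directly. The following is a minimal sketch, not code from the thread: `size_with_layer` is a hypothetical helper that writes the same 100 lines through a given PerlIO layer and reports the size on disk. With :raw each line should cost 101 bytes (100 'x' plus "\n", 10100 in total), while :crlf writes "\r\n" and should give 10200. Since both of the original files are 10100 bytes, a :crlf layer would not account for the mismatch.

```perl
#!perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical helper: write $count copies of $line (each followed by "\n")
# through the given PerlIO layer and return the resulting size in bytes.
sub size_with_layer {
    my ($layer, $line, $count) = @_;
    my ($fh, $name) = tempfile( UNLINK => 1 );   # file is removed at program exit
    binmode $fh, $layer or die "Unable to set layer $layer: $!";
    print {$fh} "$line\n" for 1 .. $count;
    close $fh or die "Unable to close $name: $!";
    return -s $name;
}

my $basic = 'x' x 10**2;
for my $layer (':raw', ':crlf') {
    printf "%-6s %d bytes\n", $layer, size_with_layer($layer, $basic, 100);
}
```

Note that the handle is closed before the size is taken, so the number reflects everything that was written.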

        Update 2: Have you asked md5sum which sum is correct? For my code, md5sum outputs hashes identical to what Perl computes for each file.
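
One more thing worth checking, offered as a hypothesis that nothing above confirms: in the object-oriented scripts the File::Temp handle is never closed or flushed before hexdigest_one_file() reopens the file by name, whereas the functional-interface scripts call close first. If part of the output is still sitting in Perl's output buffer at that moment, a second reader sees only the bytes already on disk. That would be consistent with the two OO files agreeing with each other, with the OO digest differing from the functional one, and with diff finding no difference once the script has exited and the buffers have been flushed. The sketch below compares the size on disk before and after an explicit flush. The names `$tmp`, `$before`, and `$after` are illustrative, and it uses UNLINK => 1 (unlike the original scripts) so that nothing is left behind.

```perl
#!perl
use strict;
use warnings;
use File::Temp;

my $line = 'x' x 10**2;

my $tmp = File::Temp->new( UNLINK => 1 );
print {$tmp} "$line\n" for 1 .. 100;          # 10100 bytes handed to PerlIO

# Size as another reader would see it right now. It may be smaller than
# 10100 if the tail of the output is still buffered; -s yields an empty
# string for an empty file, hence the "|| 0".
my $before = (-s $tmp->filename) || 0;

$tmp->flush;                                  # IO::Handle method inherited by File::Temp

my $after = (-s $tmp->filename) || 0;

printf "before flush: %d bytes, after flush: %d bytes\n", $before, $after;
```

If `$before` comes out smaller than `$after` on your system, then calling `$t3->flush` (or `close $t3`) before hexdigest_one_file($t3) and compare() in the third script should be enough to tell whether buffering explains the differing digests.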
