PerlMonks  

MD 5 hash comparison/checker

by daggy (Novice)
on May 06, 2010 at 23:34 UTC

daggy has asked for the wisdom of the Perl Monks concerning the following question:

I've given up on the search files utility as I can't get it to work.

I have an MD5 hash perl tool

It currently allows the user to input two file paths and it then displays the md5 hash of each file...

As it merely displays the hashes, I was wondering if it's possible to tell the user if there are differences or if they are the same.

Here is the code:

    use digest::MD5;

    print "\nPlease enter filepath and name (including file extension) of the first file you would like to compare:\n\n";
    $fileone = <STDIN>;
    chomp $fileone;
    print "\n\n";

    print "\nPlease enter filepath and name (including file extension) of the first file you would like to compare:\n\n";
    $filetwo = <STDIN>;
    chomp $filetwo;
    print "\n\n";

    open (FILE, $fileone) or print "File invalid, Please try again.";
    print "The MD5 hash of file one is: ";
    $md5a = Digest::MD5->new->addfile(*FILE)->clone->hexdigest;
    print "$md5a\n\n";

    open (FILE, $filetwo) or print "File invalid, Please try again.";
    print "The MD5 hash of file two is: ";
    $md5b = Digest::MD5->new->addfile(*FILE)->clone->hexdigest;
    print "$md5b\n\n";

Replies are listed 'Best First'.
Re: MD 5 hash comparison/checker
by ikegami (Patriarch) on May 06, 2010 at 23:47 UTC
    print($md5a eq $md5b ? "same" : "different");
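
    For context, here is a minimal sketch of how that comparison might sit in a complete script. It assumes lexical filehandles, binary mode, and file names passed on the command line; the helper name md5_of_file is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return the hex MD5 digest of one file.
sub md5_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!\n";
    binmode($fh);    # hash the raw bytes, which matters on Windows
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $hex;
}

# Only runs when exactly two file names are given on the command line.
if (@ARGV == 2) {
    my ($md5a, $md5b) = map { md5_of_file($_) } @ARGV;
    print "The MD5 hash of file one is: $md5a\n";
    print "The MD5 hash of file two is: $md5b\n";
    print($md5a eq $md5b ? "same\n" : "different\n");
}
```

    Unlike the original, this dies on an unreadable file instead of printing a message and carrying on to hash a handle that never opened.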

      That worked perfectly. Is it possible to display a snippet of the area that's different if the hashes aren't the same, or is this a waste of time?

      Also do you think it would be possible to display the MD6 hash as well as the MD5 hash?

        Question 1: "display a snippet?" Not without using an entirely different approach. MD5 is useful ONLY to compare the totality of the files; not of segments. If you really want to know what the differences are, use a tool built for that purpose, or write your own tool using line-by-line comparison (yes, you could do it para by para or use some other unit, but the concept is simple enough line-by-line).
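
        A rough sketch of the line-by-line idea, assuming text files and that reporting only the first differing line is enough. The sub name first_diff_line is hypothetical, and a real diff tool does far more (resynchronising after insertions, for one).

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based number of the first line that
# differs between two files, or 0 if they match line for line.
sub first_diff_line {
    my ($path_a, $path_b) = @_;
    open(my $fa, '<', $path_a) or die "Cannot open '$path_a': $!\n";
    open(my $fb, '<', $path_b) or die "Cannot open '$path_b': $!\n";
    my $n = 0;
    while (1) {
        my $line_a = <$fa>;
        my $line_b = <$fb>;
        $n++;
        return 0  if !defined $line_a && !defined $line_b;  # both ended together
        return $n if !defined $line_a || !defined $line_b;  # one file is shorter
        return $n if $line_a ne $line_b;                    # contents differ here
    }
}

# Only runs when exactly two file names are given on the command line.
if (@ARGV == 2) {
    my $n = first_diff_line(@ARGV);
    print($n ? "First difference at line $n\n" : "No differences\n");
}
```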

        Question 2: MD6? Why? What-on-earth-good would that do?

Re: MD 5 hash comparison/checker
by ikegami (Patriarch) on May 07, 2010 at 01:48 UTC
    Do you actually use the md5 for anything other than checking if the two files are different? If not, you're just wasting time calculating it.
    use strict;
    use warnings;

    open(my $fh1, '<', $ARGV[0]) or die $!;
    binmode($fh1);
    open(my $fh2, '<', $ARGV[1]) or die $!;
    binmode($fh2);

    for (;;) {
        defined(read($fh1, my $buf1='', 4096)) or die $!;
        defined(read($fh2, my $buf2='', 4096)) or die $!;
        if ($buf1 ne $buf2) {
            print("Different\n");
            exit(1);
        }
        last if !length($buf1);
    }

    print("Same\n");
    exit(0);
      Um... okay, if it's just a question of comparing one file to one other file to determine "same" or "different", a byte-for-byte comparison like you suggest certainly makes the most sense. Good call. (Update: Of course, just using the *n*x "cmp" utility will be a lot easier/quicker.)

      But if it were a case of looking for duplicates among a large set of files, using the md5 signatures of the files (in combination with file byte counts) will save a lot of time. (I don't know if the OP represents this sort of "XY Problem" -- talking about comparing two files when the task is actually bigger than that -- but it's worth mentioning in any case.)

        (Update: Of course, just using the *n*x "cmp" utility will be a lot easier/quicker.)

        He's on Windows (or else use digest::MD5; wouldn't have worked), and it was faster for me to type up the program than to figure out the DOS command :)

        But if it were a case of looking for duplicates among a large set of files

        Indeed, but there's no evidence of that. That's why I asked and suggested an alternative.

      Hi, yeah it's used to compare the hashes.

      I tried your code, but it won't let me specify which files I'd like compared.

      Also, I've noticed when I run code in perl, at the end of the code it automatically shuts down so I can't read the results, how do I stop this?

      It doesn't happen if I run from CMD, but if I just click the .pl file it shuts down at the end.

        but if I just click the .pl file it shuts down at the end.

        Ah. That explains a great deal. If you really want/expect the script to work when it gets launched by clicking on the file's icon in a file browser, consider the following idiom:

        #!/usr/bin/perl
        # (use a unix/linux style shebang line,
        # because someday you will want to use a unix/linux system)
        use strict;

        my $reqd_param_count = 2;  # (e.g. two file names)

        if ( ! @ARGV ) {
            # prompt for interactive input of required parameter(s)
            ...
        }
        elsif ( @ARGV == $reqd_param_count ) {
            # invoked from an interactive shell: required params are in @ARGV
            ...
        }
        else {
            die "Usage: $0 arg1 ...\n";
        }
        But seriously, there ought to be a sensible way to set things up so that a user can easily invoke a perl script with args (that will go into @ARGV). If not, just please switch to some sort of GUI approach (Tk, wx, etc), or else get cozy with using a CLI shell ("bash" is available for windows, and is the best, IMHO).
        Command line tools are more useful when they accept file names from the command line.
        perl compare.pl file1 file2

        But if you prefer to prompt the user, feel free to adjust at will.
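
        One way to support both styles, sketched under a few assumptions: exactly two file names are needed, and prompting happens only when no arguments were given. The sub name get_file_names and its input-handle parameter are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: take the file names from the argument list if
# present, otherwise prompt for them on the given input handle.
sub get_file_names {
    my ($args, $in) = @_;
    return @$args if @$args == 2;                 # names came from the command line
    die "Usage: $0 file1 file2\n" if @$args;      # wrong number of arguments
    my @names;
    for my $label ('first', 'second') {
        print "Path of the $label file: ";
        my $name = <$in>;
        die "No input received\n" unless defined $name;
        chomp $name;
        push @names, $name;
    }
    return @names;
}
```

        A script would call it as my ($file1, $file2) = get_file_names(\@ARGV, \*STDIN); and then carry on with the comparison.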

      for (;;) {
          defined(read($fh1, my $buf1='', 4096)) or die $!;
          defined(read($fh2, my $buf2='', 4096)) or die $!;
          if ($buf1 ne $buf2) {
              print("Different\n");
              exit(1);
          }
          last if !length($buf1);
      }

      Why not File::Compare?
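
      For reference, a minimal sketch of that suggestion. File::Compare ships with Perl, and its compare() returns 0 when the files are equal, 1 when they differ, and -1 on error. The wrapper name files_status is made up here.

```perl
use strict;
use warnings;
use File::Compare;

# Hypothetical wrapper: turn compare()'s return code into a message.
sub files_status {
    my ($file1, $file2) = @_;
    my $rc = compare($file1, $file2);   # 0 = equal, 1 = different, -1 = error
    return 'Same'      if $rc == 0;
    return 'Different' if $rc == 1;
    return "Error: $!";
}

# Only runs when exactly two file names are given on the command line.
if (@ARGV == 2) {
    print files_status(@ARGV), "\n";
}
```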

Re: MD 5 hash comparison/checker
by BrowserUk (Patriarch) on May 07, 2010 at 00:38 UTC

    If the files are big, you can save some time in many cases, by checking if the files are the same size first and only doing the md5 if they are.
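
    A sketch of that shortcut, assuming stat is used to get the byte counts. The sub names md5_of_file and probably_same are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return the hex MD5 digest of one file.
sub md5_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!\n";
    binmode($fh);
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $hex;
}

# Return 1 if both files have the same size and the same MD5, else 0.
sub probably_same {
    my ($path_a, $path_b) = @_;
    my $size_a = (stat $path_a)[7];     # element 7 of stat is size in bytes
    my $size_b = (stat $path_b)[7];
    die "Cannot stat one of the files: $!\n"
        unless defined $size_a && defined $size_b;
    return 0 if $size_a != $size_b;     # sizes differ, so skip the hashing
    return md5_of_file($path_a) eq md5_of_file($path_b) ? 1 : 0;
}

# Only runs when exactly two file names are given on the command line.
if (@ARGV == 2) {
    print(probably_same(@ARGV) ? "same\n" : "different\n");
}
```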

Re: MD 5 hash comparison/checker
by toolic (Bishop) on May 07, 2010 at 00:02 UTC
    use digest::MD5;
    For better portability, change that to (upper-case D):
    use Digest::MD5;
    Digest::MD5
Re: MD 5 hash comparison/checker
by graff (Chancellor) on May 07, 2010 at 03:05 UTC
    I have an MD5 hash perl tool. It currently allows the user to input two file paths and it then displays the md5 hash of each file.

    Why would a user want to do that?

    I was wondering if it's possible to tell the user if there are differences or if they are the same

    Look up the unix/linux "cmp" and "diff" commands. They were designed to do just that.

    Here's the code...

    I'm always baffled when I see these SoPW posts where the code is set up to ask questions of the user and read the user's responses from STDIN. It's almost always true that the information being asked for could/should be provided as command line args, so that the script gets them from @ARGV instead of reading them from STDIN.

    Once you understand that every CLI shell worth using keeps a command history, and interprets the "up-arrow" key as "go back to the previous command line", you'll see how much nicer it is to use command-line args to convey run-time instructions to your script (e.g. names of files, optional parameters, etc). If you are using a shell that doesn't support command history and recall, get another shell.

    As for the actual task that you're trying to accomplish: if it involves looking for duplicate file content, build a table that has path/filename, file_byte_count and file_md5 for the files of interest, sort the table on md5 and file size, and just do a byte-for-byte comparison (à la 'diff', 'cmp' or ikegami's code snippet) on the files that have identical sizes and md5s. (It's entirely possible that two files of the same size may have the same md5 sig, despite having different content.)
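
    A compact sketch of that approach, assuming the list of candidate paths is already known. It groups files by size and md5 in a hash rather than sorting a table, then confirms each candidate pair byte for byte with File::Compare. The sub name find_duplicates is invented for illustration.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Compare;

# Hypothetical helper: return a list of [path1, path2] pairs whose
# contents are byte-for-byte identical.
sub find_duplicates {
    my @paths = @_;
    my %by_key;    # "size:md5" => [ paths sharing that size and digest ]
    for my $path (@paths) {
        my $size = (stat $path)[7];
        next unless defined $size;              # skip files we cannot stat
        open(my $fh, '<', $path) or next;       # skip unreadable files
        binmode($fh);
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close($fh);
        push @{ $by_key{"$size:$md5"} }, $path;
    }
    my @pairs;
    for my $group (grep { @$_ > 1 } values %by_key) {
        for my $i (0 .. $#{$group} - 1) {
            for my $j ($i + 1 .. $#{$group}) {
                # Same size and md5: confirm with a full comparison.
                push @pairs, [ $group->[$i], $group->[$j] ]
                    if compare($group->[$i], $group->[$j]) == 0;
            }
        }
    }
    return @pairs;
}

# Only runs when file names are given on the command line.
if (@ARGV) {
    print "$_->[0] == $_->[1]\n" for find_duplicates(@ARGV);
}
```

    For a very large set it would be cheaper to hash only the files whose sizes collide, but that refinement is left out to keep the sketch short.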

      Hi,

      thanks for the reply...

      It's for an assignment, so it's more a hypothetical scenario, as opposed to a practical one.

      Do you know why the perl module closes at the end of the code?

      It closes too soon, so I'm unable to actually read the results.

      Whereas if I run it through CMD it works fine.

        The console closes as soon as no program is running in it ...by default. You can configure the console to stay open.
