Re: Using MD5 and the theory behind it

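MD5 and other one-way hash functions (SHA-1, for example) are designed to take in a string of any length and convert it to a short, fixed-length string, a kind of fingerprint of the original string. Different one-way hash functions produce fingerprints of different lengths; MD5's is 128 bits. (CRC32 also produces a short fingerprint, but it is a checksum meant to catch accidental errors, not a one-way hash.) The following criteria should hold for all good one-way hash functions: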

you cannot learn anything about the input string by examining its fingerprint, except for the fact that it has that fingerprint
a small change (even a single bit) in the input string should cause a dramatic change in the output of the hash function
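
The second criterion is easy to see for yourself. Here is a minimal sketch, assuming the Digest::MD5 module is installed; the helper name `differing_bits` and the two sample strings are made up for illustration. The inputs differ in a single bit ('c' is 0x63, 'b' is 0x62), and the script counts how many of the 128 digest bits differ as a result.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);

# Two inputs that differ in exactly one bit: 'c' is 0x63, 'b' is 0x62.
my $first  = "abc";
my $second = "abb";

# Count how many of the 128 digest bits differ between two inputs.
# (differing_bits is a hypothetical helper, not part of Digest::MD5.)
sub differing_bits {
    my ($x, $y) = @_;
    my $xor = md5($x) ^ md5($y);     # bytewise XOR of the two 16-byte digests
    return unpack("%32b*", $xor);    # the "%" checksum form counts the set bits
}

printf "%s  %s\n", md5_hex($first),  $first;
printf "%s  %s\n", md5_hex($second), $second;
printf "%d of 128 bits differ\n", differing_bits($first, $second);
```

For a good hash function you would expect roughly half of the bits to change, even though the inputs are nearly identical.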

I deal with a good bit of datacomm and file transfers. I use MD5 to detect suspected duplicate files. I keep a DB table with the MD5 values of all the files that have been transmitted to me. Whenever I get a new file, I compare its MD5 value to those stored in the table. If the value is not in the table, I process the file and store its MD5 value in the table. If the value is in the table, I set the file aside for special handling and notify an operator.
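
The duplicate check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual production code: a plain Perl hash (`%seen_digest`) stands in for the DB table, and the names `file_md5_hex` and `classify_file` are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical stand-in for the DB table of MD5 values already received.
my %seen_digest;

# Return the hex MD5 digest of a file's contents.
sub file_md5_hex {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Unable to open $path: $!\n";
    binmode($fh);    # hash the raw bytes, with no newline translation
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $hex;
}

# Classify an incoming file as 'new' or 'duplicate'; remember new digests.
sub classify_file {
    my ($path) = @_;
    my $digest = file_md5_hex($path);
    return 'duplicate' if exists $seen_digest{$digest};
    $seen_digest{$digest} = $path;
    return 'new';
}

# Usage: perl script.pl file1 file2 ...
for my $path (@ARGV) {
    printf "%-9s %s\n", classify_file($path), $path;
}
```

In a real setup the hash lookup and store would become a SELECT and an INSERT against the table, and a 'duplicate' result would trigger the special handling and operator notification.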

If you really want to learn exactly how MD5 (and other hash algorithms) work, I recommend checking out Applied Cryptography by Bruce Schneier.

Re: Re: Using MD5 and the theory behind it by r.joseph (Hermit) on Jan 10, 2001 at 06:43 UTC
You say that you compare a file's MD5 value to the values in a table. How do you get an MD5 value for a file? What exactly do you mean by this process? (I believe it is very similar to the one that I am attempting.) Thanks for the help!
Re: Re: Re: Using MD5 and the theory behind it by arturo (Vicar) on Jan 10, 2001 at 06:56 UTC
For reasonably sized files (ones that fit comfortably in system memory): load the file's contents into a Perl scalar, say `$foo`. Then, with `md5` imported from Digest::MD5, `$fingerprint = md5($foo);`. If you look through the Digest::MD5 documentation, you'll get some advice on other methods, e.g. the object-oriented version:

`my $file = "/file/to/hash";`
`my $md5 = Digest::MD5->new();`
`$md5->addfile($file);`
`$md5->add("seekrit passwerd"); # not the best choice for one, but ...`
`my $digest = $md5->digest;`

I got this straight out of the docs, more or less.

HTH

Philosophy can be made out of anything. Or less -- Jerry A. Fodor
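
For the load-it-into-a-scalar approach, a minimal runnable sketch might look like this, assuming the file fits in memory. The helper name `slurp` is made up for illustration. Note that `md5` returns the digest as 16 raw bytes, while `md5_hex` returns the same digest as a 32-character hex string, which is usually easier to print or store.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);

# Read a whole file into a scalar; only sensible for files that fit in memory.
# (slurp is a hypothetical helper name.)
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Unable to open $path: $!\n";
    binmode($fh);    # read raw bytes
    local $/;        # undefine the record separator so <$fh> reads everything
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Usage: perl script.pl /file/to/hash
if (@ARGV) {
    my $foo = slurp($ARGV[0]);
    print md5_hex($foo), "  $ARGV[0]\n";
}
```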
(correction) Re (5): Using MD5 and the theory behind it by mwp (Hermit) on Jan 11, 2001 at 07:04 UTC
Small correction:

`my $file = "/file/to/hash";`
`my $md5 = Digest::MD5->new();`
`open(MD5, $file) || die "Unable to open file: $!\n";`
`binmode(MD5);`
`$md5->addfile(*MD5);`
`$md5->add("seekrit passwerd"); # tee hee`
`my $digest = $md5->digest;`

Your original code will not work with the latest Digest::MD5; it produces the error "Not a valid filehandle." I know this because I'm currently writing a utility script that uses MD5 to verify downloaded files (for the Slackware distrib, actually) and I tried it your way to no avail. =)

'kaboo
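
For the download-verification case, a rough sketch (not the actual utility script) could compare a file's digest against a published hex checksum. The function name `md5_matches` and the command-line layout are assumptions for illustration. The extra `add()` of a password is left out here, since a published checksum covers the file contents only.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return true if the file's MD5 digest equals the expected hex string.
# (md5_matches is a hypothetical name.)
sub md5_matches {
    my ($path, $expected_hex) = @_;
    open(my $fh, '<', $path) or die "Unable to open $path: $!\n";
    binmode($fh);    # same precaution as binmode(MD5) above
    my $actual_hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return lc($expected_hex) eq $actual_hex;    # hexdigest is lowercase
}

# Usage: perl script.pl /file/to/hash expected_md5_hex
if (@ARGV == 2) {
    my ($path, $expected) = @ARGV;
    my $status = md5_matches($path, $expected) ? "OK" : "MISMATCH";
    print "$status  $path\n";
}
```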


	PerlMonks