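MD5 (and other one-way hash functions like SHA-1) are designed
to take in a string of any length and convert it to a short,
fixed-length string, a kind of fingerprint of the original
string. (CRC32 often gets lumped in with these, but it is a
checksum meant to catch accidental errors, not a one-way hash.)
Different one-way hash functions produce fingerprints of
different lengths; MD5's is 128 bits. But the following
criteria should hold for all good one-way hash functions: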
- you cannot learn anything about the input string by examining
its fingerprint, except for the fact that it has that fingerprint
- a small change (even a single bit) in the input string
should cause a dramatic change in the output of the hash function
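To illustrate the second criterion, here is a minimal Python sketch using the standard hashlib module. The helper names (md5_hex, differing_bits) and the sample strings are made up for this example; the two inputs differ in exactly one bit.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 fingerprint of data as 32 hex characters."""
    return hashlib.md5(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 128 fingerprint bits differ between a and b."""
    da = hashlib.md5(a).digest()
    db = hashlib.md5(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(da, db))


if __name__ == "__main__":
    original = b"hello world"
    changed = b"hello worle"  # 'd' (0x64) vs 'e' (0x65): a one-bit change

    print(md5_hex(original))
    print(md5_hex(changed))
    print(differing_bits(original, changed), "of 128 bits differ")
```

For a well-behaved hash you would expect roughly half of the 128 output bits to flip, even though only one input bit changed.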

I deal with a good bit of datacomm and file transfers.
I use MD5 to identify when I have received a suspected duplicate
file. I keep a DB table with the MD5 values of all
the files that have been transmitted to me. Whenever I
get a new file, I compare its MD5 value to those stored
in the table. If the value is not in the table, I process
the file and store its MD5 value in the table. If the value
is in the table, I set the file aside for special handling and
notify an operator.
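A minimal sketch of that workflow, assuming SQLite for the table and Python's hashlib for the MD5. The table name received_files, the function names, and the "new"/"duplicate" return values are all hypothetical, chosen only for illustration; a real setup would plug in its own processing and operator notification.

```python
import hashlib
import sqlite3


def file_md5(path, chunk_size=65536):
    """Compute the MD5 of a file, reading in chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def open_db(db_path=":memory:"):
    """Open the database and create the (hypothetical) fingerprint table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS received_files ("
        "md5 TEXT PRIMARY KEY, filename TEXT)"
    )
    return conn


def check_incoming(conn, path):
    """Return 'duplicate' if the file's MD5 is already stored, else record it and return 'new'."""
    digest = file_md5(path)
    row = conn.execute(
        "SELECT filename FROM received_files WHERE md5 = ?", (digest,)
    ).fetchone()
    if row is not None:
        # Seen before: the caller sets the file aside and notifies an operator.
        return "duplicate"
    conn.execute(
        "INSERT INTO received_files (md5, filename) VALUES (?, ?)", (digest, path)
    )
    conn.commit()
    return "new"
```

The caller would process the file when check_incoming returns "new" and route it to special handling when it returns "duplicate". Note that a file with the same contents but a different name is still flagged, since only the contents are hashed.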

If you really want to learn exactly how MD5 (and other hash algorithms)
work, I recommend checking out Applied Cryptography by Bruce Schneier.