Re: Using MD5 and the theory behind it

What MD5 does: it takes some data and turns it into a 128-bit 'digest' (also called a 'hash' or 'sum'). The hashing is, for most practical purposes, one-way: given the digest, it is computationally infeasible to reconstruct the original data. (Besides, different sets of data can have the same digest, so even if you found an input that produced it, you would have no reason to think it was what originally went in.)

Think of the digest as the data's fingerprint, and you'll have the basic idea.
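To see the 'fingerprint' idea concretely, here is a minimal Perl sketch using the core Digest::MD5 module. It shows that the digest is always 16 bytes (128 bits, 32 hex digits) however large the input is, and that changing a single character gives an unrelated digest. The sample strings are arbitrary.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);

# md5() returns the raw 16-byte (128-bit) digest; md5_hex() gives the same value as 32 hex digits.
my $raw = md5('abc');
my $hex = md5_hex('abc');
printf "%d bytes, %d bits\n", length($raw), 8 * length($raw);   # 16 bytes, 128 bits
print "$hex\n";                                                   # 900150983cd24fb0d6963f7d28e17f72

# A one-character change to the input gives a completely different fingerprint.
print md5_hex('abd'), "\n";

# A megabyte in, still 32 hex digits out.
print md5_hex('x' x 1_000_000), "\n";
```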

the gory details

A lot of systems store MD5 digests of passwords: knowing the MD5 digest of a password doesn't let you reconstruct the password. So instead of having a file that stores all of your users' passwords in plain text, you store the MD5'd versions of them. Then, on login, you MD5 what the user types in and compare that value to what's stored in your password file. Note: this isn't a perfect system. If someone steals your password file, they can 'brute force' your users' passwords by cycling through possible strings, taking their MD5 digests, and comparing the results to what's in the file. You make this harder by putting non-alphanumeric characters and a mixture of cases in your passwords: this increases the number of possibilities an attacker has to loop through, making brute forcing more computationally expensive.
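A minimal sketch of the password-file scheme above, plus the brute-force attack on it. The in-memory %password_file hash and the sub names (set_password, check_login, brute_force) are hypothetical stand-ins for a real password file; only md5_hex comes from the core Digest::MD5 module. The attack deliberately uses a small alphabet and short lengths so it finishes quickly.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical in-memory "password file": user name => MD5 hex digest.
my %password_file;

# Store only the digest, never the plain-text password.
sub set_password {
    my ($user, $password) = @_;
    $password_file{$user} = md5_hex($password);
}

# On login, digest what the user typed and compare it with the stored value.
sub check_login {
    my ($user, $typed) = @_;
    my $stored = $password_file{$user};
    return defined($stored) && md5_hex($typed) eq $stored;
}

# The attacker's side: digest every string over $alphabet, shortest first,
# until one matches the stolen digest. Returns undef if nothing up to $max_len matches.
sub brute_force {
    my ($target, $alphabet, $max_len) = @_;
    my @chars      = split //, $alphabet;
    my @candidates = ('');
    for (1 .. $max_len) {
        # Extend every candidate from the previous round by one character.
        @candidates = map { my $prefix = $_; map { $prefix . $_ } @chars } @candidates;
        for my $guess (@candidates) {
            return $guess if md5_hex($guess) eq $target;
        }
    }
    return undef;
}

set_password(alice => 'cab');
printf "%s\n", check_login(alice => 'cab')   ? 'ok' : 'denied';   # ok
printf "%s\n", check_login(alice => 'guess') ? 'ok' : 'denied';   # denied

# With only the stolen digest, a short lowercase password falls almost at once.
my $cracked = brute_force($password_file{alice}, join('', 'a' .. 'z'), 3);
print "alice's password: $cracked\n";                              # alice's password: cab
```

The cost of the search grows as (alphabet size) ** (length): for 8-character passwords, the 26 lowercase letters give about 2.1 × 10^11 candidates, while the 94 printable ASCII characters give about 6.1 × 10^15. Real systems go further than this sketch: they add a random per-user salt, so identical passwords get different digests and precomputed tables are useless, and they use a deliberately slow algorithm such as bcrypt, because MD5's speed works in the attacker's favour.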

Another use is verifying that a bunch of data hasn't been modified, without needing a verbatim copy of the original to compare against. Say, for example, you download a suspicious tarball, foo.tar.gz, and you want to know whether it's the *real* foo.tar.gz: you put it through the MD5 algorithm and compare the result to the MD5 checksum you got from a *trusted* source, i.e. one you know was generated by the person who distributed the genuine foo.tar.gz. If the digests match, you can be reasonably confident you've got the real deal. (MD5 has since been shown to be vulnerable to collision attacks, so a stronger hash such as SHA-256 is preferred for this today.)
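A sketch of the download check, assuming the trusted digest is a hex string like the ones published next to tarballs. The sub names file_md5_hex and matches_trusted are made up for illustration; Digest::MD5's addfile method reads from the filehandle until end of file, so the tarball never has to be slurped into a string.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hex MD5 digest of a file, read in binary mode so line endings aren't altered.
sub file_md5_hex {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# True if the file's digest equals the one published by the trusted source.
sub matches_trusted {
    my ($path, $trusted_hex) = @_;
    return lc(file_md5_hex($path)) eq lc($trusted_hex);
}

# Usage: perl check.pl foo.tar.gz <digest-from-trusted-source>
if (@ARGV == 2) {
    my ($path, $trusted) = @ARGV;
    printf "%s\n", matches_trusted($path, $trusted)
        ? 'digests match'
        : "MISMATCH: do not trust $path";
}
```

The script name check.pl is arbitrary. If the distributor publishes SHA-256 sums instead, the core Digest::SHA module follows the same pattern (`Digest::SHA->new(256)->addfile($fh)->hexdigest`).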

Similarly: if you've been cracked and you want to know whether crucial files on your system were modified, you need to have made MD5 digests of your crucial system binaries (e.g. everything in /usr/bin/ on a *nix system) *before* you were cracked, and stored those digests in a secure location. You can then run a check: if the digest you get by running MD5 on a binary doesn't agree with the stored value, the file has been modified. (I believe the Tripwire security utility uses such a method.)
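A Tripwire-style sketch of the idea, under some simplifying assumptions: it only looks at regular files directly inside one directory (no recursion), it stores the baseline as md5sum-style "digest  path" lines, and the sub names and command words (init, check) are hypothetical. Keeping the baseline file somewhere the attacker can't rewrite is what makes the check meaningful; the code can't enforce that.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Spec;

sub file_md5_hex {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Before any break-in: record a digest for each regular file directly in $dir.
sub make_baseline {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my %baseline;
    for my $name (readdir $dh) {
        my $path = File::Spec->catfile($dir, $name);
        next unless -f $path;    # skips ., .., and subdirectories
        $baseline{$path} = file_md5_hex($path);
    }
    closedir $dh;
    return \%baseline;
}

# Write "digest  path" lines, the same layout md5sum uses.
sub save_baseline {
    my ($file, $baseline) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} "$baseline->{$_}  $_\n" for sort keys %$baseline;
    close $out or die "Cannot close $file: $!";
}

sub load_baseline {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my %baseline;
    while (my $line = <$in>) {
        chomp $line;
        my ($hex, $path) = split /  /, $line, 2;
        $baseline{$path} = $hex;
    }
    close $in;
    return \%baseline;
}

# Later: list files whose digest no longer matches, or that have disappeared.
sub changed_files {
    my ($baseline) = @_;
    return grep { !-f $_ || file_md5_hex($_) ne $baseline->{$_} } sort keys %$baseline;
}

my ($cmd, @args) = @ARGV;
if (defined $cmd && $cmd eq 'init' && @args == 2) {
    save_baseline($args[1], make_baseline($args[0]));
} elsif (defined $cmd && $cmd eq 'check' && @args == 1) {
    print "MODIFIED OR MISSING: $_\n" for changed_files(load_baseline($args[0]));
}
```

For example, `perl integrity.pl init /usr/bin /mnt/readonly/bin.md5` while the system is known to be clean, then `perl integrity.pl check /mnt/readonly/bin.md5` later (script name and paths are illustrative). This only reports files that changed or vanished; files an attacker added are not noticed, which a real tool would also check.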

On the web, you might use MD5 to verify a time-limited login. When a user logs in, you make an MD5 digest out of their login name, password, a timestamp, and some secret key only you know, and set that as the value of a browser cookie. Then, on each request, you can make sure the user hasn't just reused an old cookie (i.e. skipped your login procedure), and that someone reading the user's cookie file can't steal the user's password from it. You do this by comparing the value of the cookie to the value you compute on the spot from the user's name, password, the legitimate timestamp, and the secret key.

(Notice that this also keeps you from passing passwords over the connection in cleartext on every request: a clever enough cracker can read the contents of a cookie, but the MD5 digest won't do them any good as far as stealing passwords goes (unless they're sniffing during the login process!).)
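A minimal sketch of the time-limited login cookie. The post doesn't say how the server learns the 'legit timestamp', so this assumes the cookie carries the user name and timestamp in clear next to the digest; it also assumes user names contain no ':' and that the server can look up the value it used as the password. $SECRET, $MAX_AGE and the sub names are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical values: keep the real secret out of the web root, and pick your own lifetime.
my $SECRET  = 'replace-with-a-long-random-secret';
my $MAX_AGE = 30 * 60;    # seconds a login stays valid

# The digest the post describes: login name, password, timestamp, and server secret.
sub login_digest {
    my ($user, $password, $timestamp) = @_;
    return md5_hex(join ':', $user, $password, $timestamp, $SECRET);
}

# Assumption: user name and timestamp travel in clear next to the digest,
# so the server can recompute it.
sub make_login_cookie {
    my ($user, $password, $now) = @_;
    return join ':', $user, $now, login_digest($user, $password, $now);
}

# Returns the user name if the cookie is genuine and fresh, otherwise undef.
# $password_for is a callback returning the password the server holds for a user.
sub check_login_cookie {
    my ($cookie, $password_for, $now) = @_;
    my ($user, $ts, $digest) = split /:/, $cookie, 3;
    return undef unless defined $digest && $ts =~ /\A\d+\z/;
    return undef if $now - $ts > $MAX_AGE;    # an old cookie: make them log in again
    my $password = $password_for->($user);
    return undef unless defined $password;
    return login_digest($user, $password, $ts) eq $digest ? $user : undef;
}

my %passwords = (alice => 'S3cr3t!pw');        # hypothetical user store
my $lookup    = sub { $passwords{ $_[0] } };
my $t0        = time;
my $cookie    = make_login_cookie('alice', $passwords{alice}, $t0);

my $fresh = check_login_cookie($cookie, $lookup, $t0 + 60);
my $stale = check_login_cookie($cookie, $lookup, $t0 + 2 * 60 * 60);
print 'fresh: ', ($fresh // 'rejected'), "\n";   # fresh: alice
print 'stale: ', ($stale // 'rejected'), "\n";   # stale: rejected
```

Plain MD5 over a concatenation is kept here to match the post. The standard construction for mixing a secret key into a digest is HMAC (for example hmac_sha256_hex from the core Digest::SHA module), which avoids some weaknesses of ad hoc concatenation.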

Philosophy can be made out of anything. Or less -- Jerry A. Fodor


Re: Re: Using MD5 and the theory behind it by gildir (Pilgrim) on Jan 10, 2001 at 14:39 UTC
"On the web, you might use MD5 to verify a time-limited login. When a user logs in, you make an md5 'hash' (digest) out of their login name, password, a timestamp, and some secret key only you know, and set that as a value of a browser cookie...."

Why do this kind of terrible thing? On my site, I just generate a random session ID and set that as a cookie. On the server side, I associate this session ID with the user's login name and a 'last access' timestamp. When the user returns, I just check the validity of the session ID by following the association. This has several advantages: I do not have to compute a (slow) MD5 on every request. The cookie value (session ID) is random, which means it is totally secure. It is not based on the user's password or username, so it will accommodate any authentication scheme (e.g. X.509 certs), not just plain passwords.
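A sketch of the random-session-ID approach, with an in-memory hash standing in for the server-side database and an assumed 30-minute idle timeout. It reads 16 bytes from /dev/urandom, so it assumes a Unix-like system; the sub names are hypothetical.

```perl
use strict;
use warnings;

my $IDLE_TIMEOUT = 30 * 60;   # assumed idle limit, in seconds
my %sessions;                 # hypothetical store: session ID => { user, last_access }

# 128 random bits from the OS, hex-encoded (assumes a Unix-like /dev/urandom).
sub new_session_id {
    open my $fh, '<:raw', '/dev/urandom' or die "Cannot open /dev/urandom: $!";
    my $got = read $fh, my $bytes, 16;
    close $fh;
    die "Short read from /dev/urandom" unless defined $got && $got == 16;
    return unpack 'H*', $bytes;
}

# At login: associate a fresh ID with the user; the ID is what goes into the cookie.
sub start_session {
    my ($user, $now) = @_;
    my $id = new_session_id();
    $sessions{$id} = { user => $user, last_access => $now };
    return $id;
}

# On each request: follow the association, refresh the timestamp, drop stale sessions.
sub session_user {
    my ($id, $now) = @_;
    my $s = $sessions{$id} or return undef;
    if ($now - $s->{last_access} > $IDLE_TIMEOUT) {
        delete $sessions{$id};
        return undef;
    }
    $s->{last_access} = $now;
    return $s->{user};
}

my $t0   = time;
my $sid  = start_session('alice', $t0);
my $who  = session_user($sid, $t0 + 60);                        # 'alice'
my $gone = session_user($sid, $t0 + 60 + 2 * $IDLE_TIMEOUT);    # undef: idle too long
print "session $sid belongs to $who\n";
```

Nothing about the user can be derived from the cookie value. The trade-off is that every request needs a lookup in the session store, which in a real deployment is shared storage such as a database rather than a per-process hash.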
Re: Re: Re: Using MD5 and the theory behind it by eg (Friar) on Jan 10, 2001 at 23:54 UTC
One way I've used hashes is to set the verified user's cookie to something like: `$cookie = $user_id . $delimiter . hash( $user_id, "host secret password" );` So on subsequent accesses, all I need to do is split on $delimiter, run $user_id and "host secret password" through the hashing algorithm, and compare the result against the hash in the cookie to verify the user. I haven't looked at the code at everydevel, but it looks like PerlMonks does something similar. This trades a (slow) database access for a (slow) hash computation, so I'm not sure there's a real winner (or if there is, it'll be system-dependent). Just another option to consider...
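A sketch of the delimiter scheme. The hash() in the reply is unspecified, so this swaps in HMAC-SHA256 (hmac_sha256_hex from the core Digest::SHA module), a keyed hash designed for exactly this "hash of data plus a secret" job; the secret string and the '|' delimiter are placeholders, and the sub names are hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $HOST_SECRET = 'host secret password';   # placeholder from the post; use a long random value
my $DELIMITER   = '|';                      # assumed; never appears in the hex tag

# $cookie = $user_id . $delimiter . hash($user_id, secret), with HMAC-SHA256 as hash().
sub make_user_cookie {
    my ($user_id) = @_;
    return $user_id . $DELIMITER . hmac_sha256_hex($user_id, $HOST_SECRET);
}

# Split at the last delimiter, recompute the keyed hash, and compare.
sub verify_user_cookie {
    my ($cookie) = @_;
    my $pos = rindex $cookie, $DELIMITER;
    return undef if $pos < 0;
    my $user_id = substr $cookie, 0, $pos;
    my $tag     = substr $cookie, $pos + length($DELIMITER);
    return hmac_sha256_hex($user_id, $HOST_SECRET) eq $tag ? $user_id : undef;
}

my $cookie = make_user_cookie('1234');
my $ok     = verify_user_cookie($cookie);
(my $forged = $cookie) =~ s/\A1234/1235/;
my $bad    = verify_user_cookie($forged);
print 'genuine: ', ($ok  // 'rejected'), "\n";   # genuine: 1234
print 'forged:  ', ($bad // 'rejected'), "\n";   # forged:  rejected
```

As the reply says, this replaces a database lookup with a hash computation per request. As written, such a cookie never expires and can't be revoked short of changing the secret; folding a timestamp into the hashed data would address expiry.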
Re: Re: Re: Using MD5 and the theory behind it by rpc (Monk) on Jan 11, 2001 at 04:11 UTC
Your method is not 'totally secure', because you have to store the nonce in a database. If you generate a SID from an MD5 digest based on user authentication information, this hash does not have to be stored: it can be regenerated when the cookie is inspected. Also, if you run a large site with millions of users, your source of entropy can be depleted quickly, negating any security you would have gained.
Re: Re: Re: Re: Using MD5 and the theory behind it by gildir (Pilgrim) on Jan 11, 2001 at 14:02 UTC
That's really an academic debate. I could argue that your scheme depends on the security of the MD5 algorithm and therefore cannot be more secure than MD5. History shows that even cryptographic hashes have had problems, and if I recall correctly, some of the MD-series hashes did. OTOH, my scheme depends only on the security of the server, and if an attacker can read data from the server's database, they will not bother with the nonce but will go directly for the target data stored there. Authentication is there not only for its own sake but to protect data, and there is no point making authentication stronger than the protection of the data itself. And if I have a large site, my entropy pool will be exhausted by the SSL subsystem in the first place, so I will need a hardware crypto card (random-number generator) anyway.
Re: Re: Using MD5 and the theory behind it by r.joseph (Hermit) on Jan 10, 2001 at 10:49 UTC
Thanks a ton... your reply was exactly what I was looking for: practical applications to help me understand. I even printed it out for future reference. Nothing important in this message, just wanted to say thanks. R.Joseph


	PerlMonks