http://qs321.pair.com?node_id=665966

Trihedralguy has asked for the wisdom of the Perl Monks concerning the following question:

I'm sure this has been discussed here before. However, it's possibly listed under the actual module's name rather than as a straight question.
Anyway! I'm working on an external application in which users will be able to register a username and password.
I have been reading up on MD5 passwords and I was wondering: what's the best way to generate one of these based on a string?
In my reading I have also learned that you shouldn't just use the straight string to encrypt with MD5, but should append a salt to the user's password before encrypting it.
Is there a Perl module that wraps all of this up pretty nicely, making the MD5 call and managing the salt for me? Thanks for any comments or suggestions anyone can provide :)

Replies are listed 'Best First'.
Re: Encryption and MD5
by mr_mischief (Monsignor) on Feb 04, 2008 at 17:31 UTC
    The two most important parts of using a salt are coming up with the salt and how you store the salt.

    It's easy enough to read 8 bytes or so from /dev/random on a Unix box, and 8 bytes is plenty of salt. If you don't have something so simple as a device file from which you can read bytes, then rand() or the pid divided by the current time and encoded Base64 (or, heck, MD5) might work for you.
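A minimal sketch of the device-file approach described above. It assumes a Unix-like system and reads from /dev/urandom (the non-blocking sibling of /dev/random) rather than /dev/random itself; the helper name `make_salt` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read $n random bytes and return them hex-encoded.
# Assumes a Unix-like system that provides /dev/urandom.
sub make_salt {
    my ($n) = @_;
    $n //= 8;    # default to 8 bytes of salt
    open my $fh, '<:raw', '/dev/urandom'
        or die "Cannot open /dev/urandom: $!";
    my $got = read($fh, my $bytes, $n);
    close $fh;
    die "Short read from /dev/urandom" unless defined $got && $got == $n;
    return unpack 'H*', $bytes;    # 8 bytes become 16 hex characters
}

my $example_salt = make_salt(8);
print "salt: $example_salt\n";
```

Hex-encoding the bytes keeps the salt printable, so it can be stored in a text field next to the hash.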

    As for storing the salt, there are a number of options. Some applications stick it in a separate field just for the salt. Some use the first X or last X characters of the password field to store the salt (which means, of course, a fixed X characters of salt). Some put a colon, comma, slash, or some other character not appearing in the hash (or delimiting fields in the file) between the hashed password and the salt.

    Some take advantage of the fact that certain hashing algorithms produce fixed-length hashes, and stick the salt at the end of the password field. MD5, for example, creates 128-bit hashes, usually represented as 32 hexadecimal characters. You could say that the first 32 characters of your password field (or the last 32, or the first 16 and last 16, whatever) are the hashed password, and that anything else is the salt.
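The fixed-length layout described above could be sketched like this. The helper names `pack_field` and `unpack_field` and the sample salt are hypothetical. The sketch assumes the salt is appended to the password before hashing and that the first 32 characters of the field hold the hex digest.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical layout: the first 32 characters are the MD5 hex digest,
# everything after them is the salt.
sub pack_field {
    my ($password, $salt) = @_;
    return md5_hex($password . $salt) . $salt;
}

# Split a stored field back into (digest, salt).
sub unpack_field {
    my ($field) = @_;
    my $hash = substr $field, 0, 32;
    my $salt = substr $field, 32;
    return ($hash, $salt);
}

my $field = pack_field('GoJets', 'a1b2c3d4');
my ($stored_hash, $stored_salt) = unpack_field($field);
print "hash=$stored_hash salt=$stored_salt\n";
```

Because the digest length is fixed, no delimiter is needed and the salt can be any length.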

    Appending or prepending the salt to the plaintext of the password before hashing is the simplest way to improve the chances of avoiding attacks built around rainbow tables. It still does little against true brute-force attacks. It's also possible to get creative with salting, like having the first character of the plaintext password determine whether the salt is prepended or appended, having characters of the password and salt interleaved, or splitting the password in half and putting the salt in the middle before hashing. Again, these cute tricks might provide additional help against table-based attacks, but are not going to stop a brute-force attack.

    Some applications will use some other field (or combination of fields) unique to the user as the salt instead of a random value. If you're using some other field(s) as the salt and not storing it separately from those fields, be aware that any updates to those fields require rehashing the password with the new salt at the time of the update.
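As a rough illustration of that caveat, the sketch below uses the username as the salt. The helper name `hash_with_username` and the sample usernames are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical scheme: the username doubles as the salt.
sub hash_with_username {
    my ($username, $password) = @_;
    return md5_hex($password . $username);
}

my $before_rename = hash_with_username('mary',       'GoJets');
my $after_rename  = hash_with_username('mary.smith', 'GoJets');

# The old stored hash no longer verifies after the rename, so it has to
# be recomputed while the plaintext password is available.
print "hash changed after rename: ",
    ($before_rename ne $after_rename ? "yes" : "no"), "\n";
```

In practice this means a username change has to ask for the password again, since the plaintext is not stored anywhere.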

    MD5 should be good enough, for some value of "good enough", for passwords. This is more true with the use of a salt. Be aware, though, that for longer strings it has been found to have fairly quickly reproducible hash collisions. The md5 article has links to papers about its strengths and weaknesses. I personally wouldn't worry about passphrases colliding, as it appears the collision-finding methods are based on longer files being hashed down to 128 bits simply losing too much Shannon entropy in the hashing process (since minimum message lengths are noted for the methods) and some impressive mathematical analysis of the peculiarities of how that happens in MD5. At the average length of most user-chosen passphrases, a dictionary or rainbow table attack is a bigger concern. I'd appreciate anyone with better cryptographic information or a better understanding of the issue correcting me.

Re: Encryption and MD5
by moritz (Cardinal) on Feb 04, 2008 at 14:56 UTC
    The salt is just another string, and computing the md5 sum is a simple matter of calling a function:

        use Digest::MD5 qw(md5_hex);
        my $md5_password = md5_hex($password . $salt);

    No rocket science, and not worth a module on its own ;-)

Re: Encryption and MD5
by olus (Curate) on Feb 04, 2008 at 14:32 UTC

    I see no problem with using MD5 to store your passwords. You are not concerned about many users using the same password; they will have to authenticate with a login (or username) and the password, and the login is the one you have to worry about in terms of uniqueness.

    Digest::MD5 helps you work with MD5.

      One, you don't use MD5 to "encrypt" or "store" your passwords. MD5 is a digesting or hashing function (hence the name Digest::MD5), which means that the process is definitely a one-way street: you can't undigest a hash value and learn what the original password was.

      Two, while you're not concerned about "many users using the same password," you ARE concerned with the back end, where someone has access to your password store. This doesn't have to be direct database access or file access. Depending on protocol, this may also be a man-in-the-middle who might get to see unencrypted rsync activity from one server to another within your enterprise.

      There are two key weaknesses in unsalted hashing: (1) someone looking at the hash values and deducing which users use the same password, and which users use different passwords, and (2) someone creating a useful dictionary of possible passwords and their associated hash value. Salt solves both concerns.

      Let's compare passwords using three approaches. We show a table with the original passwords for several accounts, the password when passed through an encryption (and thus reversible) transform, the password hashed without salt, and the password hashed with salt.

      USER   PASSWORD  ENCRYPTED  UNSALTED  SALTED
      joe    GoJets    STEjOg     32F8ABC2  $&AB3A262C
      frank  Br1tney   YENT1Rb    5158BAD3  ^%1340CF01
      mary   GoJets    STEjOg     32F8ABC2  (*7638BA7D

      From this table, we can see some flaws.

      I've picked a trivial function for encryption but more complicated ones would suffer the same vulnerability: figure out the key and you can unencrypt to restore the original passwords.

      Users mary and joe use the same password, and you can see that with unsalted and encrypted password stores, but you can't see that with the salted store. If you used the unsalted store, then a dictionary can list hash value 32F8ABC2 and the password "GoJets".
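A small sketch of the same comparison with real MD5 digests. The two salt strings are made up for illustration; real salts would be random per user.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Illustrative salts only; real salts would be random per user.
my %salt_for = (joe => 'x7Qp', mary => 'Lm92');
my $password = 'GoJets';    # joe and mary share this password

my (%unsalted, %salted);
for my $user (qw(joe mary)) {
    $unsalted{$user} = md5_hex($password);
    $salted{$user}   = md5_hex($password . $salt_for{$user});
}

print "unsalted digests match: ",
    ($unsalted{joe} eq $unsalted{mary} ? "yes" : "no"), "\n";
print "salted digests match:   ",
    ($salted{joe} eq $salted{mary} ? "yes" : "no"), "\n";
```

The unsalted digests are identical, which is what reveals the shared password. The salted digests differ because the input to MD5 differs.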

      Okay, why is knowing common passwords a concern? If you wanted to socially engineer other things that only user mary can access, you might find talking to user joe to be a softer target, and user joe might leak other facts about user mary (her basketball team, her birthday, her dog, etc.) without anyone realizing it. Out of a list of fifty different users with forty-nine different passwords, the one guy who uses the same password is more likely your weak user joe.

      Adding salt makes a dictionary have to store a much larger number of hash values for each possible password. Hopefully, with enough salt, the dictionary must be many terabytes or larger, which makes dictionary attacks far less tenable. Better encryption OR hashing functions will also make the hidden value longer, which reduces collisions (two different real passwords with the same hidden value) and makes dictionary computation harder.
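A back-of-the-envelope sketch of how salt inflates a plain lookup dictionary. It assumes one 16-byte MD5 digest is stored for every (password, salt) pair and ignores the compression that rainbow tables use; the helper name `table_bytes` and the one-million-password figure are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical estimate: one 16-byte MD5 digest per (password, salt) pair.
sub table_bytes {
    my ($passwords, $salt_bytes) = @_;
    my $salts = 2 ** (8 * $salt_bytes);    # number of distinct salt values
    return $passwords * $salts * 16;
}

for my $salt_bytes (0, 2, 4, 8) {
    printf "%d salt bytes: %.3g bytes for 1e6 candidate passwords\n",
        $salt_bytes, table_bytes(1_000_000, $salt_bytes);
}
```

Each extra byte of salt multiplies the table size by 256, so even a few bytes push a precomputed dictionary well past practical storage.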

      Update: Yes, salt must be chaotic/random to be of any use at all; it's really part of the definition of salt in this context.

      --
      [ e d @ h a l l e y . c c ]

        When using MD5 one is not interested in reversing the digest in order to get the original value. Instead, one wants to apply the same method as the original and then compare the results to check if they match.
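That check could be sketched as follows, assuming the salt was appended to the password at registration time. The helper name `check_password` and the sample values are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: recompute the digest from the candidate password
# and the stored salt, then compare it with the stored digest.
sub check_password {
    my ($candidate, $salt, $stored_hash) = @_;
    return md5_hex($candidate . $salt) eq $stored_hash;
}

my $user_salt   = 'a1b2c3d4';
my $user_stored = md5_hex('GoJets' . $user_salt);    # saved at registration

print check_password('GoJets', $user_salt, $user_stored) ? "ok\n" : "denied\n";
print check_password('gojets', $user_salt, $user_stored) ? "ok\n" : "denied\n";
```

The plaintext password is never stored; only the salt and the digest are needed to verify a login attempt.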

        When I read the post I first thought it would be a web application (you are right to point out that there could be a man in the middle in other scenarios), so there isn't much to do but collect the information that comes from the browser and work with that. Depending on the level of security wanted, HTTPS should be considered.

        I agree with your security concerns, but I had in fact neglected the case of identical values for different users, assuming integrity at the DB level.
        Still, salt won't solve the problem if it is always the same, as it would give the same results for equal passwords. So salt should be constructed in such a manner that it is different for each user, but always produces the same result for any given user. And you can do that while working with Digest::MD5.

        There is a lot to say about security, and I am far from being an expert.
        Thank you for your input.

Re: Encryption and MD5
by Trihedralguy (Pilgrim) on Feb 04, 2008 at 16:33 UTC
    Thanks, this is exactly what I was looking for!