Re: MD5-based Unique Session ID Generator
by stvn (Monsignor) on Aug 19, 2004 at 14:21 UTC
|
I would think hostname is a pretty hefty operation for generating a session id; I'm not sure, but I think it does a DNS lookup.
The ID is based on hostname, time, and some pseudo-random data. I've run a test with this to generate 50,000 IDs as fast as possible and check for collisions -- I didn't get any.
I use this for session ids (which I took from one of the Apache::Session modules)
use Digest::MD5 qw(md5_hex);
$session_id = substr(md5_hex(md5_hex(time() . {} . rand() . $$)), 0, 32);
I ran it within the same process over 100,000 times with no collisions.
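A minimal sketch of that kind of in-process collision check, for anyone who wants to repeat it. The helper names make_session_id and count_collisions are made up for this example; the id recipe is the one from the snippet above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Same recipe as the snippet above: time, the address of a fresh
# anonymous hash, a random number and the process id, hashed twice.
sub make_session_id {
    return substr(md5_hex(md5_hex(time() . {} . rand() . $$)), 0, 32);
}

# Generate $count ids and return how many had already been seen.
sub count_collisions {
    my ($count) = @_;
    my %seen;
    my $collisions = 0;
    for (1 .. $count) {
        $collisions++ if $seen{ make_session_id() }++;
    }
    return $collisions;
}

my $collisions = count_collisions(100_000);
print "collisions in 100,000 ids: $collisions\n";
```

A run like this only says something about one process on one machine; it does not cover many processes generating ids at the same moment.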
This is sort of slow, but strong. Reducing the param for rand() will speed things, but make collisions more likely.
I am no crypto expert, but from what I know, it's not really any stronger than if you didn't do it this way. Using MD5 with different text each time, you are highly unlikely to find a collision; that is just the nature of MD5 and hashing algorithms in general.
| [reply] [d/l] |
|
Just the middle part of your expression, (time() . {} . rand() . $$), helps make the session ids unique.
| [reply] [d/l] |
|
Very true, but the double md5_hex() doesn't hurt (as far as I know).
As I said, I am no crypto expert, and my knowledge of these things is limited. But I would think that hashing a reasonably unique string to produce a pretty darn close to unique string, and then hashing it again to get (what I would assume is) an even closer to truly unique string, is a good thing when generating session ids. Please, though, if I am wrong and the double hash provides no benefit, let me know why, as I would be interested in knowing.
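On the double-hash question, here is a small sketch (my own illustration, with made-up seed strings and helper names). A hash is a deterministic function, so two equal seeds give equal digests however many passes are applied; a second pass cannot separate values that were already the same.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helpers for illustration only.
sub single_hash { return md5_hex($_[0]) }
sub double_hash { return md5_hex(md5_hex($_[0])) }

# Two "sessions" that happened to build the same seed string.
my $seed_a = "1092925260" . "12345";
my $seed_b = "1092925260" . "12345";

# Equal seeds collide after one pass and still collide after two.
print "single pass collides\n" if single_hash($seed_a) eq single_hash($seed_b);
print "double pass collides\n" if double_hash($seed_a) eq double_hash($seed_b);

# A seed that differs by one character gives a different id either
# way, so the uniqueness has to come from the seed, not the hashing.
my $seed_c = "1092925260" . "12346";
print "distinct seeds differ\n" if double_hash($seed_a) ne double_hash($seed_c);
```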
| [reply] [d/l] |
Re: MD5-based Unique Session ID Generator
by dragonchild (Archbishop) on Aug 19, 2004 at 14:00 UTC
|
How much stronger is this than md5_hex( time, $$, time ) where $$ is spread over 150 Apache child processes?
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
| [reply] [d/l] |
|
I haven't tested that case, but it really doesn't apply to what this is used for. Please see my update note in the description...
| [reply] |
Re: MD5-based Unique Session ID Generator
by simonm (Vicar) on Aug 19, 2004 at 16:20 UTC
|
If you don't mind using 128 bits rather than 32, Data::UUID guarantees that you won't get duplicates, ever.
use Data::UUID;
use constant IDGenerator => Data::UUID->new();
sub new_sid { IDGenerator->create() }
Update: Duh, they're both the same size, 128 bits and 32 hex digits.
| [reply] [d/l] |
|
MD5 is 128 bits. It's 32 hex digits.
I do prefer Data::UUID for this task, myself. Note that while it guarantees uniqueness, it doesn't guarantee unpredictability, which may or may not be a problem for a given application. Most MD5/SHA1/whatever session generators have the same issue.
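If unpredictability does matter, one option is to take the id straight from the operating system's random source instead of hashing guessable values. This is only a sketch under a loud assumption: a Unix-like system that provides /dev/urandom. The function name random_session_id is made up for the example.

```perl
use strict;
use warnings;

# Assumption: a Unix-like system that provides /dev/urandom.
sub random_session_id {
    open(my $fh, '<:raw', '/dev/urandom')
        or die "cannot open /dev/urandom: $!";
    my $read = read($fh, my $bytes, 16);
    close($fh);
    die "short read from /dev/urandom" unless defined $read && $read == 16;
    return unpack('H*', $bytes);    # 16 bytes become 32 hex digits
}

print random_session_id(), "\n";
```

Ids made this way are hard to guess but are unique only with overwhelming probability, not by construction, so the trade-off is the opposite of the Data::UUID one.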
"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.
| [reply] |
|
If you don't mind using 128 bits rather than 32, Data::UUID guarantees that you won't get duplicates, ever.
Thanks! I won't be able to test this immediately, but if it works (and it seems like it will), it will be most helpful.
One point though, MD5 generates 32 hex digits representing 4 bits each - that's already 128 bits. Sorry if I was unclear about that.
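The arithmetic is easy to check with Digest::MD5 itself; a quick sketch (the input string is arbitrary):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);

my $raw = md5("any input");        # binary digest
my $hex = md5_hex("any input");    # same digest as hex text

my $bits_from_raw = length($raw) * 8;    # 16 bytes  * 8 bits
my $bits_from_hex = length($hex) * 4;    # 32 digits * 4 bits

print "raw: $bits_from_raw bits, hex: $bits_from_hex bits\n";
# prints "raw: 128 bits, hex: 128 bits"
```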
| [reply] |
|
Data::UUID guarantees that you won't get duplicates, ever.
While Data::UUID is a good solution, it doesn't guarantee that "you won't get duplicates, ever" (heck - there are only 128 bits after all :-)
As the docs say...
A UUID is 128 bits long, and is guaranteed to be different from all other UUIDs/GUIDs generated until 3400 CE.
...
It provides reasonably efficient and reliable framework for generating UUIDs and supports fairly high allocation rates -- 10 million per second per machine -- and therefore is suitable for identifying both extremely short-lived and very persistent objects on a given system as well as across the network.
So, it wouldn't be suitable if you were coding something up for The Long Now Foundation - or needed to allocate UUIDs really, really, really quickly :-)
The full gory detail can be found in this IETF draft.
All this complexity is, of course, why I like Data::UUID. People who are experts have taken the time to look hard at the algorithm, and I can have some confidence in it working well.
| [reply] |
Re: MD5-based Unique Session ID Generator
by guha (Priest) on Aug 19, 2004 at 14:22 UTC
|
I'm definitely not an expert on cryptos and related issues, but the loop looks suspicious in my eyes.
Do you realize that you push anything between zero and 2 Mbytes through the MD5 routine? No wonder that it sometimes, I guess, takes time to generate a key.
| [reply] |
|
Do you realize that you push anything between zero and 2 Mbytes through the MD5 routine? No wonder that it sometimes, I guess, takes time to generate a key.
Considering that, it's actually remarkably speedy. If I'm just generating one key at a time, it is basically instantaneous. 50,000 keys took about 5 minutes (not bad, all things considered). In the application I have (see Update, please), I'm not generating more than 50 keys in a given 1s interval, but they absolutely must not duplicate.
Still, I will be exploring some of the tips in this thread regarding faster ways to accomplish the same thing; hopefully I will remember to update my snippet when I get around to testing them!
| [reply] |
Re: MD5-based Unique Session ID Generator
by pelagic (Priest) on Aug 19, 2004 at 14:12 UTC
|
Why do you use time 2 times in your list?
It will be the same both times.
| [reply] [d/l] |
|
Why do you use time 2 times in your list?
It will be the same both times.
I assume you are referring to dragonchild's code, since the OP doesn't have time in there twice.
It will not matter if the time is the same; the idea is to generate a (sorta) unique string, and it will do that. Once it is put through md5_hex, it won't much matter after that. MD5 will give you the true uniqueness; all you really need is a bit of entropy to get it started.
| [reply] [d/l] [select] |
|
Adding "time" a second time does not make the string any more unique than using "time" just once.
It makes the theoretical entropy higher, but that's not the target here, as we are not defending against hackers. We just want to avoid collisions. The uniqueness of the ids must be achieved before feeding them through MD5.
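One way to get that uniqueness before hashing is a per-process counter, sketched below. This is my own illustration, not code from the thread: unique_seed and session_id are made-up names, and it assumes ids only need to be unique on a single host and that a pid is not reused by a new process within the same second.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $counter = 0;    # per-process sequence number

# The pid separates concurrent processes, the counter separates ids
# made by one process, and the time separates successive runs.
sub unique_seed {
    return join ':', $$, time(), $counter++;
}

# MD5 only disguises the seed here; it adds no uniqueness of its own.
sub session_id { return md5_hex(unique_seed()) }

print session_id(), "\n" for 1 .. 3;
```

Across several machines a host identifier would have to go into the seed as well, and ids built this way are unique but still predictable.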
| [reply] |
|
If we're talking about getting entropy, why don't we go with a better entropy source than the minor disparity between the two calls to time, which at MOST will vary by one digit and is not very entropic?
Why don't you just call HotBits and grab some radioactive decay data in hex format, break it apart and loop over it to give us some real entropy? That WILL decidedly minimize the chance of collisions. Since you're already acting on data returned by Sys::Hostname, this should be right up the alley of what you're doing.
| [reply] |