Randomness (was Re: Re: Re: Use time() to create unique ID)

Rand guarantees scarcity in a finite dataset, but it doesn't guarantee uniqueness.

Let's not get hung up on the difference between the practical and the theoretical here. :-)

For the purposes stated by the OP, using rand() is "good enough." Also, based on my own practical use of this method to generate unique session ids for web transactions, I have found that it works very well.

In my own applications I have very deliberately set up trapping logic that checks whether a generated session id is already in use and, if that ever happens, logs the incident. The log is still empty for one web application that was installed in August of 2001. That's over two years now with no collisions. I think that works pretty darn well.
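For anyone who wants to see the shape of that trapping logic, here is a minimal sketch. It is in Python rather than Perl and is not the code from my application; the function name, the in-memory set standing in for the session store, and the retry limit are all assumptions made for illustration.

```python
import logging
import random

logger = logging.getLogger("session_ids")

def new_session_id(in_use, id_chars=16, max_tries=10):
    """Return a random hex ID that is not already in `in_use`, logging collisions."""
    for _ in range(max_tries):
        # Draw 4 bits per hex character and zero-pad to a fixed width.
        candidate = f"{random.getrandbits(4 * id_chars):0{id_chars}x}"
        if candidate not in in_use:
            in_use.add(candidate)
            return candidate
        # The trap: the generated ID is already live, so record it and retry.
        logger.warning("session id collision: %s", candidate)
    raise RuntimeError("no unused session id after %d tries" % max_tries)

active_sessions = set()  # hypothetical stand-in for the real session store
sid = new_session_id(active_sessions)
```

The point of the log line is that a collision becomes a recorded, handled event instead of a silent one.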

Truly random and unique ids

The one time I needed to generate truly random numbers for an application I wrote (I could tell you what it was but then I'd have to shoot you) :) I decided the best way to do it was to take a page from PGP and GnuPG and use system entropy: watching the position of the system disk heads, being influenced by system interrupts (mouse, keyboard, etc.) and stuff like that.

You can make yourself nuts with the whole subject. Folks a lot smarter than me have made their academic mark on the world writing papers on it, and there is even a whole field of science dedicated to it. For practical purposes you have to make a decision as to what constitutes "random enough" and code accordingly. A random ID of 128 characters is probably going to be random enough for 99% of the uses out there.
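As a rough illustration of leaning on system entropy without hand-rolling the collection, here is a sketch in Python (not Perl, and not what I actually wrote). It assumes the "128 characters" are hex digits, which is my choice for the example; the helper name random_id is hypothetical. Python's secrets module draws from the operating system's random source.

```python
import secrets

def random_id(length=128):
    """Return `length` hex characters drawn from the OS random source."""
    # token_hex(n) yields 2 * n hex digits, so request half the length
    # (rounded up) and trim in case `length` is odd.
    return secrets.token_hex((length + 1) // 2)[:length]

print(random_id())
```

With hex digits, 128 characters works out to 512 random bits, which is a lot of room.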

But then... we are getting way off topic here...

Peter L. Berghold -- Unix Professional Peter at Berghold dot Net
	Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.


Re: Randomness (was Re: Re: Re: Use time() to create unique ID)
by sauoq (Abbot) on Sep 16, 2003 at 23:07 UTC

Also, based on my own practical use of this method to generate unique session ids for web transactions, I have found that it works very well.

There is a big difference between "works very well" and hasn't broken yet. Using a random number for a unique ID is akin to adding a known but rarely encountered bug. It is a terrible solution to a problem which has known good solutions.

-sauoq
"My two cents aren't worth a dime.";


Re^2: Randomness

by blue_cowdawg (Monsignor) on Sep 17, 2003 at 00:24 UTC

There is a big difference between "works very well" and hasn't broken yet.

It all boils down to what you consider to be acceptably "broken" and what your exposure is.
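To put a rough number on that exposure, the usual birthday approximation can be sketched as below. This is Python rather than Perl, the function name is hypothetical, and it assumes the IDs are drawn uniformly and independently, which a real rand() may not quite deliver.

```python
import math

def collision_probability(n_ids, id_bits):
    """Approximate chance that n_ids uniform random IDs of id_bits bits contain a duplicate."""
    space = 2.0 ** id_bits  # number of possible IDs, N
    # Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))

# Example: a million session IDs at two different ID sizes.
for bits in (32, 128):
    print(bits, collision_probability(10**6, bits))
```

Whether the resulting figure is acceptable is exactly the judgment call I mean.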



Re: Re^2: Randomness

by sauoq (Abbot) on Sep 23, 2003 at 18:47 UTC

Any "random" algorithm is broken over a sufficiently large data set. That is the basis behind Chaos Theory. Random events or data are not very random if you take a large enough data set.

I'm not sure I even understand those statements...

Firstly, by saying, "any 'random' algorithm is broken over a sufficiently large data set" are you implying that said algorithm is not broken over a smaller data set? Perhaps you got the phrasing backward and you meant that any such algorithm is broken over a smaller data set. After all, in the case of unique IDs, a smaller data set will be more likely to result in a duplicate ID than a large one. In any case, though, it's the reliance on randomness that breaks it, not the size of the data set.

Secondly, saying "random events or data are not very random if you take a large enough data set" doesn't make any sense at all. Randomness has nothing to do with the size of the data set¹. It has everything to do with predictability. If you have a function which randomly returns either 0 or 1, then you can choose numbers between 0 and 2**9876543210 - 1 with no loss or gain of randomness.
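A small sketch of that last point, in Python rather than Perl: independent coin flips are simply concatenated as bits, so each value in the larger range is as unpredictable as the flips that built it. The names coin and random_below_power_of_two are hypothetical, and secrets.randbits(1) is only a stand-in for "a function which randomly returns either 0 or 1".

```python
import secrets

def coin():
    """Stand-in for a function that randomly returns either 0 or 1."""
    return secrets.randbits(1)

def random_below_power_of_two(bits):
    """Build a number in the range 0 .. 2**bits - 1 from `bits` independent coin flips."""
    value = 0
    for _ in range(bits):
        # Shift the bits gathered so far and append one fresh flip.
        value = (value << 1) | coin()
    return value

print(random_below_power_of_two(64))
```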

Finally, I don't see how any of this has anything to do with Chaos Theory. CT is concerned with deterministic processes where minute (even immeasurably so) differences in initial conditions can result in very different final states. The theory explains how apparent randomness can be observed even in very well-understood deterministic systems.
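The textbook illustration is the logistic map, sketched here in Python rather than Perl. The parameter r = 4.0, the starting value 0.2 and the perturbation of 1e-12 are arbitrary choices for the example. Nothing in it is random: the same start always gives the same orbit, yet two starts that differ by a hair drift apart.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) and return every state visited."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-12, 60)  # almost the same initial condition
gap = max(abs(x - y) for x, y in zip(a, b))
print("largest gap between the two orbits:", gap)
```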

It all boils down to what you consider to be acceptably "broken" and what your exposure is.

My point was that using randomness for generating unique IDs should not be recommended. There are ways to do it that aren't broken. Why concern yourself with using statistical analysis to determine how likely it is your program will fail when you can avoid failure altogether?
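One such approach, as a minimal sketch under stated assumptions (Python rather than Perl, with a hypothetical unique_id helper): combine things that are already distinct, such as host name, process ID and start time, with a counter that only ever goes up. Within one process the counter alone rules out duplicates. Across processes it assumes host names are distinct and that a PID is not reused on the same host within the same second.

```python
import itertools
import os
import socket
import time

_START = int(time.time())     # when this process began issuing IDs
_counter = itertools.count()  # monotonically increasing sequence number

def unique_id():
    """Return 'host.pid.start.sequence'; the sequence never repeats in this process."""
    return "%s.%d.%d.%d" % (socket.gethostname(), os.getpid(), _START, next(_counter))

print(unique_id())
print(unique_id())
```

No probability estimate is needed here: a duplicate would require the counter to repeat, and it doesn't.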

1. Well, almost nothing. Nothing for data sets with at least two elements you can choose. In other words, if you can only make one choice, you can't make it "randomly."




Perl: the Markov chain saw
	PerlMonks