Random number

Moshambo has asked for the wisdom of the Perl Monks concerning the following question:

RE: Random number (CGI Security)
by Russ (Deacon) on Jul 10, 2000 at 00:58 UTC

Let me give you a spectrum of examples. Suppose you are building a shopping cart website, and you need to assign a number to each customer who visits. Here are some possibilities (I will discuss the ramifications below):

1. Since/If you are using a database (or some other target for DBI), you could just allow an autoincrementing ID to provide your CustomerID.
2. You could use a self-built "random" number (like a combination of time + IP address (like Netscape did...OOOPS!))
3. You could use some variant of the customer's name or other information for the unique identifier.
4. You could use a pseudo-random number, like Perl's built-in rand()
5. You could use a "real" random number generator, like Math::TrulyRandom or /dev/random on most UNIX boxes.

The problem with most of these solutions (in descending order of "wrongness") is that they are easy for a cracker/malicious individual to find, guess or generate.

1. An incrementing ID is obviously easy to figure out. Make an account, look at your ID, and try the ID below yours.
2. Self-built "random" numbers are also pretty easy to figure out. Knowing what time it is, and watching where someone is coming from will give the black hat a very small number of possibilities to try.
3. A key the customer picks (either voluntarily, or generated from other information) may be easy to guess, especially when the malcontent can simply look at how you generated his own key.
4. Pseudo-random numbers are much harder to figure out, but the very definition of pseudo-random numbers means that, given a few numbers generated from the same source, you can (relatively) easily know ALL of the numbers which will follow.
5. "Real" random number generators (like Math::TrulyRandom, /dev/random, etc.) get their numbers from a non-deterministic source (like interrupt timing discrepancies). This (theoretically) does not cause any predictability in the output.
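
The predictability of a pseudo-random generator can be seen in a small sketch. It is only an illustration, not an attack: the helper name sequence_for_seed and the seed value 42 are made up for the example. The point is that anyone who learns or guesses the seed gets the whole sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $count values rand() yields
# after seeding the generator with $seed.
sub sequence_for_seed {
    my ($seed, $count) = @_;
    srand($seed);
    return map { int(rand(1_000_000)) } 1 .. $count;
}

my @first  = sequence_for_seed(42, 5);
my @second = sequence_for_seed(42, 5);

# Same seed, same "random" numbers: both lines print the same values.
print "run 1: @first\n";
print "run 2: @second\n";
print "identical\n" if "@first" eq "@second";
```

A source such as /dev/random has no seed for the attacker to recover in this way, which is the difference the list above is getting at.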

Now, the second part of the equation is: "What data are you trying to protect?"

If you are just going to display a random picture (like the "Monk icons" at the upper right of your screen), security is not your concern. Therefore, make the most efficient use of your time and use rand(). If a cracker guesses that the next icon to appear will be vroom's, who cares?

If, however, you will be storing a customer's personal information (especially credit-card numbers) and allowing the user to view that information later...You would be shamefully negligent to use anything less than 128-bit or greater SSL, a truly random number for the CustomerID, strong, cryptographic-quality passwords... (and perhaps even that is not enough).

Here is an example from one of my projects. We are building an e-commerce site which will allow users to order products, entering credit card information for payment. Users may upload graphics to use in the printed product. We have chosen not to allow the user to retrieve credit card data. They may view and edit their uploaded logos.

For SessionIDs and CustomerIDs, we use truly random numbers. Because we do not store intensely sensitive data, we do not need to enforce strict, cryptographic-quality passwords. A Customer's work is important, so we use "real" random numbers to protect the Session. Images (uploaded logos) use auto-incrementing IDs, since they will be hidden behind the CustomerID (and/or SessionID). Customers' logos are their property, so we protect them with the random Customer key, but because logos are (presumably) already publicly available, we do not need the highest level of security for them. When ordering, we transmit credit-card information over SSL and protect the card info appropriately (e.g. NEVER send it via e-mail!), and do not allow a user to see that information again (so we do not have to inflict a random password and other sufficiently paranoid measures upon the hapless visitor). Order confirmation, which uses no sensitive data at all, happens with security-free (what other kind is there?) e-mail.
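
As a rough sketch of what generating such a SessionID might look like (this is not the code from the project described above): the helper name random_hex_id, the choice of /dev/urandom as the source, and the 16-byte length are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return $bytes random bytes from /dev/urandom,
# encoded as a lowercase hex string.
sub random_hex_id {
    my ($bytes) = @_;
    open(my $fh, '<', '/dev/urandom') or die "Cannot open /dev/urandom: $!";
    binmode($fh);
    my $got = read($fh, my $raw, $bytes);
    close($fh);
    die "Short read from /dev/urandom\n" unless defined $got && $got == $bytes;
    return unpack('H*', $raw);    # two hex digits per byte
}

my $session_id = random_hex_id(16);    # 128 bits, 32 hex characters
print "SessionID: $session_id\n";
```

A hex string is used only because it drops safely into a cookie or URL; any other encoding of the same bytes would do.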

Security is an ever-present concern in e-commerce. The heart of data security is cryptography. The heart of cryptography is random number generation. The weaker the random numbers, the weaker the cryptography; therefore, the weaker the security. Random numbers having anything to do with security must be the highest-possible quality. Your advice to avoid rand() in CGI is a direct reflection of this security mindset. If you need a random number to keep people out of places where they do not belong, you need the best random number you can get. rand() is not it.

Russ

Re: Random number
by httptech (Chaplain) on Jul 09, 2000 at 17:02 UTC

Math::TrulyRandom

Time::HiRes

RE: Random number
by BBQ (Curate) on Jul 09, 2000 at 19:42 UTC

Moshambo

rand

Regardless of the language used to generate it.

#!/home/bbq/bin/perl
# Trust no1!

RE: RE: Random number

by PipTigger (Hermit) on Jul 10, 2000 at 02:09 UTC

But, if you are trying to keep track of data, or generating unique strings to insert into a database, I would never use randomness. Regardless of the language used to generate it.

Could you please elaborate as to why not and what alternatives there are? Thanks. TTFN.

-PipTigger

p.s. A Tale of Soul and Sword Eternally Re-told!

RE(3): Random number - A very long reply...

by BBQ (Curate) on Jul 10, 2000 at 07:07 UTC

As a matter of fact, I can, but I should warn you this can turn into an essay very quickly. :)

Uniqueness of data
There are two good reasons why one should never rely on randomness to keep track of data: luck and track.

Let's take luck first; just think of the lottery. If you play games of fortune, you count on good fortune to keep you on the positive, winning, et al. When you rely on luck to generate unique strings or cryptic information, you rely on the same luck, only pointed in the opposite direction.

It would be like playing the slots in Las Vegas hoping that you never get a triple seven, or consecutive bells, or whatever it is that slots reward you with. You are counting on being rewarded with the lack of matches. If your application deals with sensitive data, luck should never be a factor to consider. After all, there's as much good luck as there is bad luck in this world (only Murphy would find an algorithm to prove that it can be worse).
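
To put a rough number on that luck, here is a sketch using the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), for n IDs drawn uniformly from N possible values. The function name collision_probability and the record count of 100,000 are illustrative assumptions, not figures from any real application.

```perl
use strict;
use warnings;

# Approximate chance that at least two of $n random IDs collide when each
# is drawn uniformly from $space possible values (birthday approximation).
sub collision_probability {
    my ($n, $space) = @_;
    return 1 - exp(-$n * ($n - 1) / (2 * $space));
}

# Illustrative only: 100,000 records with IDs of various widths.
for my $bits (16, 32, 64) {
    printf "%2d-bit IDs: p(collision) ~ %.3g\n",
        $bits, collision_probability(100_000, 2**$bits);
}
```

However small the figure gets for wide IDs, it is never exactly zero, so a unique constraint in the database is still what actually guarantees no duplicates.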

The second reason, track, is actually more obvious. It's just very hard to keep track of something if you are being random about it! It would be like counting cars, except instead of numbering them, you could (off the top of my head) interview people on the street asking them what their favorite TV show is instead. You would come out with results like:

South Park	VW Bug
Pinky & The Brain	Porsche 911
X-Files	'72 Land Rover
The Daily Show	Cowboy Neal

I've actually heard of people that use this sort of technique to memorize data for long periods of time, but for storing them in a database, it really doesn't seem to be very effective. (And on a side note, that isn't being random either.) If you have hundreds of thousands of records, I bet you'll start getting duplicate favorite shows, and even if you didn't, it would be hard as hell to tell what the car you had counted was in the first place.

Track of Data
I have (in contrast) two methods for keeping track of my data, and neither of them is the best there is, but they have been useful nonetheless. The first, and I believe most used, method is auto-incrementing an ID field: defining it with a unique constraint in a database, and then auto-incrementing it as you add more info. This is pretty obvious, but just for the sake of it, let's say:

1	VW Bug
2	Porsche 911
3	'72 Land Rover
4	Cowboy Neal

The second method, which I use most frequently, is a combination of time and process ID. The combination of both will give me unique data and two bits of information that are much more useful than the order in which they were entered into the database. Consider that the string being generated is "$^T-$$". Every time we generate a new record, we have the epoch ($^T, strictly the time the script started) and the current PID ($$) of when that record was created. Even if you have multiple records coming into the database at once, they can't be running under the same process ID, and therefore must be unique. (I have yet to see a machine that can spawn that many processes per second.) And for example's sake, our table would look something like this (on my Win NT box):

963198426-505	VW Bug
963198679-503	Porsche 911
963198688-505	'72 Land Rover
963198703-500	Cowboy Neal
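
A minimal sketch of the time-plus-PID idea follows. The helper name next_record_id and the trailing counter are additions for the example, not part of the method as described: because $^T is fixed when the script starts, a single process that inserts more than one record would otherwise produce the same ID twice.

```perl
use strict;
use warnings;

# Hypothetical helper: build an ID from the script start time ($^T),
# the process ID ($$) and a per-process counter (an added assumption,
# so one process can create several records without repeating an ID).
my $counter = 0;
sub next_record_id {
    return join '-', $^T, $$, $counter++;
}

print next_record_id(), "\tVW Bug\n";
print next_record_id(), "\tPorsche 911\n";
```

In a classic CGI, where every request is its own short-lived process creating one record, the counter is always 0 and the ID reduces to the time and PID pair shown in the table.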

Conclusion
If I had to sum it up, I'd just say, "Don't let fate take over your application. Fate can be good, but if there is one thing that you can count on, it's that Murphy will make it bad." Or, as my father (a math freak) puts it, "Nothing is truly random, and there is no such thing as a perfect circle".

#!/home/bbq/bin/perl
# Trust no1!

RE: RE(3): Random number - A very long reply...

by PipTigger (Hermit) on Jul 10, 2000 at 09:23 UTC

RE: RE: RE(3): Random number - A very long reply...

by BBQ (Curate) on Jul 10, 2000 at 10:03 UTC

RE: Random number
by Anonymous Monk on Jul 10, 2000 at 01:20 UTC

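# Pool of 62 characters to draw from.
my @Chars = ( "A" .. "Z", "a" .. "z", 0 .. 9);
# Pick 30 random indexes into @Chars, slice out those characters, and join them.
my $RandString = join("", @Chars[ map { rand @Chars } ( 1 .. 30 ) ]);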

RE: RE: Random number

by greenhorn (Sexton) on Jul 10, 2000 at 11:53 UTC

Regarding:

my $RandString = join("", @Chars[ map { rand @Chars } ( 1 .. 30 ) ]);

I'm wondering if I may inflict a question upon you--details about
that statement. I understand the purposes of "components" within it,
and I can certainly see the results when I run the script. But just
how it's generating those results is not clear to me. Could I talk you
into explaining a bit about how it works?

Thanks.

Random Array Slice

by chromatic (Archbishop) on Jul 10, 2000 at 21:45 UTC

rand

The map statement creates a 30 element list of those random numbers.

That list is used as indexes into an array slice. (That means that the second argument to the join statement is a random set of 30 characters from the @Chars array.) They're joined into a string.
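
Spelled out step by step, the one-liner might be unrolled like this. The intermediate names @indexes and @picked are introduced only for illustration, and int() is added for readability; the original relies on Perl truncating a fractional array index.

```perl
use strict;
use warnings;

my @Chars = ("A" .. "Z", "a" .. "z", 0 .. 9);

# Step 1: rand @Chars evaluates @Chars in scalar context (62 here), so it
# returns a fractional number from 0 up to, but not including, 62.
# map repeats that 30 times, once per element of (1 .. 30).
my @indexes = map { int(rand @Chars) } (1 .. 30);

# Step 2: an array slice picks the elements at those 30 positions.
my @picked = @Chars[@indexes];

# Step 3: join glues the 30 characters into one string.
my $RandString = join("", @picked);

print "$RandString\n";
```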

RE: Random number
by gronkulator (Sexton) on Jul 10, 2000 at 17:36 UTC

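# Read 16 random bits from the kernel's non-blocking random device.
my $randbits = "";
open(my $urandom, "<", "/dev/urandom") or die "Phooey: $!";
binmode($urandom);
my $got = read($urandom, $randbits, 2);
close($urandom);
die "Short read from /dev/urandom" unless defined $got && $got == 2;
# "S" unpacks one unsigned 16-bit integer (0..65535) in native byte order.
my $rand = unpack("S", $randbits);
printf("random: %d\n", $rand);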
