comment on

I used the third technique in a recent project, but instead of using the primary key (which was a non-sequential ID number), I added a rownum column which gave each row a sequential number. I then selected a random value between 1 and the number of rows.

The reason this is much faster than the ORDER BY rand() technique is that (AFAIK) in order to resolve ORDER BY rand(), MySQL has to go through each row, generate a random number, then see which one is the lowest. If the table is very large, that takes a long time.

The problem with using a row number is that if the data changes, you can end up gaps with in numbering. If the data is generated periodically, there's no problem; just regenerate the row numbers with the rest of the data. Periodically, you can regenerate the row numbers to fill in gaps. If there's just a few missing numbers scattered here and there and you don't need great randomness, it's no big deal; instead of using WHERE id = $rand you'd use WHERE id >= $rand LIMIT 1. If you have a lot of changes, though, this technique will work poorly.

Something that might work a little better with a large number of changes would be to randomly assign each record a number between, say, 1 and 100. To choose a random record, then, you would first pick a number between 1 and 100, then you would get a record randomly from the set of rows assigned that number with ORDER BY rand(). It would still have to evaluate multiple records, but on average it would only be 1/100 as many records as using ORDER BY rand() on the whole table.

In reply to Re: Re: Is there a good way of retrieving entries randomly from a database (MySQL perhaps)? by sgifford
in thread Is there a good way of retrieving entries randomly from a database (MySQL perhaps)? by soon_j

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Think about Loose Coupling
	PerlMonks