http://qs321.pair.com?node_id=396455

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Assuming a table with millions of rows containing an indexed numeric column with a wide range of values that change very frequently, what would be the most efficient way to determine what position or "rank" of any given row has based on the value in the indexed column?

Thanks for your replies.

Replies are listed 'Best First'.
Re: Ranking position in a SQL index
by jZed (Prior) on Oct 05, 2004 at 04:28 UTC
    Assuming the values in the column are unique, this will tell you the the numerical position of a given row based on the column.
    SELECT COUNT($col) FROM $table WHERE $col <= $this_row_col_value

    Or perhaps I misunderstand what you mean by "rank", if so, please explain.

Re: Ranking position in a SQL index
by Zaxo (Archbishop) on Oct 05, 2004 at 03:43 UTC

    It depends on what your dbms supports, but look at the ORDER BY and LIMIT clauses.

    After Compline,
    Zaxo

      I know those clauses but I can't see how they would help me get the numeric rank of a given row or value efficiently.
    A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Ranking position in a SQL index
by sgifford (Prior) on Oct 05, 2004 at 04:41 UTC
    It depends on how your indexes are formed and on your database's optimizer, but something like this might work:
    SELECT COUNT(*) FROM table WHERE numeric_column < (SELECT numeric_column FROM table WHERE your_criteria)

    In my version of mysql, subqueries aren't supported, so I actually have to save the subselect results into a variable, then do the second select, but EXPLAIN says this similar query is reasonably well-optimized:

    mysql> explain select count(*) FROM faar_homes WHERE price < 80000 \G *************************** 1. row *************************** table: faar_homes type: range possible_keys: price key: price key_len: 5 ref: NULL rows: 25 Extra: where used; Using index
      "SELECT COUNT(*) FROM table WHERE numeric_column < (SELECT numeric_column FROM table WHERE your_criteria)"

      I believe you knew, but just to make it complete. The criteria has to make sure that only one row is returned from the subquery. Otherwise you would get error like 'too many rows returned'.

Re: Ranking position in a SQL index
by superfrink (Curate) on Oct 05, 2004 at 04:02 UTC
    You might have to ORDER BY and then loop through all the rows in your application.

    Maybe check out the HAVING keyword. As I recall it is something like adding a WHERE clause after the grouping and ordering but I'm not 100% sure.

      Having is kind of like where, but only works with group by. It allows you to pick up groups that meets the criterias defined in having.

      But having does not give you order, nor does group by. Although most of the DBMS systems actually do sorting for group by, it is not required by SQL standard, so unless it is clearly stated in document, don't assume it does the ordering.

      For example, oracle does not always sort group by, but sometime it sorts. You will see this, when you do explain. Oracle will tell you whether it is 'sorted group by' or just a 'group by'. But this is not controled by you, so in Oracle, you still have to order by after group by, to ensure the ordering is really there. This does not degrade the performance, as Oracle will skip the order by silently, if the group by is 'sorted'.

      Well I think in the worst case I could confine the problem to an O(log N) binary search via the LIMIT clause, but I'm hoping for an O(1) solution.
Re: Ranking position in a SQL index
by Plankton (Vicar) on Oct 05, 2004 at 05:09 UTC
    Are you looking for something like a Oracle's implied column rownum? You know ...
    select what, ever, rownum RANK from something where rownum between 123 + and 321;
    ... my SQL and Oracle are rusty but I hope you find that helpful.

    Plankton: 1% Evil, 99% Hot Gas.
      Actually rownum doesn't work quite like that.

      Tom has a good explanation of what it does. It could be useful in the original poster's case though.

      /J

Re: Ranking position in a SQL index
by TedPride (Priest) on Oct 05, 2004 at 10:10 UTC
    SELECT idn FROM tablename WHERE index = 123 LIMIT 1;

    Don't forget the LIMIT part. It makes the search more efficient. Note also that idn should be a primary key auto-increment, so it returns the row number assuming you don't delete any rows (better to flag rows for delete and then replace with fresh data later on).

    On the other hand, if you're just doing this from a text file, you can set up a secondary file with all the index values and line numbers in order by index, and then perform a binary search to find the line number you want. Then the line number can be used to access the actual data in the other file.

      Just to clarify something here. The 'limit' clause has no effect on the efficiency of the search. It affects the overall efficiency of the query because it prevents you from accidentally requesting more rows than you can handle.

      In the case where there is a unique index on the field (which is essentially what a PK-identity is anyway) the limit is entirely superlfuous and probably actually slows things down just marginally as its more to be marshalled accross the interprocess or intermachine barrier. Also to the best of my knowledge 'limit' is not standard SQL but rather a MySQL extension.

      Anyway. ;-)


      ---
      demerphq

        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi

        Flux8


        It is a MySQL extension, but an damn useful one and it's one of the things I consider when determining whether to use MySQL or Oracle for a given project.

        Oracle does have something similar, but it's harder to do.

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Ranking position in a SQL index
by Anonymous Monk on Oct 06, 2004 at 20:00 UTC
    The most efficient way is dependent on the DBMS, the hardware, the data, the query optimizer, if the code is run as a stored procedure or ad-hoc query, proper indexes, good dB coding, etc. A lot also depends on how often this ranking is needed. Can the query be run on an alternate dB that is used for reporting and isn't constantly being updated? You may want to lock a table and perform the query on a reporting dB and then unlock the table and let the transactions catch up through replication. Is the done once a day, week, month, hour, etc. SQL performance is both a science and an art.

    Here is a way to do it. SQL is a lot like Perl in the TMTOWTDI. The code joins the table on itself which is usually not a very good performer when you have millions of rows. The nice thing about this example is that it's SQL ANSI 92 standard so it should run on all compliant dBs and it gives you the result you want.

    CREATE TABLE Abc(letter char(1), position int) INSERT INTO Abc (letter, position) VALUES ('a', 1) INSERT INTO Abc (letter, position) VALUES ('b', 2) INSERT INTO Abc (letter, position) VALUES ('d', 4) INSERT INTO Abc (letter, position) VALUES ('j', 10) INSERT INTO Abc (letter, position) VALUES ('z', 26) SELECT COUNT(*) AS rank, a1.letter FROM Abc a1 JOIN Abc a2 ON a1.letter > a2.letter GROUP BY a1.letter
      Oops, make that join
      JOIN Abc a2 ON a1.letter >= a2.letter
      otherwise 'a' gets lost. Sorry about that.