Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

OT: peak values with SQL

by revdiablo (Prior)
on Jun 04, 2004 at 23:52 UTC ( [id://361276]=perlquestion: print w/replies, xml ) Need Help??

revdiablo has asked for the wisdom of the Perl Monks concerning the following question:

Sorry for the off-topic question, but maybe someone here will have some ideas for me. I have a table with uptime values, updated once a day. I'd like to know how high the uptime was before each reboot, so I'd like the peak uptime values. This is easily accomplished with some perl code like the following:

my @uptimes = qw(1 2 3 4 5 1 2 3 4 1 2 3 4 5 6); for (0 .. $#uptimes) { print "We have a peak value: $uptimes[$_]\n" if $_ == $#uptimes or $_ < $#uptimes and $uptimes[$_] > $uptimes[$_+1]; }

I'd like to do this all in SQL, however. It's less a practical requirement than idle curiosity, but maybe I'll end up learning something useful. Here's one approach I thought might work:

SELECT uptime_id as uid, uptime_value FROM uptime WHERE uptime_value > (SELECT uptime_value FROM uptime WHERE uptime_id = uid+1);

But there are some problems with this:

  1. The uid is not available in the subselect, and as such I get an error: ERROR:  column "uid" does not exist
  2. I don't know if uid+1 would do what I expected, even if uid was found.

For your information, I'm using PostgreSQL. Many of its features are compatible with Oracle, so if you have any ideas that work with Oracle, they might with PostgreSQL.

Any ideas are appreciated.

Update: thanks to Enlil for the suggestion, but the following won't quite do it either:

SELECT MAX(uptime_value) FROM uptime;

This selects the highest uptime, but I want all the peaks, not just the highest one.

Replies are listed 'Best First'.
Re: OT: peak values with SQL
by runrig (Abbot) on Jun 05, 2004 at 00:08 UTC
    Use table aliases when using the same table in a correlated sub-query:
    SELECT u1.uptime_id as uid, u1.uptime_value FROM uptime u1 WHERE u1.uptime_value > (SELECT u2.uptime_value FROM uptime u2 WHERE u2.uptime_id = u1.uptime_id+1);

      Excellent, works like a charm. Many thanks.

Re: OT: peak values with SQL
by eclark (Scribe) on Jun 05, 2004 at 01:06 UTC

    Straying a little bit, but I think this is a good example to show how the database works differently with joins verses subqueries.

    Here's runrig's query plan on my version of postgres.

    EXPLAIN ANALYZE SELECT u1.uptime_id as uid, u1.uptime_value FROM uptime u1 WHERE u1.uptime_value > (SELECT u2.uptime_value FROM uptime u2 WHERE u2.uptime_id = u1.uptime_id+1); QUERY PLAN + + ---------------------------------------------------------------------- +--------------------------------------- Seq Scan on uptime u1 (cost=0.00..25022.50 rows=334 width=8) (actual + time=0.207..0.614 rows=3 loops=1) Filter: (uptime_value > (subplan)) SubPlan -> Seq Scan on uptime u2 (cost=0.00..25.00 rows=6 width=4) (act +ual time=0.015..0.026 rows=1 loops=16) Filter: (uptime_id = ($0 + 1)) Total runtime: 0.688 ms

    Here's my query plan using join.

    EXPLAIN ANALYZE SELECT u1.uptime_id as uid, u1.uptime_value FROM uptime u1, uptime u2 WHERE u2.uptime_id = u1.uptime_id+1 AND u1.uptime_value > u2.uptime_value; QUERY PLAN + + ---------------------------------------------------------------------- +---------------------------------------------- Merge Join (cost=139.66..247.18 rows=1667 width=8) (actual time=0.45 +3..0.654 rows=3 loops=1) Merge Cond: ("outer"."?column3?" = "inner".uptime_id) Join Filter: ("outer".uptime_value > "inner".uptime_value) -> Sort (cost=69.83..72.33 rows=1000 width=8) (actual time=0.193. +.0.250 rows=16 loops=1) Sort Key: (u1.uptime_id + 1) -> Seq Scan on uptime u1 (cost=0.00..20.00 rows=1000 width= +8) (actual time=0.020..0.102 rows=16 loops=1) -> Sort (cost=69.83..72.33 rows=1000 width=8) (actual time=0.159. +.0.221 rows=16 loops=1) Sort Key: u2.uptime_id -> Seq Scan on uptime u2 (cost=0.00..20.00 rows=1000 width= +8) (actual time=0.005..0.076 rows=16 loops=1) Total runtime: 0.765 ms

    In this case the subquery is faster, but you'll notice its getting 1 row at a time for 16 loops. The joined query should be a lot faster as the data set grows larger.

    I just tested this with a total of 131072 rows. The joined query took a total of 3872.267 ms. I am still waiting for the subquery to return.

    Update: The select with subquery finally returned, 19074438.347 ms

      Do you have an index on uptime_id (I suspect not, since your analyze output doesn't show it being used)? The subquery shouldn't be all that bad, unless there's no index. The main query should use a sequential scan, but the sub-query should be using an index, if it's a decent query optimizer. I agree the joined query is better anyway, and without an index, it's at least able to use a merge join, but that's still worse than if it could use an index for the join.

      Very nice explanation. I figured the subquery would not be ideal for large datasets, but this is a cool way to show why.

      eclark++

        I would create an index on uptime_id and re-run the comparison.
Re: OT: peak values with SQL
by kragen (Sexton) on Jun 05, 2004 at 00:30 UTC
    Hmm.
    select p.uptime_value from uptime as p, uptime as n where p.uptime_id + 1 = n.uptime_id and p.uptime_value > n.uptime_value
    Perhaps? (update: reformatted for legibility)
Re: OT: peak values with SQL
by jZed (Prior) on Jun 05, 2004 at 01:09 UTC
    You might want to consider the last value as a peak. What if your last three values were 6,0,999? 999 wouldn't get counted as a peak because it's not higher than uptime_id +1 because it's the last uptime_id. You probably want to grab the max_id in a separate SQL call, but if you wanted to cram it all into one, you could do this:
    SELECT uptime_id, uptime_value
      FROM uptime AS t1
     WHERE uptime_value >( SELECT uptime_value
                             FROM uptime
                            WHERE uptime_id =  t1.uptime_id + 1
                          )
        OR uptime_id    = ( SELECT MAX(uptime_id) FROM uptime )
    
    update nonsense example omitted
    update 2 removed the strikeouts from the example, on second thought it isn't nonsense, it's a valid concern
•Re: OT: peak values with SQL
by merlyn (Sage) on Jun 05, 2004 at 13:12 UTC
      Because your query returns only a single peak value, rather than a list of all peak values as the OP requested.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://361276]
Approved by Plankton
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (5)
As of 2024-04-25 12:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found