OT: peak values with SQL

revdiablo has asked for the wisdom of the Perl Monks concerning the following question:

Sorry for the off-topic question, but maybe someone here will have some ideas for me. I have a table with uptime values, updated once a day. I'd like to know how high the uptime was before each reboot, so I'd like the peak uptime values. This is easily accomplished with some perl code like the following:

my @uptimes = qw(1 2 3 4 5 1 2 3 4 1 2 3 4 5 6);

for (0 .. $#uptimes) {
  print "We have a peak value: $uptimes[$_]\n"
    if  $_           == $#uptimes
    or  $_           <  $#uptimes 
    and $uptimes[$_] >  $uptimes[$_+1];
}
[download]

I'd like to do this all in SQL, however. It's less a practical requirement than idle curiosity, but maybe I'll end up learning something useful. Here's one approach I thought might work:

SELECT uptime_id as uid, uptime_value
  FROM uptime
 WHERE uptime_value > (SELECT uptime_value
                         FROM uptime
                        WHERE uptime_id = uid+1);
[download]

But there are some problems with this:

The uid is not available in the subselect, and as such I get an error: ERROR: column "uid" does not exist
I don't know if uid+1 would do what I expected, even if uid was found.

For your information, I'm using PostgreSQL. Many of its features are compatible with Oracle, so if you have any ideas that work with Oracle, they might with PostgreSQL.

Any ideas are appreciated.

Update: thanks to Enlil for the suggestion, but the following won't quite do it either:

SELECT MAX(uptime_value) FROM uptime;
[download]

This selects the highest uptime, but I want all the peaks, not just the highest one.

Comment on OT: peak values with SQL Select or Download Code

Replies are listed 'Best First'.
Re: OT: peak values with SQL by runrig (Abbot) on Jun 05, 2004 at 00:08 UTC
Use table aliases when using the same table in a correlated sub-query: `SELECT u1.uptime_id as uid, u1.uptime_value FROM uptime u1 WHERE u1.uptime_value > (SELECT u2.uptime_value FROM uptime u2 WHERE u2.uptime_id = u1.uptime_id+1);` [download]	[reply] [d/l]
Re^2: OT: peak values with SQL by revdiablo (Prior) on Jun 05, 2004 at 00:18 UTC
Excellent, works like a charm. Many thanks.	[reply]
Re: OT: peak values with SQL by eclark (Scribe) on Jun 05, 2004 at 01:06 UTC
Straying a little bit, but I think this is a good example to show how the database works differently with joins verses subqueries. Here's runrig's query plan on my version of postgres. EXPLAIN ANALYZE SELECT u1.uptime_id as uid, u1.uptime_value FROM uptime u1 WHERE u1.uptime_value > (SELECT u2.uptime_value FROM uptime u2 WHERE u2.uptime_id = u1.uptime_id+1); QUERY PLAN + + ---------------------------------------------------------------------- +--------------------------------------- Seq Scan on uptime u1 (cost=0.00..25022.50 rows=334 width=8) (actual + time=0.207..0.614 rows=3 loops=1) Filter: (uptime_value > (subplan)) SubPlan -> Seq Scan on uptime u2 (cost=0.00..25.00 rows=6 width=4) (act +ual time=0.015..0.026 rows=1 loops=16) Filter: (uptime_id = ($0 + 1)) Total runtime: 0.688 ms [download] Here's my query plan using join. EXPLAIN ANALYZE SELECT u1.uptime_id as uid, u1.uptime_value FROM uptime u1, uptime u2 WHERE u2.uptime_id = u1.uptime_id+1 AND u1.uptime_value > u2.uptime_value; QUERY PLAN + + ---------------------------------------------------------------------- +---------------------------------------------- Merge Join (cost=139.66..247.18 rows=1667 width=8) (actual time=0.45 +3..0.654 rows=3 loops=1) Merge Cond: ("outer"."?column3?" = "inner".uptime_id) Join Filter: ("outer".uptime_value > "inner".uptime_value) -> Sort (cost=69.83..72.33 rows=1000 width=8) (actual time=0.193. +.0.250 rows=16 loops=1) Sort Key: (u1.uptime_id + 1) -> Seq Scan on uptime u1 (cost=0.00..20.00 rows=1000 width= +8) (actual time=0.020..0.102 rows=16 loops=1) -> Sort (cost=69.83..72.33 rows=1000 width=8) (actual time=0.159. +.0.221 rows=16 loops=1) Sort Key: u2.uptime_id -> Seq Scan on uptime u2 (cost=0.00..20.00 rows=1000 width= +8) (actual time=0.005..0.076 rows=16 loops=1) Total runtime: 0.765 ms [download] In this case the subquery is faster, but you'll notice its getting 1 row at a time for 16 loops. The joined query should be a lot faster as the data set grows larger. I just tested this with a total of 131072 rows. The joined query took a total of 3872.267 ms. I am still waiting for the subquery to return. Update: The select with subquery finally returned, 19074438.347 ms	[reply] [d/l] [select]
Re^2: OT: peak values with SQL by runrig (Abbot) on Jun 06, 2004 at 18:39 UTC
Do you have an index on uptime_id (I suspect not, since your analyze output doesn't show it being used)? The subquery shouldn't be all that bad, unless there's no index. The main query should use a sequential scan, but the sub-query should be using an index, if it's a decent query optimizer. I agree the joined query is better anyway, and without an index, it's at least able to use a merge join, but that's still worse than if it could use an index for the join.	[reply]
Re^2: OT: peak values with SQL by revdiablo (Prior) on Jun 05, 2004 at 01:28 UTC
Very nice explanation. I figured the subquery would not be ideal for large datasets, but this is a cool way to show why. eclark++	[reply]
Re^3: OT: peak values with SQL by runrig (Abbot) on Jun 07, 2004 at 23:08 UTC
I would create an index on uptime_id and re-run the comparison.	[reply]
Re: OT: peak values with SQL by kragen (Sexton) on Jun 05, 2004 at 00:30 UTC
Hmm. `select p.uptime_value from uptime as p, uptime as n where p.uptime_id + 1 = n.uptime_id and p.uptime_value > n.uptime_value` [download] Perhaps? (update: reformatted for legibility)	[reply] [d/l]
Re: OT: peak values with SQL by jZed (Prior) on Jun 05, 2004 at 01:09 UTC
You might want to consider the last value as a peak. What if your last three values were 6,0,999? 999 wouldn't get counted as a peak because it's not higher than uptime_id +1 because it's the last uptime_id. You probably want to grab the max_id in a separate SQL call, but if you wanted to cram it all into one, you could do this: SELECT uptime_id, uptime_value FROM uptime AS t1 WHERE uptime_value >( SELECT uptime_value FROM uptime WHERE uptime_id = t1.uptime_id + 1 ) OR uptime_id = ( SELECT MAX(uptime_id) FROM uptime ) update nonsense example omitted update 2 removed the strikeouts from the example, on second thought it isn't nonsense, it's a valid concern	[reply]
�Re: OT: peak values with SQL by merlyn (Sage) on Jun 05, 2004 at 13:12 UTC
Seems like you're going at it the hard way. Why not just: `SELECT uptime_id AS uid, uptime_value FROM uptime WHERE uptime_value = (SELECT MAX(uptime_value) FROM uptime);` [download] -- Randal L. Schwartz, Perl hacker Be sure to read my standard disclaimer if this is a reply.	[reply] [d/l]
Re: �Re: OT: peak values with SQL by jZed (Prior) on Jun 05, 2004 at 17:00 UTC
Because your query returns only a single peak value, rather than a list of all peak values as the OP requested.	[reply]
�Re^2: �Re: OT: peak values with SQL by merlyn (Sage) on Jun 05, 2004 at 22:36 UTC
I suppose there's a difference between "all the peak values" and "all the values that are peak" that isn't discernable within a half hour of waking. Sorry. {grin} -- Randal L. Schwartz, Perl hacker Be sure to read my standard disclaimer if this is a reply.	[reply]


go ahead... be a heretic
	PerlMonks