in reply to OT: peak values with SQL

Straying a little bit, but I think this is a good example of how the database handles joins differently from subqueries.
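(For anyone playing along at home: the thread doesn't show the table itself, so here's a minimal sketch of a setup that matches the queries below. Only the column names come from the queries; the DDL and sample values are my invention.)

-- Hypothetical setup (not from the original post): column names are
-- taken from the queries below, everything else is a guess.
CREATE TABLE uptime (
    uptime_id    integer,
    uptime_value integer
);

-- 16 rows of made-up sample data, matching the row counts seen in
-- the plans below.
INSERT INTO uptime (uptime_id, uptime_value)
SELECT g, (g * 37) % 100
FROM generate_series(1, 16) AS g;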

Here's the plan for runrig's query on my version of Postgres.

EXPLAIN ANALYZE
SELECT u1.uptime_id AS uid, u1.uptime_value
FROM uptime u1
WHERE u1.uptime_value > (SELECT u2.uptime_value
                         FROM uptime u2
                         WHERE u2.uptime_id = u1.uptime_id + 1);

                              QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on uptime u1  (cost=0.00..25022.50 rows=334 width=8) (actual time=0.207..0.614 rows=3 loops=1)
   Filter: (uptime_value > (subplan))
   SubPlan
     ->  Seq Scan on uptime u2  (cost=0.00..25.00 rows=6 width=4) (actual time=0.015..0.026 rows=1 loops=16)
           Filter: (uptime_id = ($0 + 1))
 Total runtime: 0.688 ms

And here's the plan for the same query rewritten as a join.

EXPLAIN ANALYZE
SELECT u1.uptime_id AS uid, u1.uptime_value
FROM uptime u1, uptime u2
WHERE u2.uptime_id = u1.uptime_id + 1
  AND u1.uptime_value > u2.uptime_value;

                              QUERY PLAN
------------------------------------------------------------------------
 Merge Join  (cost=139.66..247.18 rows=1667 width=8) (actual time=0.453..0.654 rows=3 loops=1)
   Merge Cond: ("outer"."?column3?" = "inner".uptime_id)
   Join Filter: ("outer".uptime_value > "inner".uptime_value)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8) (actual time=0.193..0.250 rows=16 loops=1)
         Sort Key: (u1.uptime_id + 1)
         ->  Seq Scan on uptime u1  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.020..0.102 rows=16 loops=1)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8) (actual time=0.159..0.221 rows=16 loops=1)
         Sort Key: u2.uptime_id
         ->  Seq Scan on uptime u2  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.005..0.076 rows=16 loops=1)
 Total runtime: 0.765 ms

In this case the subquery is faster, but notice that it's fetching one row at a time across 16 loops, i.e. one inner scan per outer row. The joined query should be a lot faster as the data set grows larger.
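To spell out what "one row at a time" means, the correlated-subquery plan behaves roughly like the following loop, written out in PL/pgSQL. This is purely an illustration against the hypothetical table above (and DO blocks need Postgres 9.0 or later, well after this thread), not the planner's actual code. Without an index, each inner lookup is a full scan, so the whole thing is O(n^2).

-- Roughly what the correlated-subquery plan does, spelled out:
DO $$
DECLARE
    r uptime%ROWTYPE;
    v integer;
BEGIN
    FOR r IN SELECT * FROM uptime LOOP      -- outer seq scan
        SELECT uptime_value INTO v
        FROM uptime
        WHERE uptime_id = r.uptime_id + 1;  -- inner scan, once per outer row
        IF r.uptime_value > v THEN          -- v is NULL for the last row, so it's skipped
            RAISE NOTICE 'uid=%, value=%', r.uptime_id, r.uptime_value;
        END IF;
    END LOOP;
END $$;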

I just tested this with a total of 131072 rows. The joined query took 3872.267 ms. I am still waiting for the subquery to return.
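(If you want to reproduce the scaling test, repeatedly doubling the table gets you there; this is my sketch of one way to build the data, not necessarily how it was done. Thirteen doublings take 16 rows to 131072.)

-- Run this 13 times: 16 -> 32 -> ... -> 131072 rows.
-- The SELECT sees the pre-insert snapshot, so the scalar max() is
-- computed over the existing rows and the new ids don't collide.
INSERT INTO uptime (uptime_id, uptime_value)
SELECT uptime_id + (SELECT max(uptime_id) FROM uptime), uptime_value
FROM uptime;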

Update: The select with subquery finally returned, after 19074438.347 ms (about 5.3 hours).

Re^2: OT: peak values with SQL
by runrig (Abbot) on Jun 06, 2004 at 18:39 UTC
    Do you have an index on uptime_id? I suspect not, since your EXPLAIN ANALYZE output doesn't show one being used. The subquery shouldn't be all that bad unless there's no index: the outer query has to do a sequential scan, but the subquery should be using an index if it's a decent query optimizer. I agree the joined query is better anyway, and without an index it can at least fall back to a merge join, but that's still worse than if it could use an index for the join.
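    For concreteness, testing that suggestion would look something like this (the index name is my choice, and the output isn't shown; the point is to compare against the two plans above):

    -- Add the suggested index, refresh planner statistics, and then
    -- re-run both EXPLAIN ANALYZE statements above to compare plans.
    CREATE INDEX uptime_id_idx ON uptime (uptime_id);
    ANALYZE uptime;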
Re^2: OT: peak values with SQL
by revdiablo (Prior) on Jun 05, 2004 at 01:28 UTC

    Very nice explanation. I figured the subquery would not be ideal for large datasets, but this is a cool way to show why.

    eclark++

      I would create an index on uptime_id and re-run the comparison.