in reply to OT: peak values with SQL

Straying a little bit, but I think this is a good example of how the database handles joins differently from subqueries.
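(For anyone playing along at home: the thread doesn't show the table itself, so here's a minimal sketch of a setup that matches the queries below. Only the column names come from the queries; the DDL and sample values are my invention.)

-- Hypothetical setup (not from the original post): column names are
-- taken from the queries below, everything else is a guess.
CREATE TABLE uptime (
    uptime_id    integer,
    uptime_value integer
);

-- 16 rows of made-up sample data, matching the row counts seen in
-- the plans below.
INSERT INTO uptime (uptime_id, uptime_value)
SELECT g, (g * 37) % 100
FROM generate_series(1, 16) AS g;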

Here's the plan for runrig's query on my version of Postgres.

EXPLAIN ANALYZE
SELECT u1.uptime_id AS uid, u1.uptime_value
FROM uptime u1
WHERE u1.uptime_value > (SELECT u2.uptime_value
                         FROM uptime u2
                         WHERE u2.uptime_id = u1.uptime_id + 1);

                              QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on uptime u1  (cost=0.00..25022.50 rows=334 width=8) (actual time=0.207..0.614 rows=3 loops=1)
   Filter: (uptime_value > (subplan))
   SubPlan
     ->  Seq Scan on uptime u2  (cost=0.00..25.00 rows=6 width=4) (actual time=0.015..0.026 rows=1 loops=16)
           Filter: (uptime_id = ($0 + 1))
 Total runtime: 0.688 ms

And here's the plan for the same query rewritten as a join.

EXPLAIN ANALYZE
SELECT u1.uptime_id AS uid, u1.uptime_value
FROM uptime u1, uptime u2
WHERE u2.uptime_id = u1.uptime_id + 1
  AND u1.uptime_value > u2.uptime_value;

                              QUERY PLAN
------------------------------------------------------------------------
 Merge Join  (cost=139.66..247.18 rows=1667 width=8) (actual time=0.453..0.654 rows=3 loops=1)
   Merge Cond: ("outer"."?column3?" = "inner".uptime_id)
   Join Filter: ("outer".uptime_value > "inner".uptime_value)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8) (actual time=0.193..0.250 rows=16 loops=1)
         Sort Key: (u1.uptime_id + 1)
         ->  Seq Scan on uptime u1  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.020..0.102 rows=16 loops=1)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8) (actual time=0.159..0.221 rows=16 loops=1)
         Sort Key: u2.uptime_id
         ->  Seq Scan on uptime u2  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.005..0.076 rows=16 loops=1)
 Total runtime: 0.765 ms

In this case the subquery is faster, but notice that it's fetching one row at a time across 16 loops, i.e. one inner scan per outer row. The joined query should be a lot faster as the data set grows larger.
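To spell out what "one row at a time" means, the correlated-subquery plan behaves roughly like the following loop, written out in PL/pgSQL. This is purely an illustration against the hypothetical table above (and DO blocks need Postgres 9.0 or later, well after this thread), not the planner's actual code. Without an index, each inner lookup is a full scan, so the whole thing is O(n^2).

-- Roughly what the correlated-subquery plan does, spelled out:
DO $$
DECLARE
    r uptime%ROWTYPE;
    v integer;
BEGIN
    FOR r IN SELECT * FROM uptime LOOP      -- outer seq scan
        SELECT uptime_value INTO v
        FROM uptime
        WHERE uptime_id = r.uptime_id + 1;  -- inner scan, once per outer row
        IF r.uptime_value > v THEN          -- v is NULL for the last row, so it's skipped
            RAISE NOTICE 'uid=%, value=%', r.uptime_id, r.uptime_value;
        END IF;
    END LOOP;
END $$;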

I just tested this with a total of 131072 rows. The joined query took 3872.267 ms. I am still waiting for the subquery to return.
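(If you want to reproduce the scaling test, repeatedly doubling the table gets you there; this is my sketch of one way to build the data, not necessarily how it was done. Thirteen doublings take 16 rows to 131072.)

-- Run this 13 times: 16 -> 32 -> ... -> 131072 rows.
-- The SELECT sees the pre-insert snapshot, so the scalar max() is
-- computed over the existing rows and the new ids don't collide.
INSERT INTO uptime (uptime_id, uptime_value)
SELECT uptime_id + (SELECT max(uptime_id) FROM uptime), uptime_value
FROM uptime;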

Update: The select with subquery finally returned, after 19074438.347 ms (about 5.3 hours).

Re^2: OT: peak values with SQL
by runrig (Abbot) on Jun 06, 2004 at 18:39 UTC
    Do you have an index on uptime_id? I suspect not, since your EXPLAIN ANALYZE output doesn't show one being used. The subquery shouldn't be all that bad unless there's no index: the outer query has to do a sequential scan, but the subquery should be using an index if it's a decent query optimizer. I agree the joined query is better anyway, and without an index it can at least fall back to a merge join, but that's still worse than if it could use an index for the join.
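    For concreteness, testing that suggestion would look something like this (the index name is my choice, and the output isn't shown; the point is to compare against the two plans above):

    -- Add the suggested index, refresh planner statistics, and then
    -- re-run both EXPLAIN ANALYZE statements above to compare plans.
    CREATE INDEX uptime_id_idx ON uptime (uptime_id);
    ANALYZE uptime;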
Re^2: OT: peak values with SQL
by revdiablo (Prior) on Jun 05, 2004 at 01:28 UTC

    Very nice explanation. I figured the subquery would not be ideal for large datasets, but this is a cool way to show why.

    eclark++

      I would create an index on uptime_id and re-run the comparison.