PostgreSQL query performance and possible optimisation

Question 1

I managed to write my query to get me correct data but to me it looks pretty bad since i had to use query inside query 3 times and even doe query performance is fine now around 700msec i am afraid it will slow down in future when there will be more data to process. Any info on how bad is this and how can i optimize it would be appreciated.

Edit:

I forgot to mention that tables s3 and s14 have multiple rows with same parcelno and i always need newest row from both tables(which is determined by sdate and stime). If newest row from s14 is newer than newest from s3, or newest row from s3 column emadr2 has same value as column parcelshop_id from table d, data is not shown. Keep in mind that these tables where not created by me and i am only reading data from them.

SELECT 
 q1.ddepot, 
 q1.parcelno, 
 q1.sdate, 
 q1.stime, 
 q1.dpostal, 
 q1.service, 
 q1.lorry,
 q1.zc5x3,
 q1.parcelshop_id,
 q1.country,
 q1.dname1
FROM(
 SELECT DISTINCT ON (q.parcelno) q.* FROM(
 SELECT 
 d.ddepot, 
 d.parcelno, 
 s3.sdate, 
 s3.stime, 
 d.dpostal, 
 d.service, 
 s3.lorry,
 s3.zc5x3,
 d.parcelshop_id,
 s3.country,
 d.dname1,
 s3.emadr1,
 s3.emadr2
 FROM dispatcher.detour_avis d
 LEFT JOIN scans.scandata03 s3 ON d.parcelno = s3.parcelno
 LEFT JOIN scans.scandata14 s14 ON d.parcelno = s14.parcelno 
 WHERE 
 d.ddate > (NOW() - interval '5 day') 
 AND d.parcelshop_id IS NOT NULL 
 AND s3.parcelno IS NOT NULL 
 AND (s14.parcelno IS NULL OR (s14.sdate + s14.stime)::timestamp without time zone < (s3.sdate + s3.stime)::timestamp without time zone)
 ORDER BY s3.sdate, s3.stime DESC
 )q 
 ORDER BY q.parcelno
) q1
WHERE q1.parcelshop_id != q1.emadr2

explain (analyze, verbose):

Subquery Scan on q1 (cost=68552.93..68554.90 rows=84 width=68) (actual time=701.318..701.324 rows=4 loops=1)
 Output: q1.ddepot, q1.parcelno, q1.sdate, q1.stime, q1.dpostal, q1.service, q1.lorry, q1.zc5x3, q1.parcelshop_id, q1.country, q1.dname1
 Filter: ((q1.parcelshop_id)::text <> (q1.emadr2)::text)
 Rows Removed by Filter: 2
 -> Unique (cost=68552.93..68553.85 rows=84 width=87) (actual time=701.310..701.314 rows=6 loops=1)
 Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
 -> Sort (cost=68552.93..68553.39 rows=184 width=87) (actual time=701.309..701.311 rows=15 loops=1)
 Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
 Sort Key: d.parcelno
 Sort Method: quicksort Memory: 27kB
 -> Sort (cost=68543.71..68544.17 rows=184 width=87) (actual time=701.269..701.269 rows=15 loops=1)
 Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
 Sort Key: s3.sdate, s3.stime
 Sort Method: quicksort Memory: 27kB
 -> Nested Loop (cost=0.00..68536.79 rows=184 width=87) (actual time=689.775..701.238 rows=15 loops=1)
 Output: d.ddepot, d.parcelno, s3.sdate, s3.stime, d.dpostal, d.service, s3.lorry, s3.zc5x3, d.parcelshop_id, s3.country, d.dname1, s3.emadr1, s3.emadr2
 Join Filter: ((s14.parcelno IS NULL) OR ((s14.sdate + s14.stime) < (s3.sdate + s3.stime)))
 Rows Removed by Join Filter: 16
 -> Nested Loop Left Join (cost=0.00..57423.07 rows=455 width=74) (actual time=689.615..700.578 rows=14 loops=1)
 Output: d.ddepot, d.parcelno, d.dpostal, d.service, d.parcelshop_id, d.dname1, s14.parcelno, s14.sdate, s14.stime
 -> Seq Scan on dispatcher.detour_avis d (cost=0.00..49247.17 rows=455 width=47) (actual time=689.535..700.162 rows=11 loops=1)
 Output: d.id, d.parcelno, d.service, d.detour_type, d.ddepot, d.dname1, d.dname2, d.dstreet, d.dhouseno, d.dcountryn, d.dstate, d.dpostal, d.dcity, d.dphone, d.odepot, d.oname1, d.oname2, d.ostreet, d.ohouseno, d.ocoun (...)
 Filter: ((d.parcelshop_id IS NOT NULL) AND (d.ddate > (now() - '5 days'::interval)))
 Rows Removed by Filter: 985930
 -> Append (cost=0.00..17.92 rows=5 width=33) (actual time=0.036..0.036 rows=1 loops=11)
 -> Seq Scan on scans.scandata14 s14 (cost=0.00..0.00 rows=1 width=58) (actual time=0.000..0.000 rows=0 loops=11)
 Output: s14.parcelno, s14.sdate, s14.stime
 Filter: ((d.parcelno)::text = (s14.parcelno)::text)
 -> Index Scan using scandata14_2013_pl_indx on scans.scandata14_2013 s14_1 (cost=0.14..0.25 rows=1 width=27) (actual time=0.001..0.001 rows=0 loops=11)
 Output: s14_1.parcelno, s14_1.sdate, s14_1.stime
 Index Cond: ((d.parcelno)::text = (s14_1.parcelno)::text)
 -> Index Scan using scandata14_2014_pl_indx on scans.scandata14_2014 s14_2 (cost=0.29..4.29 rows=1 width=27) (actual time=0.007..0.007 rows=0 loops=11)
 Output: s14_2.parcelno, s14_2.sdate, s14_2.stime
 Index Cond: ((d.parcelno)::text = (s14_2.parcelno)::text)
 -> Index Scan using scandata14_2015_pl_indx on scans.scandata14_2015 s14_3 (cost=0.42..6.47 rows=1 width=27) (actual time=0.010..0.010 rows=0 loops=11)
 Output: s14_3.parcelno, s14_3.sdate, s14_3.stime
 Index Cond: ((d.parcelno)::text = (s14_3.parcelno)::text)
 -> Index Scan using scandata14_2016_pl_indx on scans.scandata14_2016 s14_4 (cost=0.42..6.91 rows=1 width=27) (actual time=0.014..0.015 rows=1 loops=11)
 Output: s14_4.parcelno, s14_4.sdate, s14_4.stime
 Index Cond: ((d.parcelno)::text = (s14_4.parcelno)::text)
 -> Append (cost=0.00..24.34 rows=5 width=80) (actual time=0.044..0.045 rows=2 loops=14)
 -> Seq Scan on scans.scandata03 s3 (cost=0.00..0.00 rows=1 width=186) (actual time=0.000..0.000 rows=0 loops=14)
 Output: s3.sdate, s3.stime, s3.lorry, s3.zc5x3, s3.country, s3.emadr1, s3.emadr2, s3.parcelno
 Filter: ((s3.parcelno IS NOT NULL) AND ((d.parcelno)::text = (s3.parcelno)::text))
 -> Index Scan using scandata03_2013_pl_indx on scans.scandata03_2013 s3_1 (cost=0.14..0.26 rows=1 width=51) (actual time=0.001..0.001 rows=0 loops=14)
 Output: s3_1.sdate, s3_1.stime, s3_1.lorry, s3_1.zc5x3, s3_1.country, s3_1.emadr1, s3_1.emadr2, s3_1.parcelno
 Index Cond: (((s3_1.parcelno)::text = (d.parcelno)::text) AND (s3_1.parcelno IS NOT NULL))
 -> Index Scan using scandata03_2014_pl_indx on scans.scandata03_2014 s3_2 (cost=0.42..7.55 rows=1 width=53) (actual time=0.009..0.009 rows=0 loops=14)
 Output: s3_2.sdate, s3_2.stime, s3_2.lorry, s3_2.zc5x3, s3_2.country, s3_2.emadr1, s3_2.emadr2, s3_2.parcelno
 Index Cond: (((s3_2.parcelno)::text = (d.parcelno)::text) AND (s3_2.parcelno IS NOT NULL))
 -> Index Scan using scandata03_2015_pl_indx on scans.scandata03_2015 s3_3 (cost=0.42..8.21 rows=1 width=54) (actual time=0.013..0.013 rows=0 loops=14)
 Output: s3_3.sdate, s3_3.stime, s3_3.lorry, s3_3.zc5x3, s3_3.country, s3_3.emadr1, s3_3.emadr2, s3_3.parcelno
 Index Cond: (((s3_3.parcelno)::text = (d.parcelno)::text) AND (s3_3.parcelno IS NOT NULL))
 -> Index Scan using scandata03_2016_pl_indx on scans.scandata03_2016 s3_4 (cost=0.43..8.31 rows=1 width=55) (actual time=0.019..0.020 rows=2 loops=14)
 Output: s3_4.sdate, s3_4.stime, s3_4.lorry, s3_4.zc5x3, s3_4.country, s3_4.emadr1, s3_4.emadr2, s3_4.parcelno
 Index Cond: (((s3_4.parcelno)::text = (d.parcelno)::text) AND (s3_4.parcelno IS NOT NULL))
Planning time: 4.670 ms
Execution time: 701.550 ms

Question 2

An EXPLAIN ANALYZE might be useful.

Question 3

Please edit your question add the create table statements for the tables in question (including all indexes) and the execution plan generated using explain (analyze, verbose). Formatted text please, no screen shots

Question 4

Note: AND s3.parcelno IS NOT NULL will turn the LEFT JOIN into a plain join.

Question 5

Note2: ORDER BY s3.sdate, s3.stime DESC looks wrong. Why not combine date+time into a timestamp? And, since you seem to be interested in the most recent s3 record, why don't you select the most recent s3 record, instead of the (wrong) order by + distinct?

Question 6

IMO that ORDER BY is used to dictate which detail record from s3 will show up in the DISTINCT ON ... My guess is that there is an 1:N relation between d and s3.

Question 7

It seems to me there are lots of unnecessary nesting. Check if this is functionally equivalent

select distinct on (d.parcelno) d.*
from
 dispatcher.detour_avis d
 inner join
 scans.scandata03 s3 on d.parcelno = s3.parcelno
 left join
 scans.scandata14 s14 on d.parcelno = s14.parcelno
where
 d.ddate > now() - interval '5 day'
 and d.parcelshop_id is not null and parcelshop_id != emadr2
 and (
 s14.parcelno is null or
 (s14.sdate + s14.stime)::timestamp < (s3.sdate + s3.stime)::timestamp
 )
order by d.parcelno

As you are doing a left join and putting in the where clause the s3.parcelno is not null condition which contains the right table column join condition you are really doing an inner join. So I just eliminated it from the where clause and turned the left into an inner join

Question 8

Note: IMO you can get rid of the LEFT JOIN s14 and the IS NULL OR ... by putting it into a NOT EXISTS(... s14 ...) construct.

Question 9

unfortunately not, I added a better explanation in the original post.

score 1 · Accepted Answer · 2016-11-09 11:53:44Z

It seems to me there are lots of unnecessary nesting. Check if this is functionally equivalent

select distinct on (d.parcelno) d.*
from
 dispatcher.detour_avis d
 inner join
 scans.scandata03 s3 on d.parcelno = s3.parcelno
 left join
 scans.scandata14 s14 on d.parcelno = s14.parcelno
where
 d.ddate > now() - interval '5 day'
 and d.parcelshop_id is not null and parcelshop_id != emadr2
 and (
 s14.parcelno is null or
 (s14.sdate + s14.stime)::timestamp < (s3.sdate + s3.stime)::timestamp
 )
order by d.parcelno

As you are doing a left join and putting in the where clause the s3.parcelno is not null condition which contains the right table column join condition you are really doing an inner join. So I just eliminated it from the where clause and turned the left into an inner join

Note: IMO you can get rid of the LEFT JOIN s14 and the IS NULL OR ... by putting it into a NOT EXISTS(... s14 ...) construct.
unfortunately not, I added a better explanation in the original post.

CollectivesTM on Stack Overflow

PostgreSQL query performance and possible optimisation

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related