Improve DISTINCT SQL-query, PostgreSQL 9.6

Question 1

I have a table:

CREATE TABLE noise (
 id bigint NOT NULL,
 time_noise timestamp without time zone NOT NULL,
 base_name text,
 level real,
);

And I want to get last data (by time) for every base_name:

\timing on
SELECT DISTINCT ON (base_name) base_name, time_noise, level 
FROM noise 
ORDER BY base_name, time_noise DESC;

But this query is too slow, ~2s.

explain (analyze, buffers):

 QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------
 Unique (cost=197699.46..204176.80 rows=1 width=20) (actual time=1617.637..1795.908 rows=1 loops=1)
 Buffers: shared hit=53222, temp read=4775 written=4775
 -> Sort (cost=197699.46..200938.13 rows=1295469 width=20) (actual time=1617.636..1719.406 rows=1302740 loops=1)
 Sort Key: base_name, time_noise DESC
 Sort Method: external merge Disk: 38192kB
 Buffers: shared hit=53222, temp read=4775 written=4775
 -> Seq Scan on noise (cost=0.00..66176.69 rows=1295469 width=20) (actual time=0.006..223.798 rows=1302740 loops=1)
 Buffers: shared hit=53222
 Planning time: 0.053 ms
 Execution time: 1803.080 ms

How can it be improved?

with work_mem=160MB

 Unique (cost=197699.46..204176.80 rows=1 width=20) (actual time=1559.037..1712.098 rows=1 loops=1)
 Buffers: shared hit=53222
 -> Sort (cost=197699.46..200938.13 rows=1295469 width=20) (actual time=1559.036..1635.336 rows=1302740 loops=1)
 Sort Key: base_name, time_noise DESC
 Sort Method: quicksort Memory: 150929kB
 Buffers: shared hit=53222
 -> Seq Scan on noise (cost=0.00..66176.69 rows=1295469 width=20) (actual time=0.007..232.343 rows=1302740 loops=1)
 Buffers: shared hit=53222
 Planning time: 0.054 ms
 Execution time: 1713.740 ms

I found another approach, 10times faster:

select (base_name, time_noise, level)
from noise 
where (base_name, time_noise) in (select base_name, max(time_noise) from noise group by base_name);

And create INDEX for time_noise.

 QUERY PLAN 
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=81714.75..81719.74 rows=2 width=32) (actual time=193.519..193.537 rows=2 loops=1)
 Buffers: shared hit=74800
 -> Finalize GroupAggregate (cost=81714.32..81714.40 rows=2 width=15) (actual time=193.490..193.493 rows=2 loops=1)
 Group Key: noise_1.base_name
 Buffers: shared hit=74792
 -> Sort (cost=81714.32..81714.34 rows=8 width=15) (actual time=193.483..193.485 rows=10 loops=1)
 Sort Key: noise_1.base_name
 Sort Method: quicksort Memory: 25kB
 Buffers: shared hit=74792
 -> Gather (cost=81713.38..81714.20 rows=8 width=15) (actual time=193.411..193.462 rows=10 loops=1)
 Workers Planned: 4
 Workers Launched: 4
 Buffers: shared hit=74792
 -> Partial HashAggregate (cost=80713.38..80713.40 rows=2 width=15) (actual time=189.863..189.864 rows=2 loops=5)
 Group Key: noise_1.base_name
 Buffers: shared hit=74076
 -> Parallel Seq Scan on noise noise_1 (cost=0.00..78500.92 rows=442492 width=15) (actual time=0.014..112.629 rows=353746 loops=5)
 Buffers: shared hit=74076
 -> Index Scan using time_noise on noise (cost=0.43..2.65 rows=1 width=19) (actual time=0.015..0.017 rows=1 loops=2)
 Index Cond: (time_noise = (max(noise_1.time_noise)))
 Filter: (noise_1.base_name = base_name)
 Rows Removed by Filter: 1
 Buffers: shared hit=8
 Planning time: 0.174 ms
 Execution time: 209.365 ms

Any other suggestions?

Question 2

Nice alternative ;) - you should add that as an answer. But: select (base_name, time_noise, level) is something completely different than select base_name, time_noise, level in Postgres - you should remove the parentheses around the columns in the SELECT part (those in the WHERE are required)

Question 3

Why does one of these return one row and one returns two rows? This looks like a good candidate for a loose index scan wiki.postgresql.org/wiki/Loose_indexscan, but it will be a very verbose and messy query.

Question 4

Another way is (here I also cut off the range of the date):

select base_name, time_noise, slow
from noise 
where (base_name, time_noise) in (select base_name, max(time_noise) from noise WHERE time_noise >= (NOW() - INTERVAL '5 minutes' ) group by base_name);

~2ms, thats fine.

delkov delkov 2712 silver badges6 bronze badges · Answer 1 · 2018-09-10 13:50:42Z

Another way is (here I also cut off the range of the date):

select base_name, time_noise, slow
from noise 
where (base_name, time_noise) in (select base_name, max(time_noise) from noise WHERE time_noise >= (NOW() - INTERVAL '5 minutes' ) group by base_name);

~2ms, thats fine.

Stack Exchange Network

Improve DISTINCT SQL-query, PostgreSQL 9.6

with work_mem=160MB

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Improve DISTINCT SQL-query, PostgreSQL 9.6

with work_mem=160MB

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions