0

I have a table:

CREATE TABLE noise (
 id bigint NOT NULL,
 time_noise timestamp without time zone NOT NULL,
 base_name text,
 level real,
);

And I want to get last data (by time) for every base_name:

\timing on
SELECT DISTINCT ON (base_name) base_name, time_noise, level 
FROM noise 
ORDER BY base_name, time_noise DESC;

But this query is too slow, ~2s.

explain (analyze, buffers):

 QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------
 Unique (cost=197699.46..204176.80 rows=1 width=20) (actual time=1617.637..1795.908 rows=1 loops=1)
 Buffers: shared hit=53222, temp read=4775 written=4775
 -> Sort (cost=197699.46..200938.13 rows=1295469 width=20) (actual time=1617.636..1719.406 rows=1302740 loops=1)
 Sort Key: base_name, time_noise DESC
 Sort Method: external merge Disk: 38192kB
 Buffers: shared hit=53222, temp read=4775 written=4775
 -> Seq Scan on noise (cost=0.00..66176.69 rows=1295469 width=20) (actual time=0.006..223.798 rows=1302740 loops=1)
 Buffers: shared hit=53222
 Planning time: 0.053 ms
 Execution time: 1803.080 ms

How can it be improved?

with work_mem=160MB

 Unique (cost=197699.46..204176.80 rows=1 width=20) (actual time=1559.037..1712.098 rows=1 loops=1)
 Buffers: shared hit=53222
 -> Sort (cost=197699.46..200938.13 rows=1295469 width=20) (actual time=1559.036..1635.336 rows=1302740 loops=1)
 Sort Key: base_name, time_noise DESC
 Sort Method: quicksort Memory: 150929kB
 Buffers: shared hit=53222
 -> Seq Scan on noise (cost=0.00..66176.69 rows=1295469 width=20) (actual time=0.007..232.343 rows=1302740 loops=1)
 Buffers: shared hit=53222
 Planning time: 0.054 ms
 Execution time: 1713.740 ms

I found another approach, 10times faster:

select (base_name, time_noise, level)
from noise 
where (base_name, time_noise) in (select base_name, max(time_noise) from noise group by base_name);

And create INDEX for time_noise.

 QUERY PLAN 
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=81714.75..81719.74 rows=2 width=32) (actual time=193.519..193.537 rows=2 loops=1)
 Buffers: shared hit=74800
 -> Finalize GroupAggregate (cost=81714.32..81714.40 rows=2 width=15) (actual time=193.490..193.493 rows=2 loops=1)
 Group Key: noise_1.base_name
 Buffers: shared hit=74792
 -> Sort (cost=81714.32..81714.34 rows=8 width=15) (actual time=193.483..193.485 rows=10 loops=1)
 Sort Key: noise_1.base_name
 Sort Method: quicksort Memory: 25kB
 Buffers: shared hit=74792
 -> Gather (cost=81713.38..81714.20 rows=8 width=15) (actual time=193.411..193.462 rows=10 loops=1)
 Workers Planned: 4
 Workers Launched: 4
 Buffers: shared hit=74792
 -> Partial HashAggregate (cost=80713.38..80713.40 rows=2 width=15) (actual time=189.863..189.864 rows=2 loops=5)
 Group Key: noise_1.base_name
 Buffers: shared hit=74076
 -> Parallel Seq Scan on noise noise_1 (cost=0.00..78500.92 rows=442492 width=15) (actual time=0.014..112.629 rows=353746 loops=5)
 Buffers: shared hit=74076
 -> Index Scan using time_noise on noise (cost=0.43..2.65 rows=1 width=19) (actual time=0.015..0.017 rows=1 loops=2)
 Index Cond: (time_noise = (max(noise_1.time_noise)))
 Filter: (noise_1.base_name = base_name)
 Rows Removed by Filter: 1
 Buffers: shared hit=8
 Planning time: 0.174 ms
 Execution time: 209.365 ms

Any other suggestions?

asked Sep 10, 2018 at 9:21
2
  • Nice alternative ;) - you should add that as an answer. But: select (base_name, time_noise, level) is something completely different than select base_name, time_noise, level in Postgres - you should remove the parentheses around the columns in the SELECT part (those in the WHERE are required) Commented Sep 10, 2018 at 13:37
  • Why does one of these return one row and one returns two rows? This looks like a good candidate for a loose index scan wiki.postgresql.org/wiki/Loose_indexscan, but it will be a very verbose and messy query. Commented Sep 10, 2018 at 14:22

1 Answer 1

1

Another way is (here I also cut off the range of the date):

select base_name, time_noise, slow
from noise 
where (base_name, time_noise) in (select base_name, max(time_noise) from noise WHERE time_noise >= (NOW() - INTERVAL '5 minutes' ) group by base_name);

~2ms, thats fine.

answered Sep 10, 2018 at 13:50

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.