PostgreSQL being slow on count distinct for dates

Question 1

I have a very simple but very big table. Its schema looks like this:

(yadda int, yadda1 int, yaddate date, ... other stuff).

Now, yaddate has an index of its own and also appears in other indexes together with other columns (e.g. (yadda1, date)).

The table itself is some 100M rows.

When I run

 select distinct date from mybigtable;

the time needed to get the list is in the range of 200 seconds. EXPLAIN ANALYZE tells me it's doing a seq scan, and I don't understand why, since the index is there.

The first thing I am trying is a REINDEX on the index that covers only the date column.

Am I doing something wrong?
Since obviously there's something I am missing about seq and index scan, can someone shed some light?
How can I make that query faster?

TIA.

Question 2

wiki.postgresql.org/wiki/Slow_Query_Questions

Question 3

See stackoverflow.com/a/14732410/32453: putting the select distinct query in a subquery and counting that worked for me, bizarrely.


Answer 1 · alci · 2014-04-21 16:13:55Z

There is a trick you can try that makes distinct fast by using the index. It involves creating a function like this:

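CREATE OR REPLACE FUNCTION small_distinct(
 IN tablename character varying,
 IN fieldname character varying,
 -- sample is never read; its type only fixes the polymorphic return type
 IN sample anyelement DEFAULT '1800-01-01'::date,
 -- result must be declared: RETURN NEXT without an argument emits this OUT parameter
 OUT result anyelement)
 RETURNS SETOF anyelement AS
$BODY$
BEGIN
 -- Smallest value, found through the index on fieldname.
 -- Both names are concatenated unquoted, so pass trusted identifiers only.
 EXECUTE 'SELECT '||fieldName||' FROM '||tableName||' ORDER BY '||fieldName
 ||' LIMIT 1' INTO result;
 WHILE result IS NOT NULL LOOP
 RETURN NEXT;
 -- Next value strictly greater than the previous one; NULL when none is left.
 EXECUTE 'SELECT '||fieldName||' FROM '||tableName
 ||' WHERE '||fieldName||' > $1 ORDER BY ' || fieldName || ' LIMIT 1'
 INTO result USING result;
 END LOOP;
END;
$BODY$
 LANGUAGE plpgsql VOLATILE
 COST 100
 ROWS 1000;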

Then create an index on the column whose distinct values you want. After that, select small_distinct('yourtable', 'yaddate'); should return those values through the index, without the need to read the whole table.

Try it, but beware: I'm not sure it will work right out of the box, as I quickly adapted it from a varchar function.
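The same "jump to the next larger value" idea can also be written as a single recursive query, which avoids dynamic SQL. This is only a sketch of that variant, not the function above: it assumes the table is mybigtable, the date column is yaddate, and a btree index exists on yaddate.

```sql
-- Emulated loose index scan: one index probe per distinct value.
WITH RECURSIVE t AS (
   -- Anchor: the smallest value in the column.
   (SELECT yaddate FROM mybigtable ORDER BY yaddate LIMIT 1)
   UNION ALL
   -- Step: the next value strictly greater than the previous one.
   -- The scalar subquery yields NULL once no larger value exists.
   SELECT (SELECT yaddate FROM mybigtable
           WHERE yaddate > t.yaddate
           ORDER BY yaddate LIMIT 1)
   FROM t
   WHERE t.yaddate IS NOT NULL
)
SELECT yaddate FROM t WHERE yaddate IS NOT NULL;
```

The work grows with the number of distinct values rather than with the number of rows, so it pays off when the column has few distinct values relative to the table size.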

Nice trick! Might try that myself! – Gaius, Oct 29, 2017 at 9:18

score 1 · Answer 2 · 2014-04-15 19:21:42Z

For this query:

select distinct date from mybigtable;

or its twin:

select date from mybigtable group by 1;

... the whole table has to be read. Postgres is not going to use any index, except possibly a covering index that is substantially smaller than the table itself. See the Postgres Wiki on slow counting.

Also, to be precise, that's not a count. If you are after an actual count, an estimate might be enough, and that can be had much faster. See the Postgres Wiki on count estimates.
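For an estimated row count, the planner's statistics can be read directly. A minimal sketch, assuming the table is named mybigtable as in the question and has been analyzed or vacuumed recently, since that is what refreshes the figure:

```sql
-- Approximate number of rows, taken from the planner statistics.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'mybigtable'::regclass;
```

This counts rows in the whole table, not distinct dates, so it only helps when a rough total is what is actually needed.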

If you provide more details of what you have and what you want, there might be workarounds with a materialized view or a lookup table ...
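As a rough illustration of the materialized-view workaround, assuming PostgreSQL 9.3 or later, a date column named yaddate, and a hypothetical view name mybigtable_dates:

```sql
-- Pay the full scan once and store the distinct dates.
CREATE MATERIALIZED VIEW mybigtable_dates AS
SELECT DISTINCT yaddate
FROM mybigtable;

-- Reading the small view is cheap.
SELECT yaddate FROM mybigtable_dates ORDER BY yaddate;

-- Re-run after loading new data; this repeats the full scan.
REFRESH MATERIALIZED VIEW mybigtable_dates;
```

The view is only as fresh as its last refresh, so this fits data that is loaded in batches better than data that changes continuously.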

Hi, thanks. What I have is: various data with dates. Data always refers to the first of the month (the date is always in the format YYYY-MM-01), so what I want is the set of months this dataset covers. I am not counting, I really want the set of N dates that are in the dataset. Does this help?
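Because every value is the first day of a month, one possible approach is to generate the candidate months and probe the index once per month. This is a sketch under assumptions: the column is yaddate, it has a btree index, and the date bounds below are placeholders that must be set to a range known to cover the data.

```sql
-- One index lookup per candidate month instead of a full table scan.
SELECT m::date AS month_start
FROM generate_series(timestamp '2000-01-01',   -- placeholder lower bound
                     timestamp '2014-12-01',   -- placeholder upper bound
                     interval '1 month') AS m
WHERE EXISTS (SELECT 1 FROM mybigtable WHERE yaddate = m::date)
ORDER BY 1;
```

Months outside the chosen bounds are silently missed, so the bounds should be generous, or taken from min(yaddate) and max(yaddate), which can each be answered from the index.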
