I have a very simple, but very big, table. Its schema is like this
(yadda int, yadda1 int, yaddate date, ... other stuff).
Now, yaddate has an index by itself and it is also in other indexes together with other columns (eg. (yadda1, date)).
The table itself is some 100M rows.
When I run
select distinct date from mybigtable;
the time needed to get the list is in the range of 200 seconds. EXPLAIN ANALYZE tells me it's doing a seq scan, and I don't understand why, since the index is there.
The first thing I am trying is a REINDEX on the index that covers only the date column.
- Am I doing something wrong?
- Since obviously there's something I am missing about seq and index scan, can someone shed some light?
- How can I make that query faster?
TIA.
-
wiki.postgresql.org/wiki/Slow_Query_Questions – user1822, Apr 15, 2014 at 17:22
-
See stackoverflow.com/a/14732410/32453: putting the select distinct query in a subquery and counting that worked for me, bizarrely. – rogerdpack, Nov 17, 2014 at 22:26
2 Answers
There is a trick you can try to make distinct fast by using the index. It involves creating a function like this:
CREATE OR REPLACE FUNCTION small_distinct(IN tablename character varying, IN fieldname character varying, IN sample anyelement DEFAULT '1800-01-01'::date)
RETURNS SETOF anyelement AS
$BODY$
DECLARE
  -- "sample" only pins down the polymorphic type; result takes the same type
  result sample%TYPE;
BEGIN
  -- Fetch the smallest value; with an index on the column this is cheap
  EXECUTE 'SELECT '||fieldName||' FROM '||tableName||' ORDER BY '||fieldName
    ||' LIMIT 1' INTO result;
  WHILE result IS NOT NULL LOOP
    RETURN NEXT result;
    -- Jump to the next value greater than the one just returned
    EXECUTE 'SELECT '||fieldName||' FROM '||tableName
      ||' WHERE '||fieldName||' > $1 ORDER BY ' || fieldName || ' LIMIT 1'
      INTO result USING result;
  END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;
Then create an index on the column whose distinct values you want, and
select small_distinct('yourtable', 'yaddate');
should return the distinct values you want, without the need to read the whole table.
Try it, but beware: I'm not sure it will work right out of the box, as I quickly adapted it from a varchar function.
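The same value-by-value walk can also be written in plain SQL as a recursive CTE, which avoids dynamic SQL altogether. This is only a sketch: it assumes the table is mybigtable and the column is yaddate, as in the question, and that a btree index on yaddate exists.

```sql
-- Emulated "loose index scan": one cheap index probe per distinct value.
WITH RECURSIVE t AS (
   -- Anchor: the smallest yaddate in the table
   (SELECT yaddate FROM mybigtable ORDER BY yaddate LIMIT 1)
   UNION ALL
   -- Step: the next yaddate strictly greater than the previous one;
   -- the scalar subquery yields NULL once there is none left
   SELECT (SELECT yaddate
           FROM   mybigtable
           WHERE  yaddate > t.yaddate
           ORDER  BY yaddate
           LIMIT  1)
   FROM   t
   WHERE  t.yaddate IS NOT NULL
)
SELECT yaddate
FROM   t
WHERE  yaddate IS NOT NULL;   -- drop the terminating NULL row
```

This tends to pay off only when there are few distinct values relative to the number of rows, which seems to be the case here; with many distinct values a plain seq scan may well be faster.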
-
Nice trick! Might try that myself! – Gaius, Oct 29, 2017 at 9:18
For this query:
select distinct date from mybigtable;
or its twin:
select date from mybigtable group by 1;
... the whole table has to be read. Postgres is not going to use any index, except, possibly, a covering index that is substantially smaller than the table itself. See the Postgres Wiki on slow counting.
Also, to be precise, that's not a count. If you are after an actual count, an estimate might be enough, which can be had much faster. See the Postgres Wiki on count estimates.
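As a rough illustration of what such an estimate looks like, the planner statistics can be read straight from the catalogs. The table and column names (mybigtable, yaddate) are taken from the question and are assumptions about the real schema; the numbers are only as fresh as the last ANALYZE or autovacuum run.

```sql
-- Estimated number of rows in the table, from the planner's statistics
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'mybigtable'::regclass;

-- Estimated number of distinct values in the column.
-- A positive n_distinct is an absolute count; a negative one is a
-- fraction of the row count (e.g. -1 means every value is distinct).
SELECT n_distinct
FROM   pg_stats
WHERE  tablename = 'mybigtable'
AND    attname   = 'yaddate';
```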
If you provide more details of what you have and what you want, there might be workarounds with a materialized view or a lookup table ...
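To give an idea of the materialized view route, a minimal sketch could look like this. It assumes Postgres 9.3 or later, reuses mybigtable and yaddate from the question, and the view name mybigtable_dates is made up for the example.

```sql
-- Pay for the full scan once, when the view is built
CREATE MATERIALIZED VIEW mybigtable_dates AS
SELECT DISTINCT yaddate
FROM   mybigtable;

-- Reading the list of dates is then a scan of a tiny relation
SELECT yaddate
FROM   mybigtable_dates
ORDER  BY yaddate;

-- Re-run the underlying query after new data has been loaded
REFRESH MATERIALIZED VIEW mybigtable_dates;
```

The refresh still reads the whole table, so this fits best when the data is loaded in batches and the list of dates is read far more often than it changes.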
-
Hi, thanks. What I have is: various data with dates. Data always refers to the first of the month (date is always in the format YYYY-MM-1), so what I want is the set of months this dataset covers. I am not counting, I really want the set of N dates that are in the dataset. Does this help? – mgm, Apr 15, 2014 at 22:25