I have an analytics database on Postgres 9.5 I'm taking advantage of constraint partitioning with table inheritance to split up events into monthly tables. The base (parent) table contains no rows.
I have a relatively simple query but the planner is coming up with a completely insane plan where it materializes within a nested loop as can be seen from the explain.
The query
select count(person_id) as thecount from (
select distinct A.person_id from event_page as O
join alias as A on (O.person_alias = A.alias)
where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999'
and O.location_host = 'www.foo.com' AND O.location_path= '/ca/sale'
) as alias_3494697
The explain: https://explain.depesz.com/s/IoO
My theory is that table inheritance with constraint partitioning is tripping up the planner somehow. If I replace the base table with the monthly table for April I get a sane plan with good performance:
https://explain.depesz.com/s/GpbU
NOTE The slight difference from the above query - i'm using the child table instead of the base table (ie no inheritance at play)
select count(person_id) as thecount from (
select distinct A.person_id from event_page_2017_4 as O
join alias as A on (O.person_alias = A.alias)
where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999'
and O.location_host = 'www.foo.com' AND O.location_path= '/ca/sale'
) as alias_3494697
All stats are up to date via vacuum analyze. The relevant schema is below:
CREATE TABLE event_page (
id CHAR(24) PRIMARY KEY,
timestamp_ TIMESTAMP NOT NULL,
person_alias VARCHAR(128) NOT NULL,
visitor_id VARCHAR(128) NOT NULL,
session_id VARCHAR(24) NOT NULL,
ip_address VARCHAR(64) ,
user_agent VARCHAR(256) ,
operating_system VARCHAR(64) ,
device_type VARCHAR(64) ,
browser VARCHAR(64) ,
browser_major VARCHAR(64) ,
page_title VARCHAR(1024) ,
location_host VARCHAR(256) ,
location_path VARCHAR(1024) ,
location_query VARCHAR(1024) ,
location_fragment VARCHAR(1024) ,
referrer_host VARCHAR(256) ,
referrer_path VARCHAR(1024) ,
referrer_query VARCHAR(1024) ,
referrer_fragment VARCHAR(1024) ,
duration INT
);
--the monthly tables all look like this example for April 2017
CREATE TABLE event_page_2017_4
(LIKE event_page including defaults including constraints including indexes,
CONSTRAINT page_2017_4_part CHECK ( timestamp_ >= '2017-4-01'::timestamp AND timestamp_ < '2017-05-01'::timestamp )
)
INHERITS (event_page);
--indexes are the sames on each of the monthly tables
CREATE INDEX ep_2017_4_location
ON event_page_2017_4
USING btree
(location_host varchar_pattern_ops, location_path varchar_pattern_ops, location_query varchar_pattern_ops, location_fragment varchar_pattern_ops, timestamp_, person_alias, session_id);
CREATE INDEX ep_2017_4_ts
ON event_page_2017_4
USING btree
(timestamp_, person_alias, session_id);
CREATE INDEX ep_2017_4_a_ses_ts
ON event_page_2017_4
USING btree
(person_alias, session_id, timestamp_);
--the alias tables and two indexes
CREATE TABLE alias (
person_id CHAR(24),
alias VARCHAR(128) NOT NULL,
first_seen TIMESTAMP,
soft BOOLEAN
);
ALTER TABLE ALIAS ADD CONSTRAINT alias_alias_pkey PRIMARY KEY(alias);
CREATE INDEX alias_a_p ON alias (alias, person_id);
CREATE INDEX alias_p_a ON alias (person_id, alias);
NOTE: The problem went away after upgrading to 9.6.1
1 Answer 1
A couple of observations:
1.
The query plan for the slow query shows:
Seq Scan on event_page o (cost=10,000,000,000.00..10,000,000,000.00 rows=1 width=274)
... which indicates that you ran with SET enable_seqscan = off;
. And that means your Postgres version positively could not find any other way than the sequential scan.
2.
Did you enable constraint_exclusion like the manual advises here:
- Ensure that the constraint_exclusion configuration parameter is not disabled in
postgresql.conf
. If it is, queries will not be optimized as desired.
3.
The day boundaries in your query are not in sync with the partitioning. (But that has no immediate effect for the given query, see comments.)
The CHECK
constraint reads:
CHECK (timestamp_ >= '2017-4-01'::timestamp AND timestamp_ < '2017-05-01'::timestamp)
There may be confusion with timestamp
vs. timestamptz
and / or time zones. Or something got lost in translation and the question is misleading in that respect.
Either way, the expression in your WHERE
clause opens up a sneaky corner case (even if your day boundaries were in sync):
(削除) where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999' (削除ここまで)
Postgres timestamps are implemented as 8-byte integers allowing 6 decimal places. A row with '2017-4-29 03:59:59.9995'
would not behave as expected.
Details:
Plus, this does not play well with constraint exclusion. More partitions than necessary may have to be read.
Use instead:
WHERE o.timestamp_ >= '2017-4-28 04:00' -- include lower bound
AND o.timestamp_ < '2017-4-29 04:00' -- exclude upper bound
You may want get "days" in your partitions in sync with your queries - 00:00 vs. 04:00 (or the evil 03:59:59.999) - and maybe use timestamptz
to begin with.
4.
I see for the index-only scan on alias
:
Heap Fetches: 1918543
... which is exceptionally high. There may or may not be problems with the visibility map. (See also: Updating The Visibility Map) Did you run VACUUM FULL
on the table(s)? (Compare this thread on pgsql-performance.) Try plain VACUUM
, which updates the visibility map properly. But that's all hard to say, since you seem to have been running with enable_seqscan = off
. Not sure how this plays out in combination with all the other stuff at work here.
There have been various improvements to VACUUM
for Postgres 9.6 (among other things), this may explain the difference - or what you observed in pg 9.6 is coincidence and due to other factors.
5.
If alias.alias
is unique (and referential integrity can be assumed), the query can be simplified. Just count(DISTINCT o.person_alias)
without even joining to table alias
at all. Like:
SELECT count(DISTINCT o.person_alias) AS thecount
FROM event_page o
WHERE o.timestamp_ >= '2017-4-28 04:00' -- include lower bound
AND o.timestamp_ < '2017-4-29 04:00' -- exclude upper bound
AND o.location_host = 'www.foo.com'
AND o.location_path = '/ca/sale';
(Or use the subquery like you had it, may be faster than count(DISTINCT ...)
.)
If alias.alias
is not unique, your query may be wrong / ambiguous.
Why do you have person_alias
in table event_page
and not person_id
to begin with?
6.
You use the data type CHAR(24)
as PK for your tables, which is almost certainly a bad choice. Consider a serial
or bigserial
column or maybe a uuid
. Numeric types are smaller and faster and not burdened by collation rules, varying byte length, etc. Related:
-
In my first draft I misread and assumed you had daily partitions. Since you have 1 partition per month, the comparison is fair after all. I leave the rest discussing partition bounds, since that's still relevant.Erwin Brandstetter– Erwin Brandstetter2017年05月01日 22:42:29 +00:00Commented May 1, 2017 at 22:42
-
Thanks Erwin, to address your points: 1. Yes, enable_seqscan = off however the parent table is always empty, only the tables that inherit contain data. 2. Yes, constrain exclusion is set to
partition
, the default 3. You lost me here. The monthly partitions are set up to with inclusive & exclusive bounds and there should be no overlap. Isn't '2017年4月01日' equivalent to '2017年4月01日 00:00:00.000'? Based on my experiments the constraint partitions worked as expected. 4. VACUUM FULL (rewrite) did lower the heap fetchesmaxTrialfire– maxTrialfire2017年05月01日 22:43:51 +00:00Commented May 1, 2017 at 22:43 -
continued.... 5. A person may have one or more aliases hence the join. A person's alias may change throughout their life. I'm interested in finding the distinct people which is always <= to the distinct aliases. 6. Will look into this, thanks for the tipmaxTrialfire– maxTrialfire2017年05月01日 22:45:17 +00:00Commented May 1, 2017 at 22:45
-
@maxTrialfire: 3. See my comment and updates, I had misread the date format. 4.
VACUUM
is superior toVACUUM FULL
for updating the visibility map. 5. Makes sense now. (Did you ensure that aliases are unambiguous over time?)Erwin Brandstetter– Erwin Brandstetter2017年05月01日 22:51:52 +00:00Commented May 1, 2017 at 22:51 -
Edited the question to include PK constraint on
alias
which I seem to have left out.maxTrialfire– maxTrialfire2017年05月01日 23:58:38 +00:00Commented May 1, 2017 at 23:58
Explore related questions
See similar questions with these tags.