4

I have an analytics database on Postgres 9.5 I'm taking advantage of constraint partitioning with table inheritance to split up events into monthly tables. The base (parent) table contains no rows.

I have a relatively simple query but the planner is coming up with a completely insane plan where it materializes within a nested loop as can be seen from the explain.

The query

select count(person_id) as thecount from (
 select distinct A.person_id from event_page as O 
 join alias as A on (O.person_alias = A.alias) 
 where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999' 
 and O.location_host = 'www.foo.com' AND O.location_path= '/ca/sale' 
) as alias_3494697

The explain: https://explain.depesz.com/s/IoO

My theory is that table inheritance with constraint partitioning is tripping up the planner somehow. If I replace the base table with the monthly table for April I get a sane plan with good performance:

https://explain.depesz.com/s/GpbU

NOTE The slight difference from the above query - i'm using the child table instead of the base table (ie no inheritance at play)

select count(person_id) as thecount from (
 select distinct A.person_id from event_page_2017_4 as O 
 join alias as A on (O.person_alias = A.alias) 
 where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999' 
 and O.location_host = 'www.foo.com' AND O.location_path= '/ca/sale' 
) as alias_3494697

All stats are up to date via vacuum analyze. The relevant schema is below:

CREATE TABLE event_page (
 id CHAR(24) PRIMARY KEY,
 timestamp_ TIMESTAMP NOT NULL,
 person_alias VARCHAR(128) NOT NULL,
 visitor_id VARCHAR(128) NOT NULL,
 session_id VARCHAR(24) NOT NULL,
 ip_address VARCHAR(64) ,
 user_agent VARCHAR(256) ,
 operating_system VARCHAR(64) ,
 device_type VARCHAR(64) ,
 browser VARCHAR(64) ,
 browser_major VARCHAR(64) ,
 page_title VARCHAR(1024) ,
 location_host VARCHAR(256) ,
 location_path VARCHAR(1024) ,
 location_query VARCHAR(1024) ,
 location_fragment VARCHAR(1024) ,
 referrer_host VARCHAR(256) ,
 referrer_path VARCHAR(1024) ,
 referrer_query VARCHAR(1024) ,
 referrer_fragment VARCHAR(1024) ,
 duration INT
);
--the monthly tables all look like this example for April 2017
CREATE TABLE event_page_2017_4
(LIKE event_page including defaults including constraints including indexes,
CONSTRAINT page_2017_4_part CHECK ( timestamp_ >= '2017-4-01'::timestamp AND timestamp_ < '2017-05-01'::timestamp )
)
INHERITS (event_page);
--indexes are the sames on each of the monthly tables
CREATE INDEX ep_2017_4_location
 ON event_page_2017_4
 USING btree
 (location_host varchar_pattern_ops, location_path varchar_pattern_ops, location_query varchar_pattern_ops, location_fragment varchar_pattern_ops, timestamp_, person_alias, session_id);
CREATE INDEX ep_2017_4_ts
 ON event_page_2017_4
 USING btree
 (timestamp_, person_alias, session_id);
CREATE INDEX ep_2017_4_a_ses_ts 
 ON event_page_2017_4
 USING btree
 (person_alias, session_id, timestamp_); 
--the alias tables and two indexes
CREATE TABLE alias (
 person_id CHAR(24),
 alias VARCHAR(128) NOT NULL,
 first_seen TIMESTAMP,
 soft BOOLEAN
);
ALTER TABLE ALIAS ADD CONSTRAINT alias_alias_pkey PRIMARY KEY(alias);
CREATE INDEX alias_a_p ON alias (alias, person_id);
CREATE INDEX alias_p_a ON alias (person_id, alias);

NOTE: The problem went away after upgrading to 9.6.1

asked Apr 28, 2017 at 19:07
0

1 Answer 1

5

A couple of observations:

1.

The query plan for the slow query shows:

Seq Scan on event_page o (cost=10,000,000,000.00..10,000,000,000.00 rows=1 width=274)

... which indicates that you ran with SET enable_seqscan = off;. And that means your Postgres version positively could not find any other way than the sequential scan.

2.

Did you enable constraint_exclusion like the manual advises here:

  1. Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.

3.

The day boundaries in your query are not in sync with the partitioning. (But that has no immediate effect for the given query, see comments.)
The CHECK constraint reads:

CHECK (timestamp_ >= '2017-4-01'::timestamp AND timestamp_ < '2017-05-01'::timestamp)

There may be confusion with timestamp vs. timestamptz and / or time zones. Or something got lost in translation and the question is misleading in that respect.

Either way, the expression in your WHERE clause opens up a sneaky corner case (even if your day boundaries were in sync):

(削除) where O.timestamp_ between '2017-4-28 04:00:00' and '2017-4-29 03:59:59.999' (削除ここまで)

Postgres timestamps are implemented as 8-byte integers allowing 6 decimal places. A row with '2017-4-29 03:59:59.9995' would not behave as expected. Details:

Plus, this does not play well with constraint exclusion. More partitions than necessary may have to be read.

Use instead:

WHERE o.timestamp_ >= '2017-4-28 04:00' -- include lower bound
AND o.timestamp_ < '2017-4-29 04:00' -- exclude upper bound

You may want get "days" in your partitions in sync with your queries - 00:00 vs. 04:00 (or the evil 03:59:59.999) - and maybe use timestamptz to begin with.

4.

I see for the index-only scan on alias:

Heap Fetches: 1918543

... which is exceptionally high. There may or may not be problems with the visibility map. (See also: Updating The Visibility Map) Did you run VACUUM FULL on the table(s)? (Compare this thread on pgsql-performance.) Try plain VACUUM, which updates the visibility map properly. But that's all hard to say, since you seem to have been running with enable_seqscan = off. Not sure how this plays out in combination with all the other stuff at work here.

There have been various improvements to VACUUM for Postgres 9.6 (among other things), this may explain the difference - or what you observed in pg 9.6 is coincidence and due to other factors.

5.

If alias.alias is unique (and referential integrity can be assumed), the query can be simplified. Just count(DISTINCT o.person_alias) without even joining to table alias at all. Like:

SELECT count(DISTINCT o.person_alias) AS thecount
FROM event_page o
WHERE o.timestamp_ >= '2017-4-28 04:00' -- include lower bound
AND o.timestamp_ < '2017-4-29 04:00' -- exclude upper bound
AND o.location_host = 'www.foo.com'
AND o.location_path = '/ca/sale';

(Or use the subquery like you had it, may be faster than count(DISTINCT ...).)

If alias.alias is not unique, your query may be wrong / ambiguous.

Why do you have person_alias in table event_page and not person_id to begin with?

6.

You use the data type CHAR(24) as PK for your tables, which is almost certainly a bad choice. Consider a serial or bigserial column or maybe a uuid. Numeric types are smaller and faster and not burdened by collation rules, varying byte length, etc. Related:

answered May 1, 2017 at 21:03
5
  • In my first draft I misread and assumed you had daily partitions. Since you have 1 partition per month, the comparison is fair after all. I leave the rest discussing partition bounds, since that's still relevant. Commented May 1, 2017 at 22:42
  • Thanks Erwin, to address your points: 1. Yes, enable_seqscan = off however the parent table is always empty, only the tables that inherit contain data. 2. Yes, constrain exclusion is set to partition, the default 3. You lost me here. The monthly partitions are set up to with inclusive & exclusive bounds and there should be no overlap. Isn't '2017年4月01日' equivalent to '2017年4月01日 00:00:00.000'? Based on my experiments the constraint partitions worked as expected. 4. VACUUM FULL (rewrite) did lower the heap fetches Commented May 1, 2017 at 22:43
  • continued.... 5. A person may have one or more aliases hence the join. A person's alias may change throughout their life. I'm interested in finding the distinct people which is always <= to the distinct aliases. 6. Will look into this, thanks for the tip Commented May 1, 2017 at 22:45
  • @maxTrialfire: 3. See my comment and updates, I had misread the date format. 4. VACUUM is superior to VACUUM FULL for updating the visibility map. 5. Makes sense now. (Did you ensure that aliases are unambiguous over time?) Commented May 1, 2017 at 22:51
  • Edited the question to include PK constraint on alias which I seem to have left out. Commented May 1, 2017 at 23:58

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.