Postgres version
Using PostgreSQL 10.3.
Table definition
CREATE TABLE tickets (
id bigserial primary key,
state character varying,
closed timestamp
);
CREATE INDEX "state_index" ON "tickets" ("state")
WHERE ((state)::text = 'open'::text));
Cardinality
The table contains 1027616 rows, with 51533 of the rows having state = 'open'
and closed IS NULL
, or 5%.
A query with a condition on state
performs well using an index scan as expected:
explain analyze select * from tickets where state = 'open';
Index Scan using state_index on tickets (cost=0.29..16093.57 rows=36599 width=212) (actual time=0.025..22.875 rows=37346 loops=1)
Planning time: 0.212 ms
Execution time: 25.697 ms
I am trying to achieve the same or better performance for the query with the condition closed IS NULL
so that I can drop the state
column and rely on the closed
column for fetching the same rows. closed
is null
for the same rows where state = 'open'
, hence the state
column is redundant.
select * from tickets where closed IS NULL;
However, none of the indexes I've tried result in a single index scan like the first query. Below are the indexes I've tried along with the EXPLAIN ANALYZE
results.
A partial index:
CREATE INDEX "closed_index" ON "tickets" ("closed") WHERE (closed IS NULL)
explain analyze select * from tickets where closed IS NULL;
Bitmap Heap Scan on tickets (cost=604.22..38697.91 rows=36559 width=212) (actual time=12.879..48.780 rows=37348 loops=1)
Recheck Cond: (closed IS NULL)
Heap Blocks: exact=14757
-> Bitmap Index Scan on closed_index (cost=0.00..595.09 rows=36559 width=0) (actual time=7.585..7.585 rows=37348 loops=1)
Planning time: 4.831 ms
Execution time: 52.068 ms
An expression index:
CREATE INDEX "closed_index" ON "tickets" ((closed IS NULL))
explain analyze select * from tickets where closed IS NULL;
Seq Scan on tickets (cost=0.00..45228.26 rows=36559 width=212) (actual time=0.025..271.418 rows=37348 loops=1)
Filter: (closed IS NULL)
Rows Removed by Filter: 836578
Planning time: 7.992 ms
Execution time: 274.504 ms
A partial expression index:
CREATE INDEX "closed_index" ON "tickets" ((closed IS NULL))
WHERE (closed IS NULL);
explain analyze select * from tickets where closed IS NULL;
Bitmap Heap Scan on tickets (cost=604.22..38697.91 rows=36559 width=212) (actual time=177.109..238.008 rows=37348 loops=1)
Recheck Cond: (closed IS NULL)
Heap Blocks: exact=14757
-> Bitmap Index Scan on "closed_index" (cost=0.00..595.09 rows=36559 width=0) (actual time=174.598..174.598 rows=37348 loops=1)
Planning time: 23.063 ms
Execution time: 241.292 ms
UPDATED
Expanded table definition:
CREATE TABLE tickets (
id bigserial primary key,
state character varying,
closed timestamp,
created timestamp,
updated timestamp,
title character varying,
size integer NOT NULL,
comment_count integer NOT NULL
);
CREATE INDEX "state_index" ON "tickets" ("state")
WHERE ((state)::text = 'open'::text));
Cardinality:
The table contains 1027616 rows, with 51533 of the rows having state = 'open' and closed IS NULL, or 5%. As mentioned above, I am trying to drop the state
column so I want to be able to fetch the same rows using a condition on the closed
column instead.
Query with condition on state
column uses index scan.
explain analyze select id, title, created, closed, updated from tickets where state = 'open';
Index Scan using state_index on tickets (cost=0.29..22901.58 rows=49356 width=72) (actual time=0.107..49.599 rows=51533 loops=1)
Planning time: 0.511 ms
Execution time: 54.366 ms
I'd like the same performance (index scan, ideally) when switching to querying on the closed
column. I tried the partial index on id
and closed IS NULL
:
CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL;
VACUUM ANALYZE tickets;
explain analyze select id, title, created, closed, updated from tickets where closed IS NULL;
Bitmap Heap Scan on tickets (cost=811.96..33999.42 rows=49461 width=72) (actual time=7.868..47.080 rows=51537 loops=1)
Recheck Cond: (closed IS NULL)
Heap Blocks: exact=17479
-> Bitmap Index Scan on closed_index (cost=0.00..799.60 rows=49461 width=0) (actual time=4.868..4.868 rows=51537 loops=1)
Planning time: 0.222 ms
Execution time: 51.028 ms
1 Answer 1
Assuming the central piece of information:
with ~15% of the rows having
state = 'open'
andclosed IS NULL
is supposed to mean the same 15 % of all 1031584 rows meet both these conditions (all details matter!). Both conditions should perform equally - returning around 155k rows (!)
Your query plans show 37346 qualifying rows, ~ 3.6 % not 15 %. Something's still not right in your question.
With 3.6 %, indexes only start to make sense. Your tiny row size effectively occupies ~ 52 bytes per row, around 155 rows per page. That would be 5-6 hits per page for a completely random distribution. Postgres has to read all pages anyway and a sequential scan should be the fastest plan. Filtering the misses should be faster than involving indexes in any way.
Typically, qualifying rows are more or less clustered and the fewer data pages are involved the more sense it would make to involve an index. All bitmap index scans, though, I see hardly any case for an index scan. Even far less (as far as there can be much "less" than "hardly any") for the 15 % you claim.
For your updated numbers (~ 5% of all rows match) I would still rather expect bitmap index scans than index scans. A possible explanation: table bloat with lots of dead tuples. You mentioned a high write load. That would result in fewer and fewer live tuples per data page and favor index scans (as compared to bitmap index scans). You might re-test your initial query after a VACUUM FULL ANALYZE
(if you can afford an exclusive lock on the table!). If my hypothesis holds, the physical table size would shrink substantially and you would then see a bitmap index scan instead of the index scan (and faster, too).
- For alignment optimized table is bigger than original table - why?
- Measure the size of a PostgreSQL table row
You may need more aggressive autovacuum
settings. See:
Partial index
Your "expression index" and "partial expression index" are not useful. We don't need closed IS NULL
as actual index expression (which is always true
here). The expression only adds cost and no gain.
The first, plain partial index is the more useful variant. But don't use closed
as index expression (again, always NULL
here). Instead, use any column possibly useful to other queries, and ideally never updated to avoid additional cost and index bloat. The primary key column id
is the natural candidate in absence of other useful applications:
CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL;
Or, if id
is not useful, consider a constant instead:
CREATE INDEX closed_index ON tickets ((1)) WHERE closed IS NULL;
This makes the actual index column useless like in other dismissed variants - but it avoids all additional costs and dependencies. Related:
What I might try:
Only makes sense if you don't have many other writes on the the rows in question (The added columns updated
and comments_count
make me doubt it.)
Create a partial index with id
and the other relevant columns (few & small) as index expressions, and capitalize on it with a suitable query to get index-only scans:
CREATE INDEX closed_index ON tickets (id, title, created, updated)
WHERE closed IS NULL;
VACUUM ANALYZE tickets; -- just to prove idx-only is possible
SELECT id, title, created, updated
, NULL::timestamp AS closed -- redundant, rather drop it
FROM tickets
WHERE closed IS NULL;
We don't need SELECT *
, closed IS NULL
is given by the WHERE
clause. So we can use the tiny partial index in a fast index-only scan - assuming you meet the preconditions (that's why I threw in VACUUM
to update the visibility map up there). This is the rare case where queries reading more than ~ 5 % of all rows still happily use the index (even up to including the whole table).
There seems to be redundancy in your design, simplifications should be possible.
This works since Postgres 9.6, quoting the release notes:
- Allow use of an index-only scan on a partial index when the index's
WHERE
clause references columns that are not indexed (Tomas Vondra, Kyotaro Horiguchi)For example, an index defined by
CREATE INDEX tidx_partial ON t(b) WHERE a > 0
can now be used for an index-only scan by a query that specifiesWHERE a > 0
and does not otherwise usea
. Previously this was disallowed because a is not listed as an index column.
Or the information in your question is misleading.
Related:
- PostgreSQL partial index unused when created on a table with existing data
- Unexpected Seq Scan when doing query against boolean with value NULL
- Postgres not using index when index scan is much better option
If you don't see index-only scans with this, even immediately after running VACUUM
, then high write load may be in the way and the visibility map just never reaches a state that would allow index-only scans. The manual. Or you have another problem in your DB keeping VACUUM
from doing its job. Related:
-
I don't understand is why
state_index
results in an Index Scan whereasclosed_index
results in a Bitmap Heap Scan -> Bitmap Index Scan. That is, the same number of rows are involved so shouldn't the query onclosed
also use an index scan versus a bitmap index scan?GeekJock– GeekJock2019年02月19日 05:03:48 +00:00Commented Feb 19, 2019 at 5:03 -
1Good question. But the more immediate question: Even with just 5% of qualifying rows and a bit wider rows (according to your Q update) we should not see an index scan at all. Bitmap index scan or index-only scan. (Try the updated solution for index-only scans.)Erwin Brandstetter– Erwin Brandstetter2019年02月19日 19:07:37 +00:00Commented Feb 19, 2019 at 19:07
-
I do a fair amount of writes to the table so I don't think your updated solution makes sense. But I'm still not understanding why your first solution did not result in an index scan with
closed
like the one onstate
since the same rows are at play. For example, the indexCREATE INDEX closed_index ON tickets (closed) WHERE closed IS NULL;
seems like it's the exact same thing asCREATE INDEX "state_index" ON "tickets" ("state") WHERE ((state)::text = 'open'::text));
in terms of resulting in an index scan with my example queries, no?GeekJock– GeekJock2019年02月19日 23:30:58 +00:00Commented Feb 19, 2019 at 23:30 -
@GeekJock: Really not sure why you got that index scan. And I am flat out surprised that it should be the fastest plan for the given numbers. I'd expected a bitmap index scan to be faster. But there are so many factors for performance that I could not be certain until tested. Be that as it may, I added more on the simple partial index. Consider the variant with the constant expression. Might be cheaper overall. (But the
SELECT
query should be just as fast.)Erwin Brandstetter– Erwin Brandstetter2019年02月20日 02:34:04 +00:00Commented Feb 20, 2019 at 2:34 -
Ok, I just tried your latest example and it DID generate an index scan. So it IS possible :)GeekJock– GeekJock2019年02月20日 06:14:48 +00:00Commented Feb 20, 2019 at 6:14
Explore related questions
See similar questions with these tags.
closed
column is necessary.state
column, as its only state is represented inclosed IS NULL / NOT NULL
, is the optimal solution. One more detail: the 2nd query of your addition is actually faster (51 ms >> 54 ms). So the problem to address does not seem to be there to begin with. Still wondering where that index scan came from. And why the 2nd query hasrows=51537
while the 1st hasrows=51533
. More inconsistencies?