Do covering indexes in PostgreSQL help JOIN columns?

Question 1

I have a whole lot of tables that look vaguely like this:

CREATE TABLE table1(id INTEGER PRIMARY KEY, t1c1 INTEGER, t1c2 INTEGER);
CREATE TABLE table2(id INTEGER PRIMARY KEY, t1 INTEGER REFERENCES table1(id), t2c1 INTEGER);

And I do a whole lot of joins where I'm trying to filter on the joined-in table to get stuff from the first table, like this:

SELECT t1c1
FROM table1
JOIN table2 ON table2.t1 = table1.id
WHERE t2c1 = 42;

When I go to write indexes for a table I'd look at the columns that get used in the WHERE clause and build out indexes to satisfy them. So for this query I'd wind up writing an index like this:

CREATE INDEX ON table2 (t2c1);

And this index is at least eligible for use in that query.

My question is: if I instead write an index like this:

CREATE INDEX ON table2 (t2c1, t1);

Will the index be used as a covering index to help the JOIN in the above query? Should I change my index writing strategy to cover foreign key columns?

Question 2

One question per question, please. Move "Scenario 2" to a separate question. You can always link to the other to provide more context.

Question 3

dba.stackexchange.com/questions/190156/…

Question 4

I don't think it will ever use the index you've suggested in the fashion you've mentioned.

Question 5

Will the index be used as a covering index to help the JOIN in the above query?

It depends. Up to Postgres 10 there are no "covering indexes" per se; Postgres has "index-only" scans as an index access method.

Starting with Postgres 11, true covering indexes with INCLUDE columns are available. Blog entry by Michael Paquier introducing the feature:

https://paquier.xyz/postgresql-2/postgres-11-covering-indexes/
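
As a minimal sketch of what such an index could look like for the tables in the question (assuming Postgres 11 or later; the index name table2_t2c1_incl_t1_idx is made up for illustration):

```sql
-- Requires Postgres 11+. The index name is hypothetical.
-- t2c1 is the key column used for searching; t1 is stored only as payload,
-- so it can be returned by an index-only scan but cannot serve as a search key.
CREATE INDEX table2_t2c1_incl_t1_idx ON table2 (t2c1) INCLUDE (t1);
```

For the query at hand, a plain two-column index on (t2c1, t1) serves the same purpose. INCLUDE is mainly of interest when the extra column should not be part of the key, for example in a UNIQUE index.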

Related answer with code example:

Does a query with a primary key and foreign keys run faster than a query with just primary keys?

That said, the index CREATE INDEX ON table2 (t2c1, t1); makes perfect sense for the query you demonstrate. It can be used for an index-only scan if additional preconditions are met, or it can be used in a bitmap index scan or a plain index scan.
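
One way to see which scan type is actually chosen is to look at the plan for the table2 side of the query on its own. This is only a sketch, assuming the table and the two-column index from the question exist. One of the preconditions for an index-only scan is that the visibility map marks the relevant pages as all-visible, which a plain VACUUM takes care of:

```sql
-- Update the visibility map and the planner statistics first.
VACUUM (ANALYZE) table2;

-- Only columns stored in the index (t2c1, t1) are referenced here, so an
-- index-only scan is possible, though the planner may still pick another plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1
FROM   table2
WHERE  t2c1 = 42;
```

If the plan shows an "Index Only Scan" node, its "Heap Fetches" line reports how many rows still had to be checked in the table itself. A low number means the index is effectively covering the query.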

JOIN conditions and WHERE conditions are almost completely equivalent in Postgres. They certainly can use indexes in the same way. You can rewrite your query:

SELECT t1.t1c1
FROM table1 t1
JOIN table2 t2 ON t2.t1 = t1.id
WHERE t2.t2c1 = 42;

With this equivalent:

SELECT t1.t1c1
FROM table1 t1 CROSS JOIN table2 t2
WHERE t2.t1 = t1.id
AND t2.t2c1 = 42;

The first form is obviously preferable, though. Easier to read.
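
To check the equivalence on your own data, you could compare the plans of both forms side by side. A small sketch, assuming the tables from the question:

```sql
-- Plain EXPLAIN (without ANALYZE) only plans the query; it does not execute it.
EXPLAIN
SELECT t1.t1c1
FROM   table1 t1
JOIN   table2 t2 ON t2.t1 = t1.id
WHERE  t2.t2c1 = 42;

EXPLAIN
SELECT t1.t1c1
FROM   table1 t1 CROSS JOIN table2 t2
WHERE  t2.t1 = t1.id
AND    t2.t2c1 = 42;
```

For a simple two-table query like this one, both statements should produce the same plan.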

Why "almost" equivalent? (Makes no difference for the simple query at hand.)

Why does this implicit join get planned differently than an explicit join?

Question 6

May I ask a question, please? If the "equivalent" query (CROSS JOIN) is used, does that mean the composite index (CREATE INDEX ON table2 (t2c1, t1)) goes to waste? In that WHERE clause the order is (t1, t2c1), so are the two queries really equivalent?

Question 7

@JSON4Live: Not entirely sure what you are asking. Both queries are 100 % equivalent. Postgres will generate the same query plan, and the index on table2(t2c1, t1) can be used either way. I clarified a bit and added more links.

Question 8

When joining tables, only the Nested Loop strategy can use an index to look up the matching rows for each outer row. Hash Join and Merge Join do not probe indexes that way. For a Hash Join, the most efficient way to speed up the join is to shrink the hash table by adding WHERE clauses and selecting fewer columns. For a Merge Join, it is to supply pre-sorted input, which an index scan on the join column can provide.

Question 9

Will the index be used as a covering index to help the JOIN in the above query? Should I change my index writing strategy to cover foreign key columns?

Not likely in the above query. This is a deceptively complex problem: the result depends on the planner's estimates and the selectivity of the two conditions:

table2.t1 = table1.id
t2c1 = 42

Essentially, you want to experiment with the data volumes (row counts) so that each condition becomes more or less selective. And if you get a nested loop, increase the row count until that is no longer the most viable join method.

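CREATE TABLE table1(
 id INTEGER PRIMARY KEY,
 t1c1 INTEGER,
 t1c2 INTEGER
);
-- 1,000 rows in the referenced table
INSERT INTO table1(id, t1c1, t1c2)
 SELECT x,x,x FROM generate_series(1,1000)
 AS gs(x);
CREATE TABLE table2(
 id INTEGER PRIMARY KEY,
 t1 INTEGER REFERENCES table1(id),
 t2c1 INTEGER
);
-- 1,000,000 rows: each table1 row is referenced 1,000 times,
-- and t2c1 takes 50 distinct values (so t2c1 = 42 matches 2% of the rows)
INSERT INTO table2(id, t1, t2c1)
 SELECT x,1+x%1000,x%50 FROM generate_series(1,1000000)
 AS gs(x);
-- Refresh planner statistics so the baseline plan reflects the real row counts
ANALYZE table1;
ANALYZE table2;
EXPLAIN ANALYZE
 SELECT t1c1
 FROM table1
 JOIN table2 ON table2.t1 = table1.id
 WHERE t2c1 = 42;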

Now check the plan.

Now create the compound index:

CREATE INDEX ON table2 (t2c1, t1);
VACUUM FULL ANALYZE table1;
VACUUM FULL ANALYZE table2;

And check the plan again:

EXPLAIN ANALYZE
 SELECT t1c1
 FROM table1
 JOIN table2 ON table2.t1 = table1.id
 WHERE t2c1 = 42;

You can drop and recreate the indexes to find which column order the planner prefers:

CREATE INDEX ON table2 (t1, t2c1);

or

CREATE INDEX ON table2 (t2c1, t1);
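
One possible way to run that comparison is sketched below. It assumes the test tables above without any compound index on table2 yet, and the index names are made up for illustration:

```sql
-- Index names are hypothetical; pick your own.
CREATE INDEX table2_t1_t2c1_idx ON table2 (t1, t2c1);
CREATE INDEX table2_t2c1_t1_idx ON table2 (t2c1, t1);
ANALYZE table2;

-- With both indexes present, the plan shows which one the planner picks (if any).
EXPLAIN ANALYZE
SELECT t1c1
FROM   table1
JOIN   table2 ON table2.t1 = table1.id
WHERE  t2c1 = 42;

-- Hide one index temporarily and look at the alternative plan.
-- DROP INDEX is transactional, so ROLLBACK restores the index.
-- Caution: the drop holds an exclusive lock on table2 until the rollback.
BEGIN;
DROP INDEX table2_t2c1_t1_idx;
EXPLAIN ANALYZE
SELECT t1c1
FROM   table1
JOIN   table2 ON table2.t1 = table1.id
WHERE  t2c1 = 42;
ROLLBACK;
```

Because of the lock, this is something to try on a test system rather than on a busy production database.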

Ultimately, though, this is a lot of work. I suggest starting off with:

CREATE INDEX ON table2 (t1);
CREATE INDEX ON table2 (t2c1);

And optimizing only if that isn't sufficient.

You can also disable specific planner options to see whether another plan really is faster or slower, and then look into fixing that. But that can also be a lot of work.
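
A minimal sketch of that technique, assuming the test tables above. The enable_* settings are standard Postgres planner parameters; wrapping them in a transaction with SET LOCAL keeps the change from leaking into the rest of the session:

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_nestloop = off;   -- discourage nested-loop joins
-- Other switches of the same kind: enable_hashjoin, enable_mergejoin,
-- enable_seqscan, enable_bitmapscan, enable_indexonlyscan.
EXPLAIN ANALYZE
SELECT t1c1
FROM   table1
JOIN   table2 ON table2.t1 = table1.id
WHERE  t2c1 = 42;
ROLLBACK;
```

These settings discourage a plan type rather than forbid it outright, so the planner can still fall back to it when no alternative exists. Comparing the execution time with and without the setting shows whether the planner's original choice really was the faster one.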

score 23 · Accepted Answer · 2017-11-05 14:24:43Z

