I have a whole lot of tables that look vaguely like this:
CREATE TABLE table1(id INTEGER PRIMARY KEY, t1c1 INTEGER, t1c2 INTEGER);
CREATE TABLE table2(id INTEGER PRIMARY KEY, t1 INTEGER REFERENCES table1(id), t2c1 INTEGER);
And I do a whole lot of joins where I'm trying to filter on the joined-in table to get stuff from the first table, like this:
SELECT t1c1
FROM table1
JOIN table2 ON table2.t1 = table1.id
WHERE t2c1 = 42;
When I write indexes for a table, I look at the columns used in the WHERE clause and build indexes to satisfy them. So for this query I'd end up writing an index like this:
CREATE INDEX ON table2 (t2c1);
And this index is at least eligible for use in that query.
My question is: if I instead write an index like this:
CREATE INDEX ON table2 (t2c1, t1);
Will the index be used as a covering index to help the JOIN in the above query? Should I change my index writing strategy to cover foreign key columns?
- One question per question, please. Move "Scenario 2" to a separate question. You can always link to the other to provide more context. (Erwin Brandstetter, Nov 5, 2017 at 14:13)
- dba.stackexchange.com/questions/190156/… (ldrg, Nov 5, 2017 at 16:43)
- I don't think it will ever use the index you've suggested in the fashion you've mentioned. (Evan Carroll, Nov 5, 2017 at 21:39)
2 Answers
Will the index be used as a covering index to help the JOIN in the above query?
It depends. Up to Postgres 10 there are no "covering indexes" per se; what Postgres offers instead is the "index-only" scan as an index access method.
Starting with Postgres 11, true covering indexes with INCLUDE columns are available. Blog entry by Michael Paquier introducing the feature:
Related answer with code example:
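As a minimal sketch (not the linked example, and assuming Postgres 11 or later), an INCLUDE index for the tables in the question could look like the following. The index name is made up for illustration.

```sql
-- Hypothetical index name; requires Postgres 11+.
-- t2c1 is the search key; t1 is stored in the index as a non-key
-- payload column, so it can be returned without visiting the table.
CREATE INDEX table2_t2c1_incl_t1_idx
    ON table2 (t2c1) INCLUDE (t1);
```

The included column cannot be used for searching or ordering; it is only there so that an index-only scan can return it.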
That said, the index CREATE INDEX ON table2 (t2c1, t1); makes perfect sense for the query you demonstrate. It can be used for an index-only scan if additional preconditions are met, or it can be used in a bitmap index scan or a plain index scan.
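One of those preconditions is that the visibility map marks the relevant table pages as all-visible, which VACUUM takes care of. A rough way to check what the planner actually does with the question's query, assuming the index on table2 (t2c1, t1) already exists:

```sql
-- Update the visibility map and planner statistics first.
VACUUM ANALYZE table2;

-- Look for "Index Only Scan" on table2 in the output; a low
-- "Heap Fetches" count means the table itself was rarely visited.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1c1
FROM table1
JOIN table2 ON table2.t1 = table1.id
WHERE t2c1 = 42;
```

Whether an index-only scan, a bitmap index scan, or a plain index scan shows up depends on your data distribution and cost settings.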
JOIN conditions and WHERE conditions are almost completely equivalent in Postgres. They certainly can use indexes in the same way. You can rewrite your query:
SELECT t1.t1c1
FROM table1 t1
JOIN table2 t2 ON t2.t1 = t1.id
WHERE t2.t2c1 = 42;
With this equivalent:
SELECT t1.t1c1
FROM table1 t1 CROSS JOIN table2 t2
WHERE t2.t1 = t1.id
AND t2.t2c1 = 42;
The first form is obviously preferable, though. Easier to read.
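To confirm this on your own data, you can compare the plans of both forms. This sketch uses plain EXPLAIN, which only plans the query without running it:

```sql
-- Form 1: explicit JOIN ... ON
EXPLAIN
SELECT t1.t1c1
FROM table1 t1
JOIN table2 t2 ON t2.t1 = t1.id
WHERE t2.t2c1 = 42;

-- Form 2: CROSS JOIN with the join condition moved to WHERE
EXPLAIN
SELECT t1.t1c1
FROM table1 t1 CROSS JOIN table2 t2
WHERE t2.t1 = t1.id
AND t2.t2c1 = 42;
```

The two plans should come out the same, including any index the planner picks.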
Why "almost" equivalent? (Makes no difference for the simple query at hand.)
- May I ask a question, please? If the "equivalent query" (CROSS JOIN) is used, does that mean the composite index (CREATE INDEX ON table2 (t2c1, t1)) is wasted? In that WHERE clause the order is (t1, t2c1), so are the two queries really equivalent? (JSON4Live, Jun 17, 2020 at 5:46)
- @JSON4Live: Not entirely sure what you are asking. Both queries are 100 % equivalent. Postgres will generate the same query plan, and the index on table2(t2c1, t1) can be used either way. I clarified a bit and added more links. (Erwin Brandstetter, Jun 19, 2020 at 1:00)
- When joining tables, only the Nested Loop join strategy can use indexes to make the JOIN faster. Hash Join and Merge Join cannot use indexes. The most efficient way to speed up a JOIN in those cases is to decrease the hash table size by adding WHERE clauses and selecting fewer columns (for Hash Join), or to pre-sort the data (for Merge Join). (pensnarik, Jul 23, 2020 at 7:25)
Will the index be used as a covering index to help the JOIN in the above query? Should I change my index writing strategy to cover foreign key columns?
Not likely in the above query. This is a deceptively complex problem: the result depends on the planner's estimates and on the selectivity of the two conditions,
- table2.t1 = table1.id
- t2c1 = 42
Essentially, you want to vary the test data (the row counts) so that each condition becomes more or less selective. And if you get a nested loop, increase the row count until that is no longer the most viable join method.
CREATE TABLE table1(
id INTEGER PRIMARY KEY,
t1c1 INTEGER,
t1c2 INTEGER
);
INSERT INTO table1(id, t1c1, t1c2)
SELECT x,x,x FROM generate_series(1,1000)
AS gs(x);
CREATE TABLE table2(
id INTEGER PRIMARY KEY,
t1 INTEGER REFERENCES table1(id),
t2c1 INTEGER
);
INSERT INTO table2(id, t1, t2c1)
SELECT x,1+x%1000,x%50 FROM generate_series(1,1e6)
AS gs(x);
EXPLAIN ANALYZE
SELECT t1c1
FROM table1
JOIN table2 ON table2.t1 = table1.id
WHERE t2c1 = 42;
Now check the plan.
Now create the compound index,
CREATE INDEX ON table2 (t2c1, t1);
VACUUM FULL ANALYZE table1;
VACUUM FULL ANALYZE table2;
And check the plan again,
EXPLAIN ANALYZE
SELECT t1c1
FROM table1
JOIN table2 ON table2.t1 = table1.id
WHERE t2c1 = 42;
You can drop and recreate the indexes to find which column order the planner prefers:
CREATE INDEX ON table2 (t1, t2c1);
or
CREATE INDEX ON table2 (t2c1, t1);
Ultimately, though, this is a lot of work. I suggest starting off with
CREATE INDEX ON table2 (t1);
CREATE INDEX ON table2 (t2c1);
And optimizing only if that isn't sufficient.
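One way to judge whether the simple indexes are sufficient is to look at how often each one is actually scanned under your real workload. A small sketch using the standard statistics view pg_stat_user_indexes:

```sql
-- idx_scan counts how many scans each index on table2 has served
-- since statistics were last reset; an index that stays at 0 is
-- a candidate for removal.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'table2'
ORDER BY idx_scan DESC;
```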
You can also disable specific planner options to see whether another plan really is faster or slower, and then look into fixing that. That can also be a lot of work, though.
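As a sketch of that approach: the enable_* settings (such as enable_nestloop, enable_hashjoin, enable_mergejoin, and enable_seqscan) discourage the planner from using a given strategy. Using SET LOCAL inside a transaction keeps the change from leaking into the rest of the session.

```sql
BEGIN;

-- Discourage nested loops for this transaction only.
SET LOCAL enable_nestloop = off;

EXPLAIN ANALYZE
SELECT t1c1
FROM table1
JOIN table2 ON table2.t1 = table1.id
WHERE t2c1 = 42;

-- The setting reverts when the transaction ends.
ROLLBACK;
```

Compare the reported execution time with the default plan. These settings are meant for experiments like this, not as a permanent fix.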