Should I create an index for non key columns?

Question 1

I have a table in a PostGreSql database defined as following:

CREATE TABLE public."MATCH"(
 "ITEM_A_ID" bigint DEFAULT 0,
 "ITEM_B_ID" bigint DEFAULT 0,
 "OWNER_A_ID" bigint DEFAULT 0,
 "OWNER_B_ID" bigint DEFAULT 0,
 "OTHER_DATA" varchar(100) NOT NULL DEFAULT ''
 CONSTRAINT "MATCH_PK" PRIMARY KEY ("ITEM_A_ID","ITEM_B_ID")
);

It will contain a lot of rows. There will be a lot of queries like the following performed on this table:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id;
SELECT * FROM "MATCH" WHERE "OWNER_B_ID" = owner_b_id;

I was thinking about creating indexes on OWNER_A_ID and OWNER_B_ID, since these columns are not keys. Is this a good idea, and if yes, how should I create these? Should I create one index with both columns? Should I create two indexes? Should I include other columns?

Question 2

The selection of enough indexes is often difficult. In your case it should be useful to create two indexes.

You should only create one index with both columns if your query always include the first column as a condition:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id [AND "OWNER_B_ID" = owner_b_id]

The whole B-Tree is built upon the order of columns in the index! You can't fully use a multi-column index on a, b on the following queries:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id OR "OWNER_B_ID" = owner_b_id
SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = "OWNER_B_ID"

If you're only using equality checks you might consider a hash index. But postgresql has some disadvantages you should check first.

On other dbms you should consider adding additional columns in the index as data. This would be useful if you query these specific columns and not * because the dbms wouldn't need to feed the data from the table after using the index.

An important factor: indices fragment over the time (unless you aren't performing any insert/update/delete on the table). Please check whether your dba has some optimization operations installed.

Pleae check the documentation for additional options like FILLFACTOR or partial indexes: http://www.postgresql.org/docs/9.3/static/sql-createindex.html

Question 3

Hash indexes are discouraged in Postgresql. postgresql.org/docs/9.1/static/indexes-types.html

Question 4

Yes, I know - I should have used the term discouraged instead of disadvantage to make it clearer.

Question 5

An update from 2022, hash indices are not discouraged in Postgres 14: postgresql.org/docs/14/indexes-types.html There is also this post that discusses Hash indices hakibenita.com/postgresql-hash-index.

Question 6

Please be more specific: "It will contain a lot of rows". How many? Millions, thousands or billions. "Is it a good idea?" It depends.

If your queries are like the ones you mention, you should create two b-tree indexes, one for each field. Instructions are here: http://www.postgresql.org/docs/9.1/static/sql-createindex.html

You should only create one index for both fields, ONLY IF all your queries are like:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id AND "OWNER_B_ID"=owner_b_id;

This index will also work for queries like:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id;

but NOT for

SELECT * FROM "MATCH" WHERE "OWNER_B_ID" = owner_b_id;

Jens D. Jens D. 1,2291 gold badge9 silver badges16 bronze badges · Accepted Answer · 2014-03-08 16:25:56Z

The selection of enough indexes is often difficult. In your case it should be useful to create two indexes.

You should only create one index with both columns if your query always include the first column as a condition:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id [AND "OWNER_B_ID" = owner_b_id]

The whole B-Tree is built upon the order of columns in the index! You can't fully use a multi-column index on a, b on the following queries:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id OR "OWNER_B_ID" = owner_b_id
SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = "OWNER_B_ID"

If you're only using equality checks you might consider a hash index. But postgresql has some disadvantages you should check first.

On other dbms you should consider adding additional columns in the index as data. This would be useful if you query these specific columns and not * because the dbms wouldn't need to feed the data from the table after using the index.

An important factor: indices fragment over the time (unless you aren't performing any insert/update/delete on the table). Please check whether your dba has some optimization operations installed.

Pleae check the documentation for additional options like FILLFACTOR or partial indexes: http://www.postgresql.org/docs/9.3/static/sql-createindex.html

Hash indexes are discouraged in Postgresql. postgresql.org/docs/9.1/static/indexes-types.html
Yes, I know - I should have used the term discouraged instead of disadvantage to make it clearer.
An update from 2022, hash indices are not discouraged in Postgres 14: postgresql.org/docs/14/indexes-types.html There is also this post that discusses Hash indices hakibenita.com/postgresql-hash-index.

Stack Exchange Network

Should I create an index for non key columns?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Should I create an index for non key columns?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions