Optimal index for a nested query?

Question 1

I have a nested query and I am not sure whether the index I've chosen fits its needs. Currently I am not happy with its performance. The table contains more than 2 million rows and is still growing. I am working with Oracle.

My query:

SELECT
  *
FROM
  mytable
WHERE groupid IN (
  SELECT
    groupid
  FROM
    mytable
  WHERE contractid IN (:contractIds:)
     OR predecessorContractId IN (:contractIds:)
);

contractId: many distinct values exist.
groupId: connected rows (where predecessorContractId matches contractId) share the same groupId so that contract chains can be selected more efficiently. Many distinct values exist here as well.

My index:

CREATE INDEX "MyIndex" ON "MyTable" ("contractId", "predecessorContractId", "groupId");

Is there a way to improve my index? Maybe also my query?

Question 2

Please include the RDBMS name, e.g. SQL Server or MySQL.

Question 3

Please post complete table definitions, including PK and FKs. "groupId: connected rows (predecessorContractId matches contractId) share the same groupId to be able to select contract chains more efficient": this is probably not true, and it may be the root of your problem.

Question 4

If I get this right,

You start with a set of contract ids to collect the set of group ids that correspond to them. Then you want to collect all the rows that have a group id in that set.

With only the existing index, this means the DBMS most likely has to do a table scan for that second step, because no index leads with group id. Your best chance is to create a second index in which group id is the first attribute (e.g. group id, contract id). The system then has the option to fetch "all the rows that have such a group id" using that second index (after eliminating the duplicate group ids found in step 1).

In order to serve the predicate on predecessorContractId, you'd also need a third index where predecessorContractId is the first attribute.
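As a rough sketch, the two additional indexes described above could look like this. The index names are made up for illustration, and the unquoted table and column names are assumed to match the query in the question rather than the quoted identifiers in the posted DDL.

```sql
-- Hypothetical index names; identifiers follow the query in the question.

-- Second index: group id leads, so the outer query can fetch
-- "all rows with this group id" without scanning the whole table.
CREATE INDEX mytable_grp_contract_ix ON mytable (groupid, contractid);

-- Third index: serves the predicate on predecessorContractId.
CREATE INDEX mytable_pred_ix ON mytable (predecessorcontractid);
```

Whether the optimizer actually picks these up depends on statistics and data distribution, so it is worth checking the plan afterwards.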

Question 5

SELECT * is generally bad even if you really need all or most of the columns; it is even worse if you only need a small number of columns that could be covered by the index.
Depending on the optimizer, the subquery itself might be evaluated many times, once per main-table row. It might also be optimized into a join or materialized. The execution plan can show you which.
The subquery contains OR, which cannot use indexes efficiently (it can do an index merge, but not with your index alone).
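To see what the optimizer does with the subquery and the OR, the plan can be inspected in Oracle roughly like this. The binds `:id1` and `:id2` are placeholders standing in for the contract id list from the question; they are an assumption, not the asker's actual parameter handling.

```sql
-- Store the estimated plan for the original query.
EXPLAIN PLAN FOR
SELECT *
FROM mytable
WHERE groupid IN (
    SELECT groupid
    FROM mytable
    WHERE contractid IN (:id1, :id2)
       OR predecessorcontractid IN (:id1, :id2)
);

-- Display the plan that was just stored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Full table scans on mytable in the output would suggest that the current index is not helping for that step.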

For an index merge, two indexes could work together:

(contractId, groupId)
(predecessorContractId, groupId) (or maybe their reverses, depending on the actual plan)
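Written out as DDL, that pair might look like the following sketch. The index names are hypothetical, and the identifiers are assumed to match the query in the question.

```sql
-- One index per branch of the OR; groupid is appended so the subquery
-- can be answered from the index without visiting the table.
CREATE INDEX mytable_contract_grp_ix ON mytable (contractid, groupid);
CREATE INDEX mytable_pred_grp_ix ON mytable (predecessorcontractid, groupid);
```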

Better, imho, would be to get rid of the OR: use two conditions with two separate subqueries instead, use EXISTS instead of IN, and switch the indexes to have groupId first. That way each subquery is given a specific groupId and only needs to check the second column in the index, which is fast.
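One possible shape for that rewrite is sketched below. It assumes indexes on (groupid, contractid) and (groupid, predecessorcontractid) exist, and it uses the placeholder binds `:id1` and `:id2` in place of the contract id list.

```sql
-- Assumes indexes on (groupid, contractid) and (groupid, predecessorcontractid).
SELECT m.*
FROM mytable m
WHERE EXISTS (
        -- Some row in the same group has one of the requested contract ids.
        SELECT 1
        FROM mytable c
        WHERE c.groupid = m.groupid
          AND c.contractid IN (:id1, :id2)
      )
   OR EXISTS (
        -- Some row in the same group has one of them as its predecessor.
        SELECT 1
        FROM mytable p
        WHERE p.groupid = m.groupid
          AND p.predecessorcontractid IN (:id1, :id2)
      );
```

This should return the same rows as the original IN query, including for rows with a NULL groupid, which match in neither form. Whether it is faster depends on the plan the optimizer chooses.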

Another way is to get rid of the subquery entirely: you can JOIN the table to itself (though the optimizer may already do that for you).
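A join-based version might look like this sketch. It joins against the distinct set of matching group ids so that a group with several matching rows does not produce duplicate output rows. The binds `:id1` and `:id2` are again placeholders for the contract id list.

```sql
SELECT m.*
FROM mytable m
JOIN (
    -- Distinct group ids reachable from the requested contract ids.
    SELECT DISTINCT groupid
    FROM mytable
    WHERE contractid IN (:id1, :id2)
       OR predecessorcontractid IN (:id1, :id2)
) g ON g.groupid = m.groupid;
```

Comparing its execution plan with the plan of the original query shows whether the optimizer was already performing this transformation.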

Accepted answer (the text shown under Question 4 above): Erwin Smout, 2021-10-15 08:49:23Z
