I have a nested query and I am not sure whether the index I've chosen fits its needs. Currently I am not happy with its performance. The table contains more than 2 million rows and is growing. I am working with Oracle.

My query:

SELECT
    *
FROM
    mytable
WHERE groupid IN (
    SELECT
        groupid
    FROM
        mytable
    WHERE contractid IN (:contractIds:)
       OR predecessorContractId IN (:contractIds:)
);

  • contractId: many distinct values exist.
  • groupId: connected rows (where predecessorContractId matches another row's contractId) share the same groupId, so that contract chains can be selected more efficiently. Many distinct values exist here as well.

My index:

CREATE INDEX "MyIndex" ON "MyTable" ("contractId", "predecessorContractId", "groupId");

Is there a way to improve my index? Maybe also my query?

asked Oct 15, 2021 at 8:38
  • Please include the RDBMS name, such as SQL Server or MySQL. Commented Oct 15, 2021 at 8:48
  • Please post complete table definitions, including PK and FKs. "groupId: connected rows (predecessorContractId matches contractId) share the same groupId to be able to select contract chains more efficient": this is probably not true and is the root of your problem. Commented Oct 15, 2021 at 13:13

2 Answers

If I understand this correctly, you start with a set of contract IDs and collect the set of group IDs that correspond to them. Then you want to fetch all the rows whose group ID is in that set.

With your current index, in which groupId is not the leading column, this means the DBMS has to scan the whole table (or the whole index) to find those rows. Your best chance is to create a second index in which groupId is the first attribute, e.g. (groupId, contractId). The system then has the option to fetch "all the rows that have such a group ID" through that second index, after eliminating the duplicate group IDs found in step 1.

In order to serve the predicate on predecessorContractId, you'd also need a third index where predecessorContractId is the first attribute.
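The two additional indexes described above might look like the following sketch. The index names are made up for illustration, and the unquoted identifiers assume the table and columns were created without case-sensitive quoted names; adjust them if yours were.

```sql
-- Hypothetical index names; table and column names are taken from the question.

-- groupid as the leading column: lets the outer query fetch rows by group.
CREATE INDEX mytable_group_contract_ix
    ON mytable (groupid, contractid);

-- predecessorcontractid as the leading column: serves the second
-- predicate of the subquery.
CREATE INDEX mytable_predecessor_ix
    ON mytable (predecessorcontractid);
```

The index from the question already has contractId as its leading column, so it can keep serving the first predicate of the subquery.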

Paul White
answered Oct 15, 2021 at 8:49
  1. SELECT * is generally bad even if you really need all or most of the columns; it is even worse if you only need a small number of columns that could be covered by the index.
  2. Depending on how the optimizer handles it, the subquery might be evaluated many times, once per row of the main table. It might also be optimized into a join or materialized once. The execution plan will show you which.
  3. The subquery contains an OR, which cannot use a single index efficiently. The optimizer can combine two indexes (an index merge), but not with your one index alone.
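One way to look at the execution plan mentioned in point 2 on Oracle is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. This is only a sketch: the literal IDs 1001 and 1002 are placeholder values standing in for the real contract IDs, and they assume the ID columns are numeric.

```sql
-- Ask the optimizer for its plan without running the query.
-- 1001 and 1002 are placeholder contract IDs (assumed numeric).
EXPLAIN PLAN FOR
SELECT *
FROM mytable
WHERE groupid IN (
    SELECT groupid
    FROM mytable
    WHERE contractid IN (1001, 1002)
       OR predecessorcontractid IN (1001, 1002)
);

-- Show the plan that was just stored in the plan table.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the output, full table scans and the way the subquery is combined with the outer query (join, semi-join or filter) are the things to look for.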

For an index merge, two indexes could work together:

  • (contractId, groupId)
  • (predecessorContractId, groupId)

(or maybe their reverses, depending on the actual plan)
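As DDL, the pair of indexes listed above could be sketched like this. The index names are hypothetical, and the unquoted identifiers assume the columns were not created with case-sensitive quoted names.

```sql
-- Hypothetical index names; one index per branch of the OR.
-- groupid is included so the subquery can be answered from the index alone.
CREATE INDEX mytable_contract_group_ix
    ON mytable (contractid, groupid);

CREATE INDEX mytable_pred_group_ix
    ON mytable (predecessorcontractid, groupid);
```

Whether the optimizer actually combines the two is something only the execution plan can confirm.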

Better, in my opinion, would be to get rid of the OR: use two conditions with two separate subqueries instead, use EXISTS instead of IN, and switch the indexes to have groupId first. That way each subquery is probed with a specific groupId and only has to check the second column of the index, which is fast.
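A minimal sketch of that rewrite, under some assumptions: the bind names :id1 and :id2 stand in for the question's list of contract IDs, the index names are invented, and identifiers are unquoted. If an index on the same column list already exists, Oracle will reject the duplicate, so skip that statement.

```sql
-- Hypothetical indexes with groupid first, one per correlated subquery.
CREATE INDEX mytable_grp_contract_ix
    ON mytable (groupid, contractid);

CREATE INDEX mytable_grp_pred_ix
    ON mytable (groupid, predecessorcontractid);

-- Each EXISTS is correlated on groupid, so it probes one index with a
-- known groupid and only has to check the second column.
-- :id1 and :id2 are placeholder bind variables for the contract IDs.
SELECT t.*
FROM mytable t
WHERE EXISTS (
          SELECT 1
          FROM mytable c
          WHERE c.groupid = t.groupid
            AND c.contractid IN (:id1, :id2)
      )
   OR EXISTS (
          SELECT 1
          FROM mytable p
          WHERE p.groupid = t.groupid
            AND p.predecessorcontractid IN (:id1, :id2)
      );
```

This form still visits every outer row and probes the indexes for each one, so it is worth comparing its plan and timings against the original query rather than assuming it is faster.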

Another way is to get rid of the subquery entirely: you can JOIN the table to itself (though the optimizer may already do that for you).
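The self-join could be sketched as follows. The bind names :id1 and :id2 are placeholders for the contract IDs. DISTINCT is there because several rows of the same group may match the filter, which would otherwise return each group member more than once; this assumes the table has no genuinely duplicate rows and no columns (such as LOBs) that DISTINCT cannot compare.

```sql
-- s: the rows that match the given contract IDs
-- t: all rows belonging to the same group as a matching row
SELECT DISTINCT t.*
FROM mytable t
JOIN mytable s
  ON s.groupid = t.groupid
WHERE s.contractid IN (:id1, :id2)
   OR s.predecessorcontractid IN (:id1, :id2);
```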

answered Oct 15, 2021 at 8:49
