In the query below, from and tid are each indexed on the replies table.

SELECT * FROM `replies`
WHERE `from`="<userId>"
OR `tid` IN (SELECT `tid` FROM `posts` WHERE `from`="<userId>")

With "OR", the query appears to do a full table scan (~3 million rows). EXPLAIN lists from as a possible key, but then no key is actually used.

However, in the query below, frid_lt and frid_gt are indexed. The two columns are in a composite index (frid_lt, frid_gt), and frid_gt also has its own index.

SELECT `mid` FROM `messages`
WHERE `frid_lt`="<userId>" OR `frid_gt`="<userId>"

And this query DOES use two indexes. The EXPLAIN says "index_merge" and "Using sort_union(frid_lt,frid_gt); Using where".

Why does the first query not use an index merge?
Is there any improvement I can make to make the engine use an index merge as well?

asked Oct 23, 2016 at 18:18
  • Is from on posts indexed as well? Commented Oct 23, 2016 at 22:54
  • You may want to add CREATE TABLE statements and exact output of EXPLAIN to the question as well. Commented Oct 23, 2016 at 22:57
  • Imagine you were allowed to make just one forward-only pass of a phone book, and someone asked you to find all the people with the last name Smith OR the first name Yvette. Sure you could easily seek to the Smiths, but that doesn't help you because you can't go backward and start finding all the Yvettes with last name starting with A, B, etc. Sometimes an index (seek) isn't the most efficient way to solve a query that has multiple filter criteria (or returns too many rows, or too many columns that aren't in the index, or ...) Commented Oct 24, 2016 at 2:19
  • @wolfgangwalther Yes, from is indexed (I said so in the post). @AaronBertrand I do understand that. However, by using the "last name" index, at least, I saved time searching for Smith. I guess the engine could be improved to do two lookups on the table using each index, rather than a FULL table scan using no index. I would be happy to tell that to the engine with FORCE INDEX or similar, to help it decide. Commented Oct 24, 2016 at 6:39
  • (phone book, continued) While scanning for Yvette, it could trivially check for Smith, thereby doing it all in a single table scan. Note: Fetching a row costs a lot more than checking for whether to keep the row. Commented Oct 24, 2016 at 18:37

1 Answer

OR does not optimize well. A common workaround is to use UNION:

( SELECT * FROM replies WHERE `from` = "..." )
UNION DISTINCT -- or UNION ALL if you know there are no dups
( SELECT r.* FROM replies AS r
 JOIN posts AS p ON p.tid = r.tid
 WHERE p.`from` = "..." )

Notice that I also avoided the usually-inefficient IN ( SELECT ... ) construct.

For further performance, have these indexes:

replies: INDEX(`from`)
posts: INDEX(`from`, tid) -- in this order
replies: INDEX(tid)

(And note that the PRIMARY KEY is an index, so don't add a redundant index.)
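
As a rough sketch of how that could look: the question says replies already has indexes on `from` and tid, so the composite index on posts is probably the only one to add. The index name idx_posts_from_tid is made up for illustration, and I am assuming posts does not already have an equivalent index, so check first.

```sql
-- Check what already exists before adding anything
-- (the PRIMARY KEY shows up here too).
SHOW INDEX FROM replies;
SHOW INDEX FROM posts;

-- Hypothetical index name. Column order matters: `from` first so the
-- WHERE can seek on it, then tid so the JOIN can be served from the index.
ALTER TABLE posts
  ADD INDEX idx_posts_from_tid (`from`, tid);
```

With `from` and tid both in that one index, the posts side of the JOIN can likely be resolved without touching the table rows at all.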

In your second example, the "index merge" that you experienced may or may not be faster than a UNION.
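
To compare the two, the second query could be rewritten along these lines. This is only a sketch: it assumes mid is unique per row (for example, the PRIMARY KEY), so that UNION DISTINCT returns the same rows as the OR version.

```sql
-- Each arm can use its own index: (frid_lt, frid_gt) for the first,
-- the single-column index on frid_gt for the second.
( SELECT mid FROM messages WHERE frid_lt = "<userId>" )
UNION DISTINCT  -- UNION ALL is enough if no row can match both arms
( SELECT mid FROM messages WHERE frid_gt = "<userId>" );
```

Run EXPLAIN on both forms and time them against real data; which one wins depends on how many rows each arm matches.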

Oh, it's an UPDATE

To optimize UPDATE, do two separate UPDATEs (no UNION, no OR). One straightforwardly checks from. The other is a "multi-table UPDATE" (see the manual) similar to the second select above.
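
A minimal sketch of the two statements, since the actual UPDATE is not shown: the column `seen` and the value 1 are placeholders for whatever is really being set.

```sql
-- `seen` is a hypothetical column; substitute the real SET clause.

-- 1. Rows written by the user; uses INDEX(`from`) on replies.
UPDATE replies
   SET seen = 1
 WHERE `from` = "<userId>";

-- 2. Rows in threads the user posted in; multi-table UPDATE.
UPDATE replies AS r
  JOIN posts AS p ON p.tid = r.tid
   SET r.seen = 1
 WHERE p.`from` = "<userId>";
```

Setting a constant is safe to apply twice, so rows matched by both statements are not a problem here. If the real UPDATE is relative (say, col = col + 1), the second statement would need an extra condition such as AND r.`from` <> "<userId>" to avoid touching those rows a second time.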

answered Oct 24, 2016 at 2:19