(For this question, I am using AWS/Aurora MySQL with a reasonably-spec'd RDS instance)
Consider the following schema:
Table T:
col0: the usual autoincrement primary key
col1: varchar
col2: varchar
col3: varchar
col4...N: various data
Consider that there is a unique index on:
<col1, col2, col3>
And a non-unique index on:
<col1, col2>
And consider the following query:
SELECT * FROM T
WHERE
(col1 = 'val1' AND col2 = 'id1') OR
(col1 = 'val2' AND col2 = 'id2') OR
...
(col1 = 'valN' AND col2 = 'idN');
I would (perhaps naively) have expected MySQL to figure out that each element of the OR set matched the (non-unique) index, and to perform the query the same way it would if I had said:
WHERE col0 in (v1, v2, ... , vN)
But it doesn't seem to do that: the timings for the two queries are WAY off, with the "OR of ANDs" query on the order of 10x slower. Even allowing for the secondary-key lookup and the fact that it's a string-column lookup, 10x seems a bit severe. Note that EXPLAIN claims to be using the correct/expected index whether I specify (col1, col2) or (col1, col2, col3).
Please note also that:
SELECT * from T
WHERE
col1 in (list1)
AND
col2 in (list2);
is also slow when list1 and list2 contain many different values. Doing the same "AND of INs" across all three columns is almost intractably slow.
Perhaps not surprisingly, this query works better than the "OR of ANDs" when list1 contains only one value.
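Besides speed, the IN/IN form differs in meaning from the "OR of ANDs": assuming list1 and list2 are built from the wanted (col1, col2) pairs, each column is tested independently, so any combination from list1 x list2 matches, not just the intended pairs. A minimal sketch in plain Python (the pairs and rows below are hypothetical) shows the extra rows:

```python
# Hypothetical (col1, col2) pairs the caller actually wants.
wanted = {("val1", "id1"), ("val2", "id2")}

# Hypothetical rows of T, reduced to their (col1, col2) values.
rows = [("val1", "id1"), ("val1", "id2"), ("val2", "id1"),
        ("val2", "id2"), ("val3", "id1")]

# Assumption: list1 / list2 are derived from the wanted pairs.
list1 = sorted({c1 for c1, _ in wanted})
list2 = sorted({c2 for _, c2 in wanted})

# col1 IN (list1) AND col2 IN (list2): columns are checked independently,
# so every combination from the cross product of list1 and list2 qualifies.
in_and_in = [r for r in rows if r[0] in list1 and r[1] in list2]

# OR of ANDs (or a row constructor): the pair must match as a unit.
pairwise = [r for r in rows if r in wanted]

# Rows the IN/IN form returns that the pairwise form does not.
extra = [r for r in in_and_in if r not in wanted]
print(len(in_and_in), len(pairwise), extra)
```

So the IN/IN query can only serve as a pre-filter; the pairwise condition is still needed to get the right rows.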
Answer
With "row constructors", you might get an optimization:
WHERE (col1, col2) IN (('v1', 'id1'), ('v2', 'id2'), ...)
But... in old versions, that would work but lead to a full table scan. I can't say specifically how the version you are running behaves.
When you have this pair of indexes:
UNIQUE(col1, col2, col3) -- (or plain INDEX)
INDEX(col1, col2)
there is no need for the latter, since the former can handle any query that needs it (a composite index can be used through any leftmost prefix of its columns).
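The leftmost-prefix rule can be checked with a query plan. MySQL isn't available in a self-contained snippet, so this minimal sketch uses Python's built-in sqlite3 as a stand-in (an assumption: SQLite's planner follows the same leftmost-prefix rule, but its plan text differs from MySQL's EXPLAIN). The index name idx_c123 is hypothetical. With only the three-column unique index present, an equality lookup on col1 and col2 still uses it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (col0 INTEGER PRIMARY KEY AUTOINCREMENT, "
    "col1 TEXT, col2 TEXT, col3 TEXT, payload TEXT)"
)
# Only the (col1, col2, col3) unique index; no separate (col1, col2) index.
conn.execute("CREATE UNIQUE INDEX idx_c123 ON t (col1, col2, col3)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE col1 = ? AND col2 = ?",
    ("val1", "id1"),
).fetchall()

# The human-readable plan description is the last column of each row.
details = [row[-1] for row in plan]
print(details)
```

On Aurora MySQL the equivalent check is running EXPLAIN on the real query and looking at the key and key_len columns.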
Perhaps the optimal way to write your query is
WHERE col1 in ('v1', 'v2', ...)
AND (col1, col2) IN (('v1', 'id1'), ('v2', 'id2'), ...)
With that, it can use any index starting with col1 as a crude filter, then use the row-constructor part for the rest of the filtering.
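When the pairs come from application code, the suggested WHERE clause has to be generated with the right number of placeholders. A minimal sketch, assuming a Python MySQL driver that uses the %s parameter style (as PyMySQL and mysqlclient do); the function name build_pair_query is hypothetical, and the table name is interpolated directly, so it must come from trusted code, not user input:

```python
def build_pair_query(table, pairs):
    """Build 'col1 IN (...) AND (col1, col2) IN ((...), ...)' with %s placeholders.

    Hypothetical helper: returns (sql, params) for cursor.execute(sql, params).
    """
    if not pairs:
        raise ValueError("need at least one (col1, col2) pair")
    # Distinct col1 values, in first-seen order, for the crude index-prefix filter.
    col1_values = list(dict.fromkeys(c1 for c1, _ in pairs))
    col1_ph = ", ".join(["%s"] * len(col1_values))
    # One "(%s, %s)" row constructor per wanted pair.
    pair_ph = ", ".join(["(%s, %s)"] * len(pairs))
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE col1 IN ({col1_ph}) "
        f"AND (col1, col2) IN ({pair_ph})"
    )
    # Parameters in placeholder order: the col1 list first, then the flattened pairs.
    params = col1_values + [v for pair in pairs for v in pair]
    return sql, params


sql, params = build_pair_query("T", [("val1", "id1"), ("val2", "id2"), ("val1", "id3")])
print(sql)
print(params)
```

Deduplicating the col1 list keeps the pre-filter short when many pairs share the same col1 value.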
Re "convert to an in method" -- MySQL started out as a clean and mean database; it did most of what anyone needed and did it reasonably well. That was 90% of the development. We are now into the other 90% of the development -- the "long tail". Quite possibly some list somewhere includes "convert to an in method". If so, it is being prioritized along with the thousands of other rare and obscure optimizations. Feel free to file a 'feature request' at bugs.mysql.com; that is the way to add it to the list, or bump it up in priority.
-
Thank you, that answer was AWESOME. I was also unaware that MySQL handled tuples of columns "in" such a way. – Mark Gerolimatos, Jan 16, 2019 at 22:18
-
@MarkGerolimatos - It's been available for a long time, but due to the total lack of optimization (in the past), it is just as well that users have not noticed. – Rick James, Jan 16, 2019 at 22:20
-
Well then @rickjames (!!!!!!), it's good that I didn't know. I assume that whatever version AWS/Aurora runs does it right? – Mark Gerolimatos, Jan 16, 2019 at 22:23
-
I don't know for sure about Aurora. That variant of MySQL does have some noticeable improvements (query cache, replication/backup at the block level, etc.); I don't know if they did anything with "row constructor" optimization. I think MySQL 5.7 has made some improvements. I see bug fixes as far back as 5.0, so row constructors existed at least that long ago. – Rick James, Jan 16, 2019 at 22:40
...col1, col2 and then joining against it might be the highest-performing way.
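The join suggestion can be sketched as: load the wanted pairs into a temporary table keyed on (col1, col2), then join it to T on both columns. This minimal sketch uses Python's built-in sqlite3 as a stand-in for Aurora MySQL (an assumption; CREATE TEMPORARY TABLE and the JOIN are the same shape in MySQL). The table name wanted_pairs and the sample rows are hypothetical, and nothing here measures performance:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col0 INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_c123 ON t (col1, col2, col3)")
conn.executemany(
    "INSERT INTO t (col1, col2, col3) VALUES (?, ?, ?)",
    [("val1", "id1", "a"), ("val1", "id2", "b"),
     ("val2", "id2", "c"), ("val2", "id1", "d")],
)

# Hypothetical temp table holding the (col1, col2) pairs to look up.
wanted = [("val1", "id1"), ("val2", "id2")]
conn.execute(
    "CREATE TEMPORARY TABLE wanted_pairs (col1 TEXT, col2 TEXT, PRIMARY KEY (col1, col2))"
)
conn.executemany("INSERT INTO wanted_pairs (col1, col2) VALUES (?, ?)", wanted)

# Join on both columns, so only exact pairs match (no cross-product rows).
rows = conn.execute(
    "SELECT t.col0, t.col1, t.col2, t.col3 FROM wanted_pairs w "
    "JOIN t ON t.col1 = w.col1 AND t.col2 = w.col2 "
    "ORDER BY t.col0"
).fetchall()
print(rows)
```

Each row of the temp table drives an equality lookup on the leftmost (col1, col2) prefix of the three-column index.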