In the MySQL doc section on avoiding full table scans, one of the cases in which MySQL will use a full table scan is described like so:
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it is likely to perform many key lookups and that a table scan would be faster.
I am having trouble understanding this.
For starters, I'm not sure whether to parse the phrase as "using a key ... through another column" or "low cardinality ... through another column." Neither interpretation seems clear.
I can get the general idea that if I'm selecting a huge proportion of the table, like 75% or whatever, then the index will be slower to use (because of the dives to get the rows) than just reading the whole table. But I don't get what "through another column" has to do with it.
Can anyone explain this sentence?
The cutoff by the Optimizer is much less than 75%; it is about 20%. – Rick James, Oct 7, 2021 at 21:42
1 Answer
The statement you quote is in contrast with the previous bullet point, which says:
You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster.
In other words, the first case (comparing a key column with a constant, like ...where last_name = 'Smith') allows the optimizer to use the value distribution (histogram) of the key column to estimate the predicate selectivity.
The second case (comparing a key column with another column, like ...where last_name in (select last_name from some_other_table)) doesn't give the optimizer enough information to use the histogram, so it simply uses the key cardinality to make the decision.
I agree that the phrase "using a key ... through another column" sounds a bit awkward; "comparing a key ... to another column" would have been clearer.
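To see the two cases side by side, something like the following could be run. This is only a sketch: the people table, its idx_last_name index, and the columns of some_other_table are made-up names for illustration, and the plan MySQL actually chooses depends on your data, statistics, and server version.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE people (
    id INT PRIMARY KEY,
    last_name VARCHAR(50),
    KEY idx_last_name (last_name)
);

CREATE TABLE some_other_table (
    id INT PRIMARY KEY,
    last_name VARCHAR(50)
);

-- Case 1: the key is compared with a constant, so the optimizer can
-- estimate how many rows match that one specific value.
EXPLAIN
SELECT * FROM people
WHERE last_name = 'Smith';

-- Case 2: the key is compared with values coming from another column.
-- The values are not known when the plan is built, so the estimate
-- has to rely on the overall cardinality of the index instead.
EXPLAIN
SELECT * FROM people
WHERE last_name IN (SELECT last_name FROM some_other_table);

-- The Cardinality column here is the estimated number of distinct
-- values in each index of the table.
SHOW INDEX FROM people;
```

Dividing the table's row count by the Cardinality reported for idx_last_name gives a rough average number of rows per key value. When that average is large, every lookup in the second query is expected to match many rows, which is the situation the quoted bullet point describes.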
But "many rows match the key value" makes it sound like it's a specific single value. The way you're saying it, that specific value isn't available. So what is the "key cardinality" in that case? – Wildcard, Oct 7, 2021 at 23:59
Many rows match every specific key value == key has relatively few distinct values == low cardinality. – mustaccio, Oct 8, 2021 at 1:45