1

After a migration from MySQL 5.7 to MySQL 8.0.34, we have a very strange behavior with a query when semijoin is on in the optimizer_switch.

1. The problematic query:
SELECT COUNT(s0_.id)
FROM stores_shoppers s0_
WHERE s0_.id NOT IN (
 SELECT s4_.id
 FROM emails_history i5_
 INNER JOIN stores_shoppers s4_ ON i5_.shopper_id = s4_.id
 WHERE i5_.rule_id IN (1517676 , 1517677)
)

In this case this is the execution plan:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE s0_ NULL index NULL stores_shoppers_store_id_idx 4 NULL 361627391 100.00 Using index
1 SIMPLE NULL eq_ref <auto_distinct_key> <auto_distinct_key> 5 remarkety_prod.s0_.id 1 100.00 Using where; Not exists
2 MATERIALIZED i5_ invalid_dates,p2022m03,... ALL emails_history_rule_id_idx NULL NULL NULL 7526556120 100.00 Using where
2 MATERIALIZED s4_ NULL eq_ref PRIMARY PRIMARY 4 remarkety_prod.i5_.shopper_id 1 100.00 Using index

The problem is that the i5_ JOIN type is ALL, no index is used in the JOIN.

2. With only one value in the IN clause:

If the IN clause in the query includes only one value:

SELECT COUNT(s0_.id)
FROM stores_shoppers s0_
WHERE s0_.id NOT IN (
 SELECT s4_.id
 FROM emails_history i5_
 INNER JOIN stores_shoppers s4_ ON i5_.shopper_id = s4_.id
 WHERE i5_.rule_id IN (1517676) -- only one value
)

The execution plan becomes:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE s0_ NULL index NULL stores_shoppers_store_id_idx 4 NULL 361632497 100.00 Using index
1 SIMPLE NULL eq_ref <auto_distinct_key> <auto_distinct_key> 5 remarkety_prod.s0_.id 1 100.00 Using where; Not exists
2 MATERIALIZED i5_ invalid_dates,p2022m03,... ref emails_history_rule_id_idx emails_history_rule_id_idx 5 const 56944 100.00 NULL
2 MATERIALIZED s4_ NULL eq_ref PRIMARY PRIMARY 4 remarkety_prod.i5_.shopper_id 1 100.00 Using index

In this case, the JOIN uses an index but the execution plan is still different from the execution plan we have in MySQL 5.7

3. Using i5_.shopper_id instead of s4_.id (which is supposed to be identical)
SELECT COUNT(s0_.id)
FROM stores_shoppers s0_
WHERE s0_.id NOT IN (
 SELECT i5_.shopper_id -- using i5_.shopper_id instead of s4_.id
 FROM emails_history i5_
 INNER JOIN stores_shoppers s4_ ON i5_.shopper_id = s4_.id
 WHERE i5_.rule_id IN (1517676 , 1517677)
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY s0_ NULL index NULL stores_shoppers_store_id_idx 4 NULL 361632516 100.00 Using where; Using index
2 SUBQUERY i5_ invalid_dates,p2022m03,... range emails_history_rule_id_idx emails_history_rule_id_idx 5 NULL 56945 100.00 Using index condition; Using where
2 SUBQUERY s4_ NULL eq_ref PRIMARY PRIMARY 4 remarkety_prod.i5_.shopper_id 1 100.00 Using index

In this case, the execution plan is the same as we had in MySQL 5.7

4. With semijoin=off
SET optimizer_switch = 'semijoin=off';
SELECT COUNT(s0_.id)
FROM stores_shoppers s0_
WHERE s0_.id NOT IN (
 SELECT s4_.id
 FROM emails_history i5_
 INNER JOIN stores_shoppers s4_ ON i5_.shopper_id = s4_.id
 WHERE i5_.rule_id IN (1517676 , 1517677)
)
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY s0_ NULL index NULL stores_shoppers_store_id_idx 4 NULL 361632550 100.00 Using where; Using index
2 SUBQUERY i5_ invalid_dates,p2022m03,... range emails_history_rule_id_idx emails_history_rule_id_idx 5 NULL 56945 100.00 Using index condition; Using where
2 SUBQUERY s4_ NULL eq_ref PRIMARY PRIMARY 4 remarkety_prod.i5_.shopper_id 1 100.00 Using index

Same execution plan as before.

Questions:

Can anyone explain to me what's going on ?
Is that a bug in the optimizer ?
Why the join doesn't use index in the first query ?

asked Oct 18, 2023 at 18:18
1
  • What indexes do you have? Did you try any EXISTS yourself? Commented Oct 18, 2023 at 23:47

1 Answer 1

0

Problem with optimizer_switch -- File a bug report at bugs.mysql.com

Recommend adding these indexes:

emails_history: INDEX(shopper_id, rule_id)
emails_history: INDEX(rule_id, shopper_id)

Some general tips:

  • COUNT(x) tests x for being NOT NULL. You probably wanted COUNT(*).
  • NOT IN ( SELECT ... ) has long been a poorly optimized construct. 5.7 got better; 8.0 tried to get it even better, but maybe not in this case.
  • IN(1517676) is optimized as = 1517676, but IN (1517676 , 1517677) does not optimized as well. Sometimes a multi-item IN is best optimized with UNION.
  • "semijoin" is probably a synonym of EXISTS(SELECT ...)

Performance thoughts

I don't understand the goal of the query, but here are some things I would try to do:

  • Rewrite the query to avoid using stores_shoppers twice.
  • Try NOT EXISTS and LEFT JOIN ... IS NULL
  • Try ( SELECT ... rule_id = 123 ) UNION ALL ( SELECT ... rule_id = 123 )
answered Oct 19, 2023 at 0:08
1
  • First of all, thank you for this clear answer. The query is a part of a much larger query generated by doctrine, I only posted this part to show the error we are encountering. That's why the goal is not really clear here. I followed your advice and rewrote the query without using stores_shoppers a second time and the problem was solved. Commented Oct 19, 2023 at 12:36

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.