3

Hey there, this is my first question so please be patient if I miss to add something in here.

My question is why the OR in the WHERE clause causes the query to be executed the way it is (which is poorly). There is some detailed information about my scenario below.

Server Version is ArchLinux MariaDB 10.1.35-1


While I was trying to improve the wall time of a query I noticed that the optimizer does some bad optimizing for a relatively simple query. In this simplified scenario I have two tables, person and contract. A contract has one or two affiliates (foreign key to person).

describe person;
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| firstname | varchar(45) | YES | | NULL | |
| lastname | varchar(45) | YES | MUL | NULL | |
+-----------+-------------+------+-----+---------+----------------+
describe contract;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| state | varchar(45) | YES | MUL | NULL | |
| affiliate1 | int(11) | YES | MUL | NULL | |
| affiliate2 | int(11) | YES | | NULL | |
+------------+-------------+------+-----+---------+----------------+

There are indexes on (state), (state,affiliate1), (state,affiliate2), (state,affiliate1,affiliate2) and (lastname).


The query I was working on should summarize how many contracts are in which state, filtered by the lastname of either affiliate. For example I execute a query for the lastname 'Doe' I get something like

+-------------+----------+
| state | count(*) |
+-------------+----------+
| CREATED | 5 |
| IN PROGRESS | 1 |
| ACCEPTED | 2 |
+-------------+----------+

The initial query looked like this

SELECT state,count(*) 
FROM contract 
INNER JOIN person a1 ON a1.id=contract.affiliate1 
LEFT OUTER JOIN person a2 ON a2.id=contract.affiliate2
WHERE a1.lastname=? OR a2.lastname=?
GROUP BY state;

I tested this query on a database with 1 million contracts, each having two distinct affiliates (so there are 2 million persons). This query took about 3 seconds to execute. If a left out one of a1.lastname=? or a2.lastname=? the query takes about 0.01 second. In the end I came up with using a UNION to produce my result in a single query (which is very fast, too).

Here is the execution plan for the slow query I got from the slow query log:

# Query_time: 2.449397 Lock_time: 0.001430 Rows_sent: 1 Rows_examined: 3000000
# Rows_affected: 0
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
#
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
# explain: 1 SIMPLE contract index UK_contract_affiliate1,FK_contract_affiliate2 id_a1_a2 13 NULL 902857 1000000.00 100.00 100.00 Using where; Using index
# explain: 1 SIMPLE a1 eq_ref PRIMARY,lastname PRIMARY 4 contract.affiliate1 1 1.00 100.00 100.00 
# explain: 1 SIMPLE a2 eq_ref PRIMARY,lastname PRIMARY 4 contract.affiliate2 1 1.00 100.00 0.00 Using where
#

EDIT: As requested, the EXPLAIN output

EXPLAIN SELECT contract.state
 FROM contract
 INNER JOIN person kn1 ON contract.id_affiliate1=affiliate1.id
 WHERE affiliate1.lastname='ABC' 
+------+-------------+----------+--------+------------------------------------------------------------+---------------+---------+----------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------+--------+------------------------------------------------------------+---------------+---------+----------------------+------+--------------------------+
| 1 | SIMPLE | kn1 | ref | PRIMARY,lastname | lastname | 272 | const | 13 | Using where; Using index |
| 1 | SIMPLE | contract | eq_ref | UK_contract_affiliate1,state,affiliate1_type_status,affiliate2_type_status | UK_contract_affiliate1 | 4 | affiliate1.id | 1 | Using where |
+------+-------------+----------+--------+------------------------------------------------------------+---------------+---------+----------------------+------+--------------------------+
EXPLAIN SELECT contract.state
 FROM contract
 INNER JOIN person a1 ON contract.affiliate1=a1.id a1.lastname='ABC'
 UNION ALL 
SELECT contract.state
 FROM contract
 INNER JOIN person a2 ON contract.affiliate2=a2.id a2.lastname='ABC'
+------+-------------+------------+--------+----------------------------------------------------------------+------------------------+---------+----------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+--------+----------------------------------------------------------------+------------------------+---------+----------------------+------+--------------------------+
| 1 | PRIMARY | affiliate1 | ref | PRIMARY,lastname | lastname | 272 | const | 1 | Using where; Using index |
| 1 | PRIMARY | contract | eq_ref | UK_contract_affiliate1,state,affiliate1_state,affiliate2_state | UK_contract_affiliate1 | 4 | affiliate1.id | 1 | Using where |
| 2 | UNION | affiliate2 | ref | PRIMARY,lastname | lastname | 272 | const | 1 | Using where; Using index |
| 2 | UNION | contract | ref | FK_contract_affiliate2,state,affiliate1_state,affiliate2_state | FK_contract_affiliate2 | 5 | affiliate2.id | 1 | Using where |
+------+-------------+------------+--------+----------------------------------------------------------------+------------------------+---------+----------------------+------+--------------------------+
3
  • Your indexes should start with affiliate[12] and end with state because JOIN and WHERE criteria should come before GROUP BY. An index where its whole content is the prefix of another index is a duplicate. Have you played with optimizer_switch? What server version do you have?For comparisoin can you show the query/explain for single and UNION? Commented Sep 8, 2018 at 22:48
  • Thanks for the hint about my indexes. I didn't know the optimizer_switch bit I will give it a try. My server version is ArchLinux MariaDB 10.1.35-1. The EXPLAIN outputs are added to the question. Commented Sep 10, 2018 at 7:45
  • The UNION ALL query is not equivalent to the first one (with two joins) Commented Sep 10, 2018 at 11:53

1 Answer 1

1

OR does not optimize well because it would involve searching for all possible rows with one of the things, then check for the other, then combine the two lists.

Your original query is even worse since the things being OR'd together come from different tables. MySQL really punts on that. So it reads both tables entirely.

The UNION does what the Optimizer can't do; that is focus on one part of the OR at a time.

See my index cookbook

answered Oct 4, 2018 at 3:51

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.