I recently started working with a legacy Rails application and am going through some of the queries used to fetch analytics data. One table contains around 13M records; even a plain COUNT query against it takes more than a minute, and running EXPLAIN ANALYZE on the query below takes more than 30 minutes.
Query
SELECT
portfolio_id,
min(order_trigger_date) as order_trigger_date
FROM
`user_orders`
INNER JOIN
`user_portfolios`
ON `user_portfolios`.`id` = `user_orders`.`portfolio_id`
WHERE
`user_orders`.`state` IN
(
'executed_success',
'executed_failure',
'placed_failure'
)
AND `user_orders`.`type` = 'buy'
AND `user_orders`.`order_trigger_date` BETWEEN '1969-12-31 18:30:00' AND '2023-12-15 18:29:59'
AND `user_portfolios`.`state` = 'active'
GROUP BY
`user_orders`.`portfolio_id`
We have what should be the proper indexes on both tables:
user_orders
- state
- type
- order_trigger_date
- portfolio_id
user_portfolios
- state
Below is the explain result
+----+-------------+-----------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+---------+---------------------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+---------+---------------------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | user_orders | NULL | index_merge | index_user_orders_on_state,index_user_orders_on_type,index_user_orders_on_portfolio_id,index_order_trigger_date_on_admin_user_orders | index_user_orders_on_type,index_order_trigger_date_on_admin_user_orders | 513,6 | NULL | 3100629 | 66.09 | Using intersect(index_user_orders_on_type,index_order_trigger_date_on_admin_user_orders); Using where; Using temporary; Using filesort |
| 1 | SIMPLE | user_portfolios | NULL | eq_ref | PRIMARY,index_state_on_user_portfolios | PRIMARY | 8 | mydb.user_orders.portfolio_id | 1 | 50.0 | Using where |
+----+-------------+-----------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+---------+---------------------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------+
Can anyone suggest ways to optimize the above query, or alternative approaches to fetch this data faster?
2 Answers
We have proper indexes in both the tables
In my experience MySQL rarely uses more than one index per table (the index_merge intersect in your EXPLAIN is the exception, and it is not fast here), so you need a multicolumn index in this case.
Add the following indexes, run the query again, and compare the execution plan.
user_orders
alter table user_orders
add index pid_tp_ortridt_st (portfolio_id,type,order_trigger_date,state) ;
user_portfolios
alter table user_portfolios
add index id_st (id,state) ;
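If the optimizer still prefers the index_merge plan after adding these, you can check what the new index would do with an index hint (FORCE INDEX is standard MySQL syntax; use it to diagnose, not as a permanent fix — the index name below is the one created above):

```sql
EXPLAIN
SELECT uo.portfolio_id, MIN(uo.order_trigger_date)
FROM user_orders AS uo FORCE INDEX (pid_tp_ortridt_st)
INNER JOIN user_portfolios AS up ON up.id = uo.portfolio_id
WHERE uo.state IN ('executed_success', 'executed_failure', 'placed_failure')
  AND uo.type = 'buy'
  AND uo.order_trigger_date BETWEEN '1969-12-31 18:30:00' AND '2023-12-15 18:29:59'
  AND up.state = 'active'
GROUP BY uo.portfolio_id;
```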
It's a nasty one. You are filtering on two tables, and the main table needs a two-dimensional index (which does not exist).
I suggest these as possibly helping:
user_orders: INDEX(type, state, order_trigger_date, portfolio_id)
user_orders: INDEX(type, order_trigger_date, portfolio_id, state)
user_portfolios: INDEX(state, id)
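Spelled out as DDL (the index names here are my own invention), those suggestions would be:

```sql
ALTER TABLE user_orders
    ADD INDEX idx_type_state_otd_pid (type, state, order_trigger_date, portfolio_id),
    ADD INDEX idx_type_otd_pid_state (type, order_trigger_date, portfolio_id, state);

ALTER TABLE user_portfolios
    ADD INDEX idx_state_id (state, id);
```

Both four-column indexes contain every user_orders column the query touches, so they are "covering" — the scan can stay entirely in the index, which EXPLAIN reports as "Using index".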
You could try this reformulation, too:
SELECT uo.portfolio_id, MIN(uo.order_trigger_date) as order_trigger_date
FROM `user_orders` AS uo
WHERE uo.`state` IN ( 'executed_success',
                      'executed_failure',
                      'placed_failure'
                    )
  AND uo.`type` = 'buy'
  AND uo.`order_trigger_date` BETWEEN '1969-12-31 18:30:00' AND '2023-12-15 18:29:59'
  AND EXISTS ( SELECT 1
                 FROM `user_portfolios` AS up
                 WHERE up.`id` = uo.`portfolio_id`
                   AND up.`state` = 'active'
             )
GROUP BY uo.`portfolio_id`
The following won't change speed, but makes more sense:
AND `order_trigger_date` BETWEEN '1969-12-31 18:30:00'
                             AND '2023-12-15 18:29:59'
-->
AND `order_trigger_date` <= '2023-12-15 18:29:59'
That lower bound looks like "all dates", in which case remove that part of the WHERE clause entirely to make any SELECT variant run faster. If the date range comes from user input, build the WHERE clause on the fly to avoid kludges like that BETWEEN.
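As a concrete sketch (user_orders side only, join omitted for brevity): include each date predicate only when the user actually supplied that bound, instead of substituting a sentinel epoch value, and bind the dates as parameters in real code.

```sql
-- Date predicates left out entirely when the user picked "all time":
SELECT portfolio_id, MIN(order_trigger_date) AS order_trigger_date
FROM user_orders
WHERE state IN ('executed_success', 'executed_failure', 'placed_failure')
  AND type = 'buy'
  -- AND order_trigger_date >= ?   -- appended only when a start date was given
  -- AND order_trigger_date <= ?   -- appended only when an end date was given
GROUP BY portfolio_id;
```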
What does EXPLAIN ANALYZE say?