I recently started working with a legacy Rails application and am going through some of the queries used to fetch analytics data. One table contains around 13M records; even a plain COUNT query against it takes more than a minute, and running EXPLAIN ANALYZE on the query below takes more than 30 minutes.
Query
SELECT
portfolio_id,
min(order_trigger_date) as order_trigger_date
FROM
`user_orders`
INNER JOIN
`user_portfolios`
ON `user_portfolios`.`id` = `user_orders`.`portfolio_id`
WHERE
`user_orders`.`state` IN
(
'executed_success',
'executed_failure',
'placed_failure'
)
AND `user_orders`.`type` = 'buy'
AND `user_orders`.`order_trigger_date` BETWEEN '1969-12-31 18:30:00' AND '2023-12-15 18:29:59'
AND `user_portfolios`.`state` = 'active'
GROUP BY
`user_orders`.`portfolio_id`
We have what should be the proper indexes on both tables:
user_orders
- state
- type
- order_trigger_date
- portfolio_id
user_portfolios
- state
Below is the explain result
+----+-------------+-----------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+---------+---------------------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+---------+---------------------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | user_orders | NULL | index_merge | index_user_orders_on_state,index_user_orders_on_type,index_user_orders_on_portfolio_id,index_order_trigger_date_on_admin_user_orders | index_user_orders_on_type,index_order_trigger_date_on_admin_user_orders | 513,6 | NULL | 3100629 | 66.09 | Using intersect(index_user_orders_on_type,index_order_trigger_date_on_admin_user_orders); Using where; Using temporary; Using filesort |
| 1 | SIMPLE | user_portfolios | NULL | eq_ref | PRIMARY,index_state_on_user_portfolios | PRIMARY | 8 | mydb.user_orders.portfolio_id | 1 | 50.0 | Using where |
+----+-------------+-----------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------+---------+---------------------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------+
Can anyone suggest ways to optimize the above query, or alternative approaches to fetch this data faster?
2 Answers
We have proper indexes in both the tables
In my experience MySQL rarely uses more than one index per table (the index_merge intersect in your EXPLAIN is the exception, and it is not fast here), so you need a multicolumn index in this case.
Add the following indexes, run the query again, and compare the execution plan.
user_orders
alter table user_orders
add index pid_tp_ortridt_st (portfolio_id,type,order_trigger_date,state) ;
user_portfolios
alter table user_portfolios
add index id_st (id,state) ;
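If the optimizer still prefers the index_merge plan after adding these, you can check what the new index would do with an index hint (FORCE INDEX is standard MySQL syntax; use it to diagnose, not as a permanent fix — the index name below is the one created above):

```sql
EXPLAIN
SELECT uo.portfolio_id, MIN(uo.order_trigger_date)
FROM user_orders AS uo FORCE INDEX (pid_tp_ortridt_st)
INNER JOIN user_portfolios AS up ON up.id = uo.portfolio_id
WHERE uo.state IN ('executed_success', 'executed_failure', 'placed_failure')
  AND uo.type = 'buy'
  AND uo.order_trigger_date BETWEEN '1969-12-31 18:30:00' AND '2023-12-15 18:29:59'
  AND up.state = 'active'
GROUP BY uo.portfolio_id;
```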
It's a nasty one. You are filtering on two tables, and the main table needs a two-dimensional index (which does not exist).
I suggest these as possibly helping:
user_orders: INDEX(type, state, order_trigger_date, portfolio_id)
user_orders: INDEX(type, order_trigger_date, portfolio_id, state)
user_portfolios: INDEX(state, id)
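Spelled out as DDL (the index names here are my own invention), those suggestions would be:

```sql
ALTER TABLE user_orders
    ADD INDEX idx_type_state_otd_pid (type, state, order_trigger_date, portfolio_id),
    ADD INDEX idx_type_otd_pid_state (type, order_trigger_date, portfolio_id, state);

ALTER TABLE user_portfolios
    ADD INDEX idx_state_id (state, id);
```

Both four-column indexes contain every user_orders column the query touches, so they are "covering" — the scan can stay entirely in the index, which EXPLAIN reports as "Using index".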
You could try this reformulation, too:
SELECT uo.portfolio_id, MIN(uo.order_trigger_date) as order_trigger_date
FROM `user_orders` AS uo
WHERE uo.`state` IN ( 'executed_success',
                      'executed_failure',
                      'placed_failure'
                    )
  AND uo.`type` = 'buy'
  AND uo.`order_trigger_date` BETWEEN '1969-12-31 18:30:00' AND '2023-12-15 18:29:59'
  AND EXISTS ( SELECT 1
                 FROM `user_portfolios` AS up
                 WHERE up.`id` = uo.`portfolio_id`
                   AND up.`state` = 'active'
             )
GROUP BY uo.`portfolio_id`
The following won't change speed, but makes more sense:
AND `order_trigger_date` BETWEEN '1969-12-31 18:30:00'
                             AND '2023-12-15 18:29:59'
-->
AND `order_trigger_date` <= '2023-12-15 18:29:59'
That lower bound looks like "all dates", in which case remove that part of the WHERE clause entirely to make any SELECT variant run faster. If the date range comes from user input, build the WHERE clause on the fly to avoid kludges like that BETWEEN.
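As a concrete sketch (user_orders side only, join omitted for brevity): include each date predicate only when the user actually supplied that bound, instead of substituting a sentinel epoch value, and bind the dates as parameters in real code.

```sql
-- Date predicates left out entirely when the user picked "all time":
SELECT portfolio_id, MIN(order_trigger_date) AS order_trigger_date
FROM user_orders
WHERE state IN ('executed_success', 'executed_failure', 'placed_failure')
  AND type = 'buy'
  -- AND order_trigger_date >= ?   -- appended only when a start date was given
  -- AND order_trigger_date <= ?   -- appended only when an end date was given
GROUP BY portfolio_id;
```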
What does EXPLAIN ANALYZE say?