We recently migrated our application to an entirely new infrastructure, going from a single server managing everything to a kubernetes cluster and dedicated database server running on AWS RDS. On Thursday, we had a planned spike in traffic due to marketing campaigns being sent, and during the middle of this peak we experienced a huge slow down in response times. Upon initial investigations, we discovered it was because the database was running at 100% CPU usage, and all queries were taking a long time to return (expected given the CPU usage)
We've never seen this behaviour before, so we're trying to determine if we've simply hit the max capacity of the server for that workload, or if we can improve this. During our research, we came across a few posts discussing the adaptive_hash_index
option. We came across this because when we analysed the data from RDS, we saw a lot of btr_search
waits.
The posts we saw suggested that in some applications, it may be beneficial to disable the adaptive_hash_index
option. What we need to know though, is would this be beneficial given what we're seeing here, and more importantly how can we test that it has actually done anything and made a positive difference?
We don't usually have the level of traffic we saw, and under our usual workload we have absolutely no performance issue that we're aware of. The queries that were taking a long time to respond at peak, don't usually take more than < 50-100ms to return.
Could anyone shed some light on what might be another potential issue if not the hash index, and how we may go about testing it if we disable? I can provide further metrics as requested.
Slow queries
As requested, I've got a couple of examples of some slow queries. These are two of the more common queries that appear during our slow period
SELECT
table_a.id
, table_a.name
FROM `table_a`
INNER JOIN `table_b` ON `table_a`.`id` = `table_b`.`product_id`
INNER JOIN `table_c` ON `table_b`.`brochure_id` = `table_c`.`id`
WHERE `table_c`.`id` = 215
AND `table_a`.`enabled` = TRUE
AND `table_a`.`table_d_id` = 20
AND `table_a`.`ref` IS NULL
ORDER BY table_a.name;
This query took just under 2 minutes to return 10,541 rows. Running this query now when the system isn't under load, it takes ~250ms. Below is the explain plan
+-------------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+---------+------------------------+-------+----------+----------------------------------------------+
| select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | extra |
+=============+=========+============+========+============================================================================================================================================================+============================================+=========+========================+=======+==========+==============================================+
| SIMPLE | table_c | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index; Using temporary; Using filesort |
+-------------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+---------+------------------------+-------+----------+----------------------------------------------+
| SIMPLE | table_b | NULL | ref | index_table_b_on_table_c_id_and_table_a_id,index_table_b_on_table_a_id | index_table_b_on_table_c_id_and_table_a_id | 5 | const | 23082 | 100.00 | Using where; Using index |
+-------------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+---------+------------------------+-------+----------+----------------------------------------------+
| SIMPLE | table_a | NULL | eq_ref | PRIMARY,index_table_a_on_ref,index_table_a_on_table_d_id_and_enabled,index_table_a_on_table_d_id,index_table_a_on_enabled,index_table_a_on_enabled_and_ref | PRIMARY | 4 | app.table_b.table_a_id | 1 | 5.00 | Using where |
+-------------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+---------+------------------------+-------+----------+----------------------------------------------+
To my eyes, it looks like it's correctly using indices?
SELECT
`table_a`.`id`
, `table_a`.`table_e_id`
, `table_a`.`table_b_id`
, `table_a`.`col_1`
, `table_a`.`col_2`
, `table_a`.`col_3`
, `table_a`.`price`
, `table_a`.`rrp`
, `table_a`.`col_4`
, `table_a`.`col_5`
, `table_a`.`col_6`
, `table_a`.`col_7`
, `table_a`.`col_8`
, `table_a`.`col_9`
, `table_a`.`col_10`
, table_d.id AS _table_d_id
FROM `table_a`
INNER JOIN table_b ON table_a.table_b_id = table_b.id
INNER JOIN table_c ON table_b.id = table_c.table_b_id
INNER JOIN table_d ON table_d.id = table_c.table_d_id
INNER JOIN table_e ON table_b.table_e_id = table_e.id
WHERE `table_a`.`enabled` = TRUE
AND `table_a`.`published` = TRUE
AND (table_f_id IN (1, 2, 3, 4, 5, 6, 7))
AND (table_b.enabled = TRUE)
AND (table_e.enabled = TRUE)
AND (
(table_d.id = 1 AND price <= 828.09)
OR (table_d.id = 2 AND price <= 1661.17)
OR (table_d.id = 3 AND price <= 2494.26)
)
AND (1 = 1)
AND (1 = 1)
ORDER BY rrp desc
LIMIT 48 OFFSET 0;
This query took around 11 seconds to return 48 rows. Running this query now when the system isn't under load, it takes ~100ms. Below is the explain plan
+----+-------------+---------+------------+--------+---------------------------------------------------------------------------------------------------------+-----------------------------+---------+------------------------------------+------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+====+=============+=========+============+========+=========================================================================================================+=============================+=========+====================================+======+==========+=====================================================================+
| 1 | | table_c | | range | index_table_c_on_table_d_id,index_table_c_on_table_b_id | index_table_c_on_table_d_id | 5 | | 412 | 100.00 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+---------+------------+--------+---------------------------------------------------------------------------------------------------------+-----------------------------+---------+------------------------------------+------+----------+---------------------------------------------------------------------+
| 1 | SIMPLE | table_d | | eq_ref | PRIMARY | PRIMARY | 4 | app.table_c.table_d_id | 1 | 100.00 | Using index |
+----+-------------+---------+------------+--------+---------------------------------------------------------------------------------------------------------+-----------------------------+---------+------------------------------------+------+----------+---------------------------------------------------------------------+
| 1 | SIMPLE | table_b | | eq_ref | PRIMARY,index_table_b_on_table_e_id_and_enabled | PRIMARY | 4 | app.table_c.table_b_id | 1 | 10.00 | Using where |
+----+-------------+---------+------------+--------+---------------------------------------------------------------------------------------------------------+-----------------------------+---------+------------------------------------+------+----------+---------------------------------------------------------------------+
| 1 | SIMPLE | table_e | | eq_ref | PRIMARY,index_table_e_on_enabled | PRIMARY | 4 | app.table_b.table_e_id | 1 | 88.24 | Using where |
+----+-------------+---------+------------+--------+---------------------------------------------------------------------------------------------------------+-----------------------------+---------+------------------------------------+------+----------+---------------------------------------------------------------------+
| 1 | SIMPLE | table_a | | ref | index_table_a_on_price,table_a_table_b_id_price,index_table_a_on_table_f_id,index_table_a_on_table_b_id | table_a_table_b_id_price | 9 | const,const,app.table_c.table_b_id | 46 | 27.99 | Using index condition; Using where |
+----+-------------+---------+------------+--------+---------------------------------------------------------------------------------------------------------+-----------------------------+---------+------------------------------------+------+----------+---------------------------------------------------------------------+
1 Answer 1
The first query might start with table_a or it might start with table_c. Here are the indexes to facilitate both query plans, thereby letting the Optimizer pick which seems better based on the data:
table_a: -- I assume you have `PRIMARY KEY(id)`?
table_a: INDEX(enabled, table_d_id, ref, -- in any order, then
name) -- last
table_b: INDEX(product_id, brochure_id) -- see link below
table_b: INDEX(brochure_id, product_id)
table_c: -- I assume you have `PRIMARY KEY(id)`?
Since the only references to table_c are
INNER JOIN `table_c` ON `table_b`.`brochure_id` = `table_c`.`id`
WHERE `table_c`.`id` = 215
you may as well replace them with just
WHERE `table_b`.`brochure_id` = 215
Is table_b a many-to-many mapping table. It is common for them to be inefficiently indexed. See my guidelines: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
For the Second query, here are some things to get started with (in addition to the many:many advice, above):
a: INDEX(enabled, published, -- in either order, then
table_f_id) -- last
EXPLAINs
Here is a partial attempt at explaining the EXPLAINs
that you have:
Using temporary; Using filesort
- At one (or possibly more) step of the execution something needs to be sorted, usually because ofORDER BY
.GROUP BY x ORDER BY w
may lead to multiple sorts;EXPLAIN
does not indicate that, butEXPLAIN FORMAT=JSON
does. "Filesort" does not necessarily indicate that the disk is involved.Using index
- The index was a "covering index", meaning that all the columns used anywhere in the query were found in a singleINDEX
. This is usually beneficial for performance but is rarely a goal to start with.key_len = 4
- probably a 4-byteINT NOT NULL
(signed or unsigned);=5
probably meansINT NULL
. Consider making itNOT NULL
. (This is a minor issue.)- The
Rows
column - Multiply the numbers in that column together to get a very crude estimate of work to perform the query. This is sometimes useful in comparing two Explains. Rows=1
- May indicate a unique key, hence a very efficientJOIN
.Rows = big number
- The table may have a lousy index.JOINs
are usually done via "NLJ" (Nested Loop Join) - read one table, reaching into the next table one row at a time.- The first row - The
table
in the first row is likely to be the one with the bestWHEREs
. If not, perhaps you need a better index -- often a composite index.
-
Interesting read on the many:many mapping table thank you!
table_b
in the first query is one so will definitely give that a go. Do you have any good resources on how to fully interpret an explain plan by any chance?PaReeOhNos– PaReeOhNos2020年11月17日 19:55:54 +00:00Commented Nov 17, 2020 at 19:55 -
And also what are you thoughts on the adaptive hash index, or do you think that's more of a red herring here?PaReeOhNos– PaReeOhNos2020年11月17日 19:56:22 +00:00Commented Nov 17, 2020 at 19:56
-
@PaReeOhNos -
EXPLAIN
-- I wish I did, but I have not found a good one. I'll be happy to elaborate on specific examples. Note thatEXPLAIN FORMAT=JSON SELECT ...
gives more detail and sometimes more clarity. "Operator trace" is sometimes useful.Rick James– Rick James2020年11月17日 22:22:19 +00:00Commented Nov 17, 2020 at 22:22 -
@PaReeOhNos - Unfortunately,
EXPLAIN
says only what is going on in the current query plan; it fails to answer important questions like "what index would be best?". The rest of my link makes an attempt at answering that question.Rick James– Rick James2020年11月17日 22:25:21 +00:00Commented Nov 17, 2020 at 22:25 -
@PaReeOhNos - I spelled out some of the useful info in our
EXPLAINs
.Rick James– Rick James2020年11月17日 22:39:16 +00:00Commented Nov 17, 2020 at 22:39
like
condition in the where clause, something which I've read is bad with the adaptive hash index (another reason I've looked at that)innodb_buffer_pool_size
? RAM?innodb_buffer_pool_size
is set to 3/4 of that (so 12GB).innodb_adaptive_hash_index_parts
is set to engine default which I believe is 8. There isn't one forinnodb_adaptive_hash_index_partitions
though i thought theparts
one was partitions?