I'm hoping to get some friendly advice on how to proceed with optimizing this server.
VM:
- 8 vCPU @ 2.6 GHz
- 32 GB RAM
- Disk speed: ~3 Gbps (380 MB/s)

Application: LibreNMS
Config: my.cnf.d/server.cnf -- built from a template found somewhere on GitHub

The server currently idles at around 500% CPU, with MariaDB being the culprit.
-
I've added some information that I find hilarious... /etc/my.cnf doesn't have any includes for the /etc/my.cnf.d directory, so none of the changes I've made ever took effect. I've copied /etc/my.cnf.d/server.cnf to /etc/my.cnf and I'm having a whole new world of problems. I'm assuming I need to stop the service and clean out InnoDB. I should have more information available in an hour. This was brought to my attention when I realized I couldn't change any of the logging variables. — choseph, Dec 7, 2020 at 23:12
-
It varies slightly: on RPM distros /etc/my.cnf is read; on Debian-based distros /etc/mysql/my.cnf is read. I'm not sure what you mean by 'clean out innodb'; sounds scary. These files are only read on startup. — danblack, Dec 8, 2020 at 4:12
-
Here is the config I'm working with: pastebin.com/eLwGPXmd — choseph, Dec 8, 2020 at 15:18
-
"Clean out innodb" was referring to the comments at the top of my first server.cnf file that said I would need to purge the logs. That wasn't the issue. Once I got MySQL logging, I had to jump through the hoops of deprecated options/variables and file-permission errors for the socket and pid files. I soaked the new config for about 15 minutes before I abandoned it and went back to stock. I added a few variables to the config for the InnoDB pool, log buffer, and file size, as well as max connections. CPU is still high, but more memory is being used. — choseph, Dec 8, 2020 at 15:25
-
Comments are important in the config files; the processing time for them is insignificant. — Rick James, Dec 8, 2020 at 19:32
2 Answers
Increase innodb_buffer_pool_size to about 70% of available RAM.
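On this 32 GB VM, the 70% guideline works out to roughly 22-23 GB. A minimal my.cnf sketch (24G is the value the asker later settled on; adjust to your RAM):

```ini
# Make sure this file is actually the one the server reads on startup
[mysqld]
# ~70% of the 32 GB on this VM
innodb_buffer_pool_size = 24G
```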
High CPU implies poor indexing and/or poor query formulation. Let's see some queries and the matching SHOW CREATE TABLE.
INDEX(`rule_id`, `device_id`, `state`)  -- handles the following:
INDEX(`rule_id`, `device_id`)           -- DROP
INDEX(`rule_id`)                        -- DROP

Add INDEX(state, time_logged).
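As a sketch, assuming the indexes live on the alert_log table mentioned in the comments (the existing index names here are hypothetical — check SHOW CREATE TABLE for the real ones before dropping anything):

```sql
-- One composite index covers both redundant prefixes, so they can go
ALTER TABLE alert_log
    ADD INDEX idx_rule_device_state (rule_id, device_id, state),
    DROP INDEX rule_id_device_id,   -- hypothetical name for (rule_id, device_id)
    DROP INDEX rule_id,             -- hypothetical name for (rule_id)
    ADD INDEX idx_state_time (state, time_logged);
```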
Shrink INTs to some smaller datatype. Saving space speeds up the query a little.
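For example (a sketch — the column chosen here is an assumption; check the actual schema and value range first), a status column that only ever holds small values does not need a 4-byte INT:

```sql
-- TINYINT UNSIGNED stores 0..255 in 1 byte instead of 4
ALTER TABLE alert_log
    MODIFY COLUMN state TINYINT UNSIGNED NOT NULL;
```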
Invert the query -- find the 50 rows first, then do the JOINs:

SELECT ...
    FROM (
        SELECT id FROM log
            WHERE ...
            ORDER BY time_logged DESC
            LIMIT 0, 50
         ) AS x                  -- a derived table requires an alias
    JOIN log AS e USING(id)      -- to get other columns from log
    JOIN ...                     -- the other tables, for the other columns
    ORDER BY time_logged DESC    -- repeat if ORDER matters
Use pt-query-digest to summarize the slowlog; it will make it easier to spot the "worst" queries.
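A typical invocation looks like this (a sketch; the slowlog path is an assumption — use whatever your slow_query_log_file setting points at):

```shell
# Summarize the slow log; worst queries are ranked at the top of the report
pt-query-digest /var/lib/mysql/slow.log > digest.txt
less digest.txt
```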
Other issues...

- max_connections = 200000 is ridiculously high, even dangerously high. That is the number of "connections" to allow at any one time, and RAM is consumed for each connection. Do SHOW GLOBAL STATUS LIKE 'Max_used_connections'; to see that you have not come anywhere near that value. 100 is a safe starting value until you have a reason to increase it.
- Notice (in the slowlog) Rows_examined: 973090. How many rows are in each of the tables? The changes I suggest will probably bring that number down, though probably not down to the optimal of 50 (as in Rows_sent: 50).
- An owl can turn its head almost all the way around. That is how my head feels when I see a LEFT JOIN and a RIGHT JOIN in the same statement. My head would appreciate it if you could turn that RIGHT JOIN into a suitable LEFT JOIN or, if appropriate, an INNER JOIN. In particular, in the queries with AND R.severity >= 2, it really is an INNER JOIN.
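A quick way to check the high-water mark and then lower the limit (1000 here is an illustrative value; SET GLOBAL takes effect immediately, but mirror it in my.cnf so it survives a restart):

```sql
-- Peak concurrent connections since the server started
SHOW GLOBAL STATUS LIKE 'Max_used_connections';

-- Lower the runtime limit; also set max_connections in my.cnf
SET GLOBAL max_connections = 1000;
```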
-
Sorry for the delay; I'm just now getting around to slow queries. All the slow queries seem to be originating from the web front end, which people are consistently logged into all day long. I think the widgets are performing these queries... Here are the whoppers: pastebin.com/QgxS836u (I set the slow query time to 1; there are some 10-second queries in there that I'm assuming cause the occasional spike). Here is the SHOW CREATE TABLE info for the alert_log table: pastebin.com/YHBgEe5w I also set log_queries_not_using_indexes=1 and the slow log reached 100MB in about 2 minutes. — choseph, Dec 8, 2020 at 19:13
-
Thanks for the quick response! As far as the queries go, I will probably need to suggest these to the LibreNMS team, as this isn't my application. Max_used_connections after running for less than one hour is 366. I will lower max_connections to 1000 for starters; I definitely overdid that one... — choseph, Dec 8, 2020 at 19:39
-
@choseph - log_queries_not_using_indexes=1 leads to a lot of useless noise, especially after the tables have been filling up for some time. Leave it off. Come back periodically to run the slowlog (without that setting); that way you will eventually become aware of such a missing index. — Rick James, Dec 8, 2020 at 19:39
-
@choseph - Yes, 1000 should be good. If you do exceed that, check that the clients are disconnecting; check for the web server allowing lots of "children" unnecessarily, etc. — Rick James, Dec 8, 2020 at 19:41
-
We performed the inverted query and it runs 2000 times faster, but there is logic to why they're doing it that way: there's information in the alert rules table that can be ordered by, so the pagination has to exist on the outer layer. I appreciate your help! — choseph, Dec 8, 2020 at 22:46
Your mysqltuner.pl report is from 11 seconds of uptime - too short a sample to draw significant conclusions from. Repost after 24 hours of uptime, and include SHOW GLOBAL STATUS output too.
Disable the query cache. It has high contention and slows most queries down.
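A minimal my.cnf fragment to switch the query cache off (the same settings Rick recommends in the comments):

```ini
[mysqld]
query_cache_type = OFF
query_cache_size = 0
```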
As Rick said, focus on slow queries by looking through the slow query log with pt-query-digest.
With log_queries_not_using_indexes=1 you will get a significant amount of low-value log entries; set min_examined_row_limit=1000 to minimize the noise.

long_query_time might be more appropriate at 1 (second) or less.

log_slow_verbosity=query_plan,explain will include EXPLAIN info in the slow log. See "MySQL: Building the best INDEX for a given SELECT" for creating the right indexes for a query.
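Putting the slow-log suggestions above together as a my.cnf sketch (the values are the ones suggested here; tune them as the worst offenders get fixed):

```ini
[mysqld]
slow_query_log = ON
long_query_time = 1             # seconds; 1 or less is appropriate here
min_examined_row_limit = 1000   # cuts noise from cheap unindexed queries
log_slow_verbosity = query_plan,explain
```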
Feel free to ask further questions on tuning queries/indexes, including the query itself plus EXPLAIN {query} and SHOW CREATE TABLE {tablename} output.
PS: regarding other mysqltuner output - ignore recommendations about mariadb.sys; it's a locked account.
-
I've updated the SQL tuner output: pastebin.com/CCgGf8zM Thanks! — choseph, Dec 7, 2020 at 21:41
-
pastebin.com/Hx7kzUJf - I've shored up the query cache warnings and the output looks healthier now. I also enabled the performance schema and increased my InnoDB log file size (6G) to 25% of my pool size (24G). Only general recommendations remain! The mariadb process is still floating between 240-650% CPU. I'm assuming that is best troubleshot through slow query identification? — choseph, Dec 8, 2020 at 19:09
-
My current my.cnf file, FYI: pastebin.com/5uATfpfM — choseph, Dec 8, 2020 at 19:10
-
Virtually every "production" server should have both query_cache_type = OFF and query_cache_size = 0. This will (usually) improve performance. — Rick James, Dec 8, 2020 at 19:45
-
@choseph - To get a feel for whether the log_file_size is appropriate, I need to see several metrics: mysql.rjweb.org/doc.php/mysql_analysis#tuning — Rick James, Dec 8, 2020 at 19:46