I'm hoping to get some friendly advice on how to proceed with optimizing this server.
VM:
- 8 vCPU @ 2.6 GHz
- 32 GB RAM
- Disk speed: ~3 Gbps (380 MB/s)

Application: LibreNMS
Config: my.cnf.d/server.cnf -- built from a template found somewhere on GitHub

The server currently idles at around 500% CPU, with MariaDB being the culprit.
-
I've added some information that I find hilarious... /etc/my.cnf doesn't have any includes for the /etc/my.cnf.d directory, so none of the changes I've made ever took effect. I've copied /etc/my.cnf.d/server.cnf to /etc/my.cnf and I'm having a whole new world of problems. I'm assuming I need to stop the service and clean out InnoDB. I should have more information available in an hour. This was brought to my attention when I realized I couldn't change any of the logging variables. — choseph, Dec 7, 2020 at 23:12
-
It varies slightly: on RPM distros /etc/my.cnf is read; on Debian-based distros /etc/mysql/my.cnf is read. I'm not sure what you mean by 'clean out innodb'; sounds scary. These files are only read on startup. — danblack, Dec 8, 2020 at 4:12
-
Here is the config I'm working with: pastebin.com/eLwGPXmd — choseph, Dec 8, 2020 at 15:18
-
"Clean out innodb" was referring to the comments at the top of my first server.cnf file that said I would need to purge the logs. That wasn't the issue. Once I got MySQL logging, I had to jump through the hoops of deprecated options/variables and file-permission errors for the socket and pid files. I soaked the new config for about 15 minutes before I abandoned it and went back to stock. I added a few variables to the config for the InnoDB pool, log buffer, and file size, as well as max connections. CPU is still high, but more memory is being used. — choseph, Dec 8, 2020 at 15:25
-
Comments are important in the config files; the processing time for them is insignificant. — Rick James, Dec 8, 2020 at 19:32
2 Answers
Increase innodb_buffer_pool_size to about 70% of available RAM.
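On this 32 GB VM, the 70% guideline works out to roughly 22-23 GB. A minimal my.cnf sketch (24G is the value the asker later settled on; adjust to your RAM):

```ini
# Make sure this file is actually the one the server reads on startup
[mysqld]
# ~70% of the 32 GB on this VM
innodb_buffer_pool_size = 24G
```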
High CPU implies poor indexing and/or poor query formulation. Let's see some queries and the matching SHOW CREATE TABLE.
INDEX(`rule_id`, `device_id`, `state`)  -- handles the following:
INDEX(`rule_id`, `device_id`)           -- DROP
INDEX(`rule_id`)                        -- DROP

Add INDEX(state, time_logged).
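As a sketch, assuming the indexes live on the alert_log table mentioned in the comments (the existing index names here are hypothetical — check SHOW CREATE TABLE for the real ones before dropping anything):

```sql
-- One composite index covers both redundant prefixes, so they can go
ALTER TABLE alert_log
    ADD INDEX idx_rule_device_state (rule_id, device_id, state),
    DROP INDEX rule_id_device_id,   -- hypothetical name for (rule_id, device_id)
    DROP INDEX rule_id,             -- hypothetical name for (rule_id)
    ADD INDEX idx_state_time (state, time_logged);
```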
Shrink INTs to some smaller datatype. Saving space speeds up the query a little.
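For example (a sketch — the column chosen here is an assumption; check the actual schema and value range first), a status column that only ever holds small values does not need a 4-byte INT:

```sql
-- TINYINT UNSIGNED stores 0..255 in 1 byte instead of 4
ALTER TABLE alert_log
    MODIFY COLUMN state TINYINT UNSIGNED NOT NULL;
```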
Invert the query -- find the 50 rows first, then do the JOINs:

SELECT ...
    FROM (
        SELECT id FROM log
            WHERE ...
            ORDER BY time_logged DESC
            LIMIT 0, 50
         ) AS x                  -- a derived table requires an alias
    JOIN log AS e USING(id)      -- to get other columns from log
    JOIN ...                     -- the other tables, for the other columns
    ORDER BY time_logged DESC    -- repeat if ORDER matters
Use pt-query-digest to summarize the slowlog; it will make it easier to spot the "worst" queries.
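A typical invocation looks like this (a sketch; the slowlog path is an assumption — use whatever your slow_query_log_file setting points at):

```shell
# Summarize the slow log; worst queries are ranked at the top of the report
pt-query-digest /var/lib/mysql/slow.log > digest.txt
less digest.txt
```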
Other issues...

- max_connections = 200000 is ridiculously high, even dangerously high. That is the number of "connections" to allow at any one time, and RAM is consumed for each connection. Do SHOW GLOBAL STATUS LIKE 'Max_used_connections'; to see that you have not come anywhere near that value. 100 is a safe starting value until you have a reason to increase it.
- Notice (in the slowlog) Rows_examined: 973090. How many rows are in each of the tables? The changes I suggest will probably bring that number down, though probably not down to the optimal of 50 (as in Rows_sent: 50).
- An owl can turn its head almost all the way around. That is how my head feels when I see a LEFT JOIN and a RIGHT JOIN in the same statement. My head would appreciate it if you could turn that RIGHT JOIN into a suitable LEFT JOIN or, if appropriate, an INNER JOIN. In particular, in the queries with AND R.severity >= 2, it really is an INNER JOIN.
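A quick way to check the high-water mark and then lower the limit (1000 here is an illustrative value; SET GLOBAL takes effect immediately, but mirror it in my.cnf so it survives a restart):

```sql
-- Peak concurrent connections since the server started
SHOW GLOBAL STATUS LIKE 'Max_used_connections';

-- Lower the runtime limit; also set max_connections in my.cnf
SET GLOBAL max_connections = 1000;
```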
-
Sorry for the delay; I'm just now getting around to slow queries. All the slow queries seem to be originating from the web front end, which people are consistently logged into all day long. I think the widgets are performing these queries... Here are the whoppers: pastebin.com/QgxS836u (I set the slow query time to 1; there are some 10-second queries in there that I'm assuming cause the occasional spike). Here is the SHOW CREATE TABLE info for the alert_log table: pastebin.com/YHBgEe5w I also set log_queries_not_using_indexes=1 and the slow log reached 100MB in about 2 minutes. — choseph, Dec 8, 2020 at 19:13
-
Thanks for the quick response! As far as the queries go, I will probably need to suggest these to the LibreNMS team, as this isn't my application. Max_used_connections after running for less than one hour is 366. I will lower max_connections to 1000 for starters; I definitely overdid that one... — choseph, Dec 8, 2020 at 19:39
-
@choseph - log_queries_not_using_indexes=1 leads to a lot of useless noise, especially after the tables have been filling up for some time. Leave it off. Come back periodically to run the slowlog (without that setting); that way you will eventually become aware of such a missing index. — Rick James, Dec 8, 2020 at 19:39
-
@choseph - Yes, 1000 should be good. If you do exceed that, check that the clients are disconnecting; check for the web server allowing lots of "children" unnecessarily, etc. — Rick James, Dec 8, 2020 at 19:41
-
We performed the inverted query and it runs 2000 times faster, but there is logic to why they're doing it that way: there's information in the alert rules table that can be ordered by, so the pagination has to exist on the outer layer. I appreciate your help! — choseph, Dec 8, 2020 at 22:46
Your mysqltuner.pl report is from 11 seconds of uptime - too short a sample to draw significant conclusions from. Repost after 24 hours of uptime, and include SHOW GLOBAL STATUS output too.
Disable the query cache. It has high contention and slows most queries down.
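A minimal my.cnf fragment to switch the query cache off (the same settings Rick recommends in the comments):

```ini
[mysqld]
query_cache_type = OFF
query_cache_size = 0
```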
As Rick said, focus on slow queries by looking through the slow query log with pt-query-digest.
With log_queries_not_using_indexes=1 you will get a significant amount of low-value log entries; set min_examined_row_limit=1000 to minimize the noise.

long_query_time might be more appropriate at 1 (second) or less.

log_slow_verbosity=query_plan,explain will include EXPLAIN info in the slow log. See "MySQL: Building the best INDEX for a given SELECT" for creating the right indexes for a query.
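Putting the slow-log suggestions above together as a my.cnf sketch (the values are the ones suggested here; tune them as the worst offenders get fixed):

```ini
[mysqld]
slow_query_log = ON
long_query_time = 1             # seconds; 1 or less is appropriate here
min_examined_row_limit = 1000   # cuts noise from cheap unindexed queries
log_slow_verbosity = query_plan,explain
```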
Feel free to ask further questions on tuning queries/indexes, including the query itself plus EXPLAIN {query} and SHOW CREATE TABLE {tablename} output.
PS: regarding other mysqltuner output - ignore recommendations about mariadb.sys; it's a locked account.
-
I've updated the SQL tuner output: pastebin.com/CCgGf8zM Thanks! — choseph, Dec 7, 2020 at 21:41
-
pastebin.com/Hx7kzUJf - I've shored up the query cache warnings and the output looks healthier now. I also enabled the performance schema and increased my InnoDB log file size (6G) to 25% of my pool size (24G). Only general recommendations remain! The mariadb process is still floating between 240-650% CPU. I'm assuming that is best troubleshot through slow query identification? — choseph, Dec 8, 2020 at 19:09
-
My current my.cnf file, FYI: pastebin.com/5uATfpfM — choseph, Dec 8, 2020 at 19:10
-
Virtually every "production" server should have both query_cache_type = OFF and query_cache_size = 0. This will (usually) improve performance. — Rick James, Dec 8, 2020 at 19:45
-
@choseph - To get a feel for whether the log_file_size is appropriate, I need to see several metrics: mysql.rjweb.org/doc.php/mysql_analysis#tuning — Rick James, Dec 8, 2020 at 19:46