I have a log table with approximately 13 million rows. The table structure is (truncated) as follows:
CREATE TABLE `dmsserver_status_log` (
`id` bigint(20) NOT NULL,
`customer_id` int(11) NOT NULL,
`timestamp` datetime(6) NOT NULL,
`level` varchar(10) NOT NULL,
`logger_name` varchar(255) NOT NULL,
`message` text NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- id is the PK and AUTO_INCREMENT
- customer_id and timestamp have a KEY each
- customer_id has a FOREIGN KEY to the 'customer' table
The customer table structure is (truncated) as follows; it has approximately 1,700 rows:
CREATE TABLE `customer` (
`id` int(11) NOT NULL,
`number` varchar(10) NOT NULL,
[...]
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- id is the PK and AUTO_INCREMENT
- number is UNIQUE KEY
There are many more columns and some more keys on these columns, but I can't imagine that this is related to my performance problem (?)
If I query my log table (without a join) I receive the result in a short time:
SELECT
`timestamp`,
`level`,
logger_name,
message
FROM dmsserver_status_log as log
WHERE
`timestamp` > '2020-03-01 00:00:00'
AND message LIKE '%fehlgeschlagen%'
ORDER BY
`timestamp` ASC;
In this example I get 57 hits in about a second, which is okay for me. If I expand my select with a JOIN on the customer table, the same select needs around 30 seconds or more to complete:
SELECT
`timestamp`,
`level`,
logger_name,
message,
customer.`number`
FROM dmsserver_status_log as log
JOIN customer ON customer.id = log.customer_id
WHERE
`timestamp` > '2020-03-01 00:00:00'
AND message LIKE '%fehlgeschlagen%'
ORDER BY
`timestamp` ASC;
Depending on my query and the range of the timestamp, this query can take several or even dozens of minutes. I just don't understand why this select takes so much time to complete.
Here is the EXPLAIN EXTENDED output of the above select with the join:
id | select_type | table    | type  | possible_keys         | key         | key_len | ref         | rows | filtered | Extra
 1 | SIMPLE      | customer | index | PRIMARY               | number      | 32      | NULL        | 1749 |      100 | Using index; Using temporary; Using filesort
 1 | SIMPLE      | log      | ref   | customer_id,timestamp | customer_id | 4       | customer.id |  515 |      100 | Using where
Could someone explain to me why it takes so much more time to join the second table?
Edit:
As requested, here is the result of SHOW CREATE TABLE dmsserver_status_log:
CREATE TABLE `dmsserver_status_log` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`customer_id` int(11) NOT NULL,
`timestamp` datetime(6) NOT NULL,
`thread_name` varchar(255) NOT NULL,
`level` varchar(10) NOT NULL,
`logger_name` varchar(255) NOT NULL,
`source_host` varchar(50) NOT NULL,
`message` text NOT NULL,
`stacktrace` text,
`mdc` text,
PRIMARY KEY (`id`),
KEY `customer_id` (`customer_id`),
KEY `timestamp` (`timestamp`),
CONSTRAINT `dmsserver_status_log_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=17996215 DEFAULT CHARSET=utf8
And here is the result of SHOW CREATE TABLE customer:
CREATE TABLE `customer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`number` varchar(10) NOT NULL,
`headquarter_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des Hauptbetriebs wenn dies eine Filiale ist',
`it_location_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des EDV-Standorts wenn dieser Kunde keine eigene EDV hat',
`customer_type_id` int(11) NOT NULL DEFAULT '1',
`address_id` int(11) NOT NULL,
`characteristic_id` int(11) NOT NULL DEFAULT '1',
`iban` varchar(34) DEFAULT NULL,
`bic` varchar(11) DEFAULT NULL,
`bank` varchar(100) DEFAULT NULL,
`newsletter` tinyint(1) NOT NULL DEFAULT '1',
`newsletter_email` varchar(100) DEFAULT NULL,
`invoice_email` varchar(100) DEFAULT NULL COMMENT 'Email-Adresse für den Empfang von Rechnungen als PDF',
`api_password` varchar(255) DEFAULT NULL COMMENT 'Zugangspasswort für den Zugriff auf unsere API',
`comments` varchar(250) DEFAULT NULL,
`valid_id` smallint(6) NOT NULL,
`create_time` datetime NOT NULL,
`create_by` int(11) NOT NULL,
`change_time` datetime NOT NULL,
`change_by` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `number` (`number`),
KEY `change_by` (`change_by`),
KEY `characteristic_id` (`characteristic_id`),
KEY `customer_type_id` (`customer_type_id`),
KEY `address_id` (`address_id`),
KEY `valid_id` (`valid_id`),
KEY `create_by` (`create_by`),
KEY `headquarter_customer_number` (`headquarter_customer_number`),
KEY `it_location_customer_number` (`it_location_customer_number`),
CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`customer_type_id`) REFERENCES `customer_type` (`id`),
CONSTRAINT `customer_ibfk_2` FOREIGN KEY (`address_id`) REFERENCES `address` (`id`),
CONSTRAINT `customer_ibfk_3` FOREIGN KEY (`characteristic_id`) REFERENCES `customer_characteristic` (`id`),
CONSTRAINT `customer_ibfk_4` FOREIGN KEY (`create_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
CONSTRAINT `customer_ibfk_5` FOREIGN KEY (`change_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
CONSTRAINT `customer_ibfk_6` FOREIGN KEY (`headquarter_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE,
CONSTRAINT `customer_ibfk_7` FOREIGN KEY (`it_location_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2540 DEFAULT CHARSET=utf8
- How many rows does the customer table have? Also, have you executed ANALYZE on both tables? – dbilid, Mar 18, 2020 at 22:34
- Please post TEXT results of A) SHOW CREATE TABLE dmsserver_status_log; and B) SHOW CREATE TABLE customer; for analysis. – Wilson Hauck, Mar 19, 2020 at 13:48
- I've just added the requested outputs to my original message. The customer table has approx. 1,700 rows. – Martin Fernau, Mar 20, 2020 at 9:22
- Thank you for the SHOW CREATE TABLEs. – Wilson Hauck, Mar 21, 2020 at 14:08
3 Answers
Please provide EXPLAIN SELECT ... for each variant.
What version of MariaDB? (Optimizations have changed in this area.)
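If it helps, here is a minimal sketch of how both pieces of information could be gathered; it simply prefixes the join query from the question with EXPLAIN and assumes nothing beyond the tables already shown:
SELECT VERSION();   -- reports the exact MariaDB/MySQL server version string

-- Repeat the same EXPLAIN prefix for each query variant you test
EXPLAIN
SELECT `timestamp`, `level`, logger_name, message, customer.`number`
FROM dmsserver_status_log AS log
JOIN customer ON customer.id = log.customer_id
WHERE `timestamp` > '2020-03-01 00:00:00'
  AND message LIKE '%fehlgeschlagen%'
ORDER BY `timestamp` ASC;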
Also try these variations
This one isolates the filtering into a derived table, thereby minimizing the need for reaching into the other table.
SELECT log.`timestamp`, log.`level`, log.logger_name, log.message,
c.`number`
FROM (
SELECT log1.customer_id, log1.`timestamp`,
log1.`level`, log1.logger_name,
log1.message
FROM dmsserver_status_log AS log1
WHERE `timestamp` > '2020-03-01 00:00:00'
AND message LIKE '%fehlgeschlagen%'
) AS log
JOIN customer AS c ON c.id = log.customer_id
ORDER BY log.`timestamp` ASC;
This one defers looking at the second table in a different way, by replacing the join with a correlated subquery:
SELECT `timestamp`, `level`, logger_name, message,
    ( SELECT `number` FROM customer
       WHERE customer.id = log.customer_id ) AS `number`
FROM dmsserver_status_log AS log
WHERE `timestamp` > '2020-03-01 00:00:00'
  AND message LIKE '%fehlgeschlagen%'
ORDER BY `timestamp` ASC;
Depending on the version of MySQL/MariaDB, the three variations may be optimized differently.
If you are searching for a "word", consider using FULLTEXT(message) and MATCH(message) AGAINST ('+fehlgeschlagen' IN BOOLEAN MODE). It should run a lot faster.
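A rough sketch of what that could look like (the index name ft_message is just an illustrative choice, and building a full-text index on a 13-million-row table will take some time):
-- One-time: add a full-text index on the message column
ALTER TABLE dmsserver_status_log ADD FULLTEXT INDEX ft_message (message);

-- Then replace the leading-wildcard LIKE with MATCH ... AGAINST
SELECT `timestamp`, `level`, logger_name, message, customer.`number`
FROM dmsserver_status_log AS log
JOIN customer ON customer.id = log.customer_id
WHERE `timestamp` > '2020-03-01 00:00:00'
  AND MATCH(message) AGAINST ('+fehlgeschlagen' IN BOOLEAN MODE)
ORDER BY `timestamp` ASC;
Note that MATCH works on whole words, so occurrences of "fehlgeschlagen" buried inside longer words would no longer be found the way the double-wildcard LIKE finds them.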
You have individual indices (a.k.a. keys) on customer_id and timestamp in the dmsserver_status_log table. MariaDB can usually only use one index per table for any given query. So when you change the query to include a join, MariaDB has to choose one of those indices to use.
What you need instead is a suitable compound index on your dmsserver_status_log table that can support the relevant conditions in your WHERE clause, the JOIN and the ORDER BY columns at the same time.
So the index you need would probably be CREATE INDEX dmsserver_status_log_idx1 ON dmsserver_status_log (customer_id, `timestamp`). Note that the order of the columns is important. (If EXPLAIN shows that the optimizer will not use this index, you can try to re-create it with the columns in the reverse order. Query optimisation can be difficult ...)
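A sketch of how both orderings could be tried; dmsserver_status_log_idx1 is the name suggested above, while dmsserver_status_log_idx2 is just an illustrative name for the reversed variant:
-- Join column first, then the range column, as recommended above
CREATE INDEX dmsserver_status_log_idx1
    ON dmsserver_status_log (customer_id, `timestamp`);

-- If EXPLAIN shows the optimizer ignoring it, try the reverse order instead:
-- CREATE INDEX dmsserver_status_log_idx2
--     ON dmsserver_status_log (`timestamp`, customer_id);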
Note that the message column, which also appears in your WHERE clause, will not benefit from being in the compound index with the particular query as it stands, because you have a wildcard before the search string.
You may want to consider removing the existing individual indices on customer_id and timestamp if you only care about good performance on this particular JOIN query. Maintaining a lot of indexes on a table means slower INSERTs, and they also take up some memory.
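A hedged sketch of that cleanup, assuming the compound index from above has already been created (the customer_id foreign key still needs an index starting with customer_id, which the compound index provides; dropping the single-column timestamp key will slow down the original query that filters on timestamp alone):
-- Only run this after dmsserver_status_log_idx1 (customer_id, `timestamp`) exists
ALTER TABLE dmsserver_status_log
    DROP INDEX customer_id,
    DROP INDEX `timestamp`;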
For reference, here's an article about building indices for MariaDB/MySQL: Building the best INDEX for a given SELECT
Martin,
The EXPLAIN indicates that customer was the first table used; you likely know that you want the log table to be read first to get only the rows with timestamp > 2020年03月01日. You should be able to make that happen with SELECT STRAIGHT_JOIN ...
and change
JOIN customer ON customer.id = log.customer_id
to
JOIN customer ON log.customer_id = customer.id
Please post the new time required with this STRAIGHT_JOIN version.
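For clarity, a sketch of how the suggestion could be applied; STRAIGHT_JOIN forces the tables to be joined in the order they are listed, so the log table is read first (whether this actually helps is debated in the comments below):
SELECT STRAIGHT_JOIN
       log.`timestamp`, log.`level`, log.logger_name, log.message, customer.`number`
FROM dmsserver_status_log AS log
JOIN customer ON log.customer_id = customer.id
WHERE log.`timestamp` > '2020-03-01 00:00:00'
  AND log.message LIKE '%fehlgeschlagen%'
ORDER BY log.`timestamp` ASC;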
- I doubt seriously whether x=y is optimized any differently than y=x. – Rick James, Apr 1, 2020 at 4:39
- The major problem with STRAIGHT_JOIN is that if later a level column is added to the customer table, that's all of a sudden a join criterion. It should never be used except in very extreme cases like identical tables. – danblack, Apr 1, 2020 at 5:22
- @martin-fernau, please post your current query and EXPLAIN EXTENDED test results so we can see what the optimizer did with your query. Thanks. – Wilson Hauck, Apr 1, 2020 at 11:22