
I have a log table with approximately 13 million rows. The table structure is (truncated) as follows:

 CREATE TABLE `dmsserver_status_log` (
 `id` bigint(20) NOT NULL,
 `customer_id` int(11) NOT NULL,
 `timestamp` datetime(6) NOT NULL,
 `level` varchar(10) NOT NULL,
 `logger_name` varchar(255) NOT NULL,
 `message` text NOT NULL,
 [...]
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • id is the PK and AUTO_INCREMENT
  • customer_id and timestamp have a KEY each
  • customer_id has a FOREIGN KEY to the 'customer' table

The customer table structure is (truncated) as follows; it has approximately 1,700 rows:

 CREATE TABLE `customer` (
 `id` int(11) NOT NULL,
 `number` varchar(10) NOT NULL,
 [...]
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • id is the PK and AUTO_INCREMENT
  • number is UNIQUE KEY

There are many more columns and some more keys on them, but I can't imagine that this is related to my performance problem (?)

If I query my log table (without a join), I get the result quickly:

 SELECT
 `timestamp`,
 `level`,
 logger_name,
 message
 FROM dmsserver_status_log as log
 WHERE
 `timestamp` > '2020-03-01 00:00:00'
 AND message LIKE '%fehlgeschlagen%'
 ORDER BY
 `timestamp` ASC;

In this example I get 57 hits in about a second, which is fine for me. But if I extend the query with a JOIN on the customer table, the same select takes roughly 30 seconds or more to complete:

 SELECT
 `timestamp`,
 `level`,
 logger_name,
 message,
 customer.`number`
 FROM dmsserver_status_log as log
 JOIN customer ON customer.id = log.customer_id
 WHERE
 `timestamp` > '2020-03-01 00:00:00'
 AND message LIKE '%fehlgeschlagen%'
 ORDER BY
 `timestamp` ASC;

Depending on the query and the timestamp range, it can even take several minutes or dozens of minutes. I just don't understand why this select takes so much time to complete.

Here is the EXPLAIN EXTENDED output of the above select with the join:

 id  select_type  table     type   possible_keys          key          key_len  ref          rows  filtered  Extra
 1   SIMPLE       customer  index  PRIMARY                number       32       NULL         1749  100       Using index; Using temporary; Using filesort
 1   SIMPLE       log       ref    customer_id,timestamp  customer_id  4        customer.id  515   100       Using where

Could someone explain to me why it takes so much more time to join the second table?

Edit:

As requested, here are the results of SHOW CREATE TABLE dmsserver_status_log:

 CREATE TABLE `dmsserver_status_log` (
 `id` bigint(20) NOT NULL AUTO_INCREMENT,
 `customer_id` int(11) NOT NULL,
 `timestamp` datetime(6) NOT NULL,
 `thread_name` varchar(255) NOT NULL,
 `level` varchar(10) NOT NULL,
 `logger_name` varchar(255) NOT NULL,
 `source_host` varchar(50) NOT NULL,
 `message` text NOT NULL,
 `stacktrace` text,
 `mdc` text,
 PRIMARY KEY (`id`),
 KEY `customer_id` (`customer_id`),
 KEY `timestamp` (`timestamp`),
 CONSTRAINT `dmsserver_status_log_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`) ON UPDATE CASCADE
 ) ENGINE=InnoDB AUTO_INCREMENT=17996215 DEFAULT CHARSET=utf8

And here are the results of SHOW CREATE TABLE customer:

 CREATE TABLE `customer` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `number` varchar(10) NOT NULL,
 `headquarter_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des Hauptbetriebs wenn dies eine Filiale ist',
 `it_location_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des EDV-Standorts wenn dieser Kunde keine eigene EDV hat',
 `customer_type_id` int(11) NOT NULL DEFAULT '1',
 `address_id` int(11) NOT NULL,
 `characteristic_id` int(11) NOT NULL DEFAULT '1',
 `iban` varchar(34) DEFAULT NULL,
 `bic` varchar(11) DEFAULT NULL,
 `bank` varchar(100) DEFAULT NULL,
 `newsletter` tinyint(1) NOT NULL DEFAULT '1',
 `newsletter_email` varchar(100) DEFAULT NULL,
 `invoice_email` varchar(100) DEFAULT NULL COMMENT 'Email-Adresse für den Empfang von Rechnungen als PDF',
 `api_password` varchar(255) DEFAULT NULL COMMENT 'Zugangspasswort für den Zugriff auf unsere API',
 `comments` varchar(250) DEFAULT NULL,
 `valid_id` smallint(6) NOT NULL,
 `create_time` datetime NOT NULL,
 `create_by` int(11) NOT NULL,
 `change_time` datetime NOT NULL,
 `change_by` int(11) NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `number` (`number`),
 KEY `change_by` (`change_by`),
 KEY `characteristic_id` (`characteristic_id`),
 KEY `customer_type_id` (`customer_type_id`),
 KEY `address_id` (`address_id`),
 KEY `valid_id` (`valid_id`),
 KEY `create_by` (`create_by`),
 KEY `headquarter_customer_number` (`headquarter_customer_number`),
 KEY `it_location_customer_number` (`it_location_customer_number`),
 CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`customer_type_id`) REFERENCES `customer_type` (`id`),
 CONSTRAINT `customer_ibfk_2` FOREIGN KEY (`address_id`) REFERENCES `address` (`id`),
 CONSTRAINT `customer_ibfk_3` FOREIGN KEY (`characteristic_id`) REFERENCES `customer_characteristic` (`id`),
 CONSTRAINT `customer_ibfk_4` FOREIGN KEY (`create_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
 CONSTRAINT `customer_ibfk_5` FOREIGN KEY (`change_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
 CONSTRAINT `customer_ibfk_6` FOREIGN KEY (`headquarter_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE,
 CONSTRAINT `customer_ibfk_7` FOREIGN KEY (`it_location_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE
 ) ENGINE=InnoDB AUTO_INCREMENT=2540 DEFAULT CHARSET=utf8
asked Mar 18, 2020 at 17:22
  • How many rows does the customer table have? Also, have you executed ANALYZE on both tables? Commented Mar 18, 2020 at 22:34
  • Please post TEXT results of A) SHOW CREATE TABLE dmsserver_status_log; and B) SHOW CREATE TABLE customer; for analysis. Commented Mar 19, 2020 at 13:48
  • I've just added the requested outputs to my original message. The customer table has approx. 1700 rows. Commented Mar 20, 2020 at 9:22
  • Thank you for the Show Create Tables. Commented Mar 21, 2020 at 14:08

3 Answers


Please provide EXPLAIN SELECT ... for each variant.

What version of MariaDB? (Optimizations have changed in this area.)
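
If you are not sure which version you are running, the server can tell you:

 SELECT VERSION();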

Also try these variations:

This one isolates the filtering into a derived table, thereby minimizing the need for reaching into the other table.

 SELECT log.`timestamp`, log.`level`, log.logger_name, log.message,
        c.`number`
 FROM (
     SELECT log1.customer_id, log1.`timestamp`,
            log1.`level`, log1.logger_name, log1.message
     FROM dmsserver_status_log AS log1
     WHERE log1.`timestamp` > '2020-03-01 00:00:00'
       AND log1.message LIKE '%fehlgeschlagen%'
 ) AS log
 JOIN customer AS c ON c.id = log.customer_id
 ORDER BY log.`timestamp` ASC;

This one defers looking at the second table in a different way.

 SELECT `timestamp`, `level`, logger_name, message,
        ( SELECT `number` FROM customer
          WHERE customer.id = log.customer_id ) AS `number`
 FROM dmsserver_status_log AS log
 WHERE `timestamp` > '2020-03-01 00:00:00'
   AND message LIKE '%fehlgeschlagen%'
 ORDER BY `timestamp` ASC;

Depending on the version of MySQL/MariaDB, the three variations may be optimized differently.

If you are searching for a "word", consider using FULLTEXT(message) and MATCH(message) AGAINST ('+fehlgeschlagen' IN BOOLEAN MODE). It should run a lot faster.
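
A rough sketch of that approach (this assumes a server recent enough for InnoDB full-text indexes, i.e. MySQL 5.6+ / MariaDB 10.0.5+; the index name ft_message is arbitrary):

 -- Adding the full-text index may take a while on ~13 million rows
 ALTER TABLE dmsserver_status_log ADD FULLTEXT INDEX ft_message (message);

 SELECT `timestamp`, `level`, logger_name, message
 FROM dmsserver_status_log
 WHERE `timestamp` > '2020-03-01 00:00:00'
   AND MATCH(message) AGAINST ('+fehlgeschlagen' IN BOOLEAN MODE)
 ORDER BY `timestamp` ASC;

Keep in mind that MATCH ... AGAINST matches whole words, not arbitrary substrings the way LIKE '%...%' does.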

answered Apr 1, 2020 at 5:15

You have individual indexes (a.k.a. keys) on customer_id and timestamp in the dmsserver_status_log table. MariaDB can usually use only one index per table for any given query, so when you change the query to include a join, MariaDB has to choose which one of those indexes to use.

What you need instead is a suitable compound index on your dmsserver_status_log table that can support relevant conditions in your WHERE clause, the JOIN and ORDER BY columns at the same time.

So the index you need would probably be CREATE INDEX dmsserver_status_log_idx1 ON dmsserver_status_log (customer_id, `timestamp`). Note that the order of the columns is important. (If EXPLAIN shows that the optimizer will not use this index, you can try to re-create it with the columns in the reverse order. Query optimisation can be difficult ...)

Note that the message column, which also appears in your WHERE clause, will not benefit from being in the compound index for this particular query, because you have a wildcard before the search string.

You may want to consider removing the existing individual indices on customer_id and timestamp if you only care about good performance on the particular JOIN query. Maintaining a lot of indexes in a table means slower INSERTs and they also take up some memory.
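
A minimal sketch of what that could look like (the index name is the one suggested above; note that the compound index still starts with customer_id, so the foreign key remains covered once the single-column index is dropped):

 -- Create the compound index first, then the now-redundant single-column index can go
 ALTER TABLE dmsserver_status_log
   ADD INDEX dmsserver_status_log_idx1 (customer_id, `timestamp`);

 ALTER TABLE dmsserver_status_log
   DROP INDEX customer_id;
 -- The single-column `timestamp` index could likewise be dropped if no other query relies on it.

 -- Check that the optimizer actually picks the new index for the JOIN query
 EXPLAIN
 SELECT log.`timestamp`, log.`level`, log.logger_name, log.message, customer.`number`
 FROM dmsserver_status_log AS log
 JOIN customer ON customer.id = log.customer_id
 WHERE log.`timestamp` > '2020-03-01 00:00:00'
   AND log.message LIKE '%fehlgeschlagen%'
 ORDER BY log.`timestamp` ASC;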

For reference, here's an article about building indices for MariaDB/MySQL: Building the best INDEX for a given SELECT

answered Mar 18, 2020 at 22:07

Martin, the EXPLAIN indicates that customer was the first table used. You likely want the log table to be read first, so that only rows with timestamp > '2020-03-01' are considered. You should be able to make that happen with SELECT STRAIGHT_JOIN ...
and change

 JOIN customer ON customer.id = log.customer_id

to

 JOIN customer ON log.customer_id = customer.id
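
Putting both changes together, the rewrite would look roughly like this (STRAIGHT_JOIN forces the tables to be joined in the order they appear in the FROM clause, so log is read first):

 SELECT STRAIGHT_JOIN
     log.`timestamp`, log.`level`, log.logger_name, log.message, customer.`number`
 FROM dmsserver_status_log AS log
 JOIN customer ON log.customer_id = customer.id
 WHERE log.`timestamp` > '2020-03-01 00:00:00'
   AND log.message LIKE '%fehlgeschlagen%'
 ORDER BY log.`timestamp` ASC;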

Please post the new time required with this STRAIGHT_JOIN version.

answered Mar 21, 2020 at 14:06
  • I doubt seriously whether x=y is optimized any differently than y=x. Commented Apr 1, 2020 at 4:39
  • The major problem with STRAIGHT_JOIN is that if a level is later added to the customer, that suddenly becomes a join criterion. It should never be used except in very extreme cases, like identical tables. Commented Apr 1, 2020 at 5:22
  • @martin-fernau, please post your current query and EXPLAIN EXTENDED test results so we can see what the optimizer did with your query. Thanks. Commented Apr 1, 2020 at 11:22
