I have a log table with approximately 13 million rows. The table structure is (truncated) as follows:
CREATE TABLE `dmsserver_status_log` (
`id` bigint(20) NOT NULL,
`customer_id` int(11) NOT NULL,
`timestamp` datetime(6) NOT NULL,
`level` varchar(10) NOT NULL,
`logger_name` varchar(255) NOT NULL,
`message` text NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- id is the PK and AUTO_INCREMENT
- customer_id and timestamp have a KEY each
- customer_id has a FOREIGN KEY to the 'customer' table
The customer table structure is (truncated) as follows; it has approximately 1,700 rows:
CREATE TABLE `customer` (
`id` int(11) NOT NULL,
`number` varchar(10) NOT NULL,
[...]
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- id is the PK and AUTO_INCREMENT
- number is UNIQUE KEY
There are many more columns and some more keys on these columns, but I can't imagine that this is related to my performance problem (?)
If I query my log table (without a join) I receive the result in a short time:
SELECT
`timestamp`,
`level`,
logger_name,
message
FROM dmsserver_status_log as log
WHERE
`timestamp` > '2020-03-01 00:00:00'
AND message LIKE '%fehlgeschlagen%'
ORDER BY
`timestamp` ASC;
In this example I get 57 hits in about a second, which is okay for me. If I expand my select with a JOIN on the customer table, the same select needs around 30 seconds or more to complete:
SELECT
`timestamp`,
`level`,
logger_name,
message,
customer.`number`
FROM dmsserver_status_log as log
JOIN customer ON customer.id = log.customer_id
WHERE
`timestamp` > '2020-03-01 00:00:00'
AND message LIKE '%fehlgeschlagen%'
ORDER BY
`timestamp` ASC;
Depending on my query and the range of the timestamp, this query can take several or even dozens of minutes. I just don't understand why this select takes so much time to complete.
Here is the EXPLAIN EXTENDED output of the above select with the join:
id | select_type | table    | type  | possible_keys         | key         | key_len | ref         | rows | filtered | Extra
 1 | SIMPLE      | customer | index | PRIMARY               | number      | 32      | NULL        | 1749 |      100 | Using index; Using temporary; Using filesort
 1 | SIMPLE      | log      | ref   | customer_id,timestamp | customer_id | 4       | customer.id |  515 |      100 | Using where
Could someone explain to me why it takes so much more time to join the second table?
Edit:
As requested, here is the result of SHOW CREATE TABLE dmsserver_status_log:
CREATE TABLE `dmsserver_status_log` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`customer_id` int(11) NOT NULL,
`timestamp` datetime(6) NOT NULL,
`thread_name` varchar(255) NOT NULL,
`level` varchar(10) NOT NULL,
`logger_name` varchar(255) NOT NULL,
`source_host` varchar(50) NOT NULL,
`message` text NOT NULL,
`stacktrace` text,
`mdc` text,
PRIMARY KEY (`id`),
KEY `customer_id` (`customer_id`),
KEY `timestamp` (`timestamp`),
CONSTRAINT `dmsserver_status_log_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=17996215 DEFAULT CHARSET=utf8
And here is the result of SHOW CREATE TABLE customer:
CREATE TABLE `customer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`number` varchar(10) NOT NULL,
`headquarter_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des Hauptbetriebs wenn dies eine Filiale ist',
`it_location_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des EDV-Standorts wenn dieser Kunde keine eigene EDV hat',
`customer_type_id` int(11) NOT NULL DEFAULT '1',
`address_id` int(11) NOT NULL,
`characteristic_id` int(11) NOT NULL DEFAULT '1',
`iban` varchar(34) DEFAULT NULL,
`bic` varchar(11) DEFAULT NULL,
`bank` varchar(100) DEFAULT NULL,
`newsletter` tinyint(1) NOT NULL DEFAULT '1',
`newsletter_email` varchar(100) DEFAULT NULL,
`invoice_email` varchar(100) DEFAULT NULL COMMENT 'Email-Adresse für den Empfang von Rechnungen als PDF',
`api_password` varchar(255) DEFAULT NULL COMMENT 'Zugangspasswort für den Zugriff auf unsere API',
`comments` varchar(250) DEFAULT NULL,
`valid_id` smallint(6) NOT NULL,
`create_time` datetime NOT NULL,
`create_by` int(11) NOT NULL,
`change_time` datetime NOT NULL,
`change_by` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `number` (`number`),
KEY `change_by` (`change_by`),
KEY `characteristic_id` (`characteristic_id`),
KEY `customer_type_id` (`customer_type_id`),
KEY `address_id` (`address_id`),
KEY `valid_id` (`valid_id`),
KEY `create_by` (`create_by`),
KEY `headquarter_customer_number` (`headquarter_customer_number`),
KEY `it_location_customer_number` (`it_location_customer_number`),
CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`customer_type_id`) REFERENCES `customer_type` (`id`),
CONSTRAINT `customer_ibfk_2` FOREIGN KEY (`address_id`) REFERENCES `address` (`id`),
CONSTRAINT `customer_ibfk_3` FOREIGN KEY (`characteristic_id`) REFERENCES `customer_characteristic` (`id`),
CONSTRAINT `customer_ibfk_4` FOREIGN KEY (`create_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
CONSTRAINT `customer_ibfk_5` FOREIGN KEY (`change_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
CONSTRAINT `customer_ibfk_6` FOREIGN KEY (`headquarter_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE,
CONSTRAINT `customer_ibfk_7` FOREIGN KEY (`it_location_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2540 DEFAULT CHARSET=utf8
- How many rows does the customer table have? Also, have you executed ANALYZE on both tables? – dbilid, Mar 18, 2020 at 22:34
- Please post TEXT results of A) SHOW CREATE TABLE dmsserver_status_log; and B) SHOW CREATE TABLE customer; for analysis. – Wilson Hauck, Mar 19, 2020 at 13:48
- I've just added the requested outputs to my original message. The customer table has approx. 1,700 rows. – Martin Fernau, Mar 20, 2020 at 9:22
- Thank you for the SHOW CREATE TABLEs. – Wilson Hauck, Mar 21, 2020 at 14:08
3 Answers
Please provide EXPLAIN SELECT ... for each variant.
What version of MariaDB? (Optimizations have changed in this area.)
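If it helps, here is a minimal sketch of how both pieces of information could be gathered; it simply prefixes the join query from the question with EXPLAIN and assumes nothing beyond the tables already shown:
SELECT VERSION();   -- reports the exact MariaDB/MySQL server version string

-- Repeat the same EXPLAIN prefix for each query variant you test
EXPLAIN
SELECT `timestamp`, `level`, logger_name, message, customer.`number`
FROM dmsserver_status_log AS log
JOIN customer ON customer.id = log.customer_id
WHERE `timestamp` > '2020-03-01 00:00:00'
  AND message LIKE '%fehlgeschlagen%'
ORDER BY `timestamp` ASC;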
Also try these variations
This one isolates the filtering into a derived table, thereby minimizing the need for reaching into the other table.
SELECT log.`timestamp`, log.`level`, log.logger_name, log.message,
c.`number`
FROM (
SELECT log1.customer_id, log1.`timestamp`,
log1.`level`, log1.logger_name,
log1.message
FROM dmsserver_status_log AS log1
WHERE `timestamp` > '2020-03-01 00:00:00'
AND message LIKE '%fehlgeschlagen%'
) AS log
JOIN customer AS c ON c.id = log.customer_id
ORDER BY log.`timestamp` ASC;
This one defers looking at the second table in a different way, by replacing the join with a correlated subquery:
SELECT `timestamp`, `level`, logger_name, message,
    ( SELECT `number` FROM customer
       WHERE customer.id = log.customer_id ) AS `number`
FROM dmsserver_status_log AS log
WHERE `timestamp` > '2020-03-01 00:00:00'
  AND message LIKE '%fehlgeschlagen%'
ORDER BY `timestamp` ASC;
Depending on the version of MySQL/MariaDB, the three variations may be optimized differently.
If you are searching for a "word", consider using FULLTEXT(message) and MATCH(message) AGAINST ('+fehlgeschlagen' IN BOOLEAN MODE). It should run a lot faster.
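A rough sketch of what that could look like (the index name ft_message is just an illustrative choice, and building a full-text index on a 13-million-row table will take some time):
-- One-time: add a full-text index on the message column
ALTER TABLE dmsserver_status_log ADD FULLTEXT INDEX ft_message (message);

-- Then replace the leading-wildcard LIKE with MATCH ... AGAINST
SELECT `timestamp`, `level`, logger_name, message, customer.`number`
FROM dmsserver_status_log AS log
JOIN customer ON customer.id = log.customer_id
WHERE `timestamp` > '2020-03-01 00:00:00'
  AND MATCH(message) AGAINST ('+fehlgeschlagen' IN BOOLEAN MODE)
ORDER BY `timestamp` ASC;
Note that MATCH works on whole words, so occurrences of "fehlgeschlagen" buried inside longer words would no longer be found the way the double-wildcard LIKE finds them.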
You have individual indices (a.k.a. keys) on customer_id and timestamp in the dmsserver_status_log table. MariaDB can usually only use one index per table for any given query. So when you change the query to include a join, MariaDB has to choose one of those indices to use.
What you need instead is a suitable compound index on your dmsserver_status_log table that can support the relevant conditions in your WHERE clause, the JOIN and the ORDER BY columns at the same time.
So the index you need would probably be CREATE INDEX dmsserver_status_log_idx1 ON dmsserver_status_log (customer_id, `timestamp`). Note that the order of the columns is important. (If EXPLAIN shows that the optimizer will not use this index, you can try to re-create it with the columns in the reverse order. Query optimisation can be difficult ...)
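A sketch of how both orderings could be tried; dmsserver_status_log_idx1 is the name suggested above, while dmsserver_status_log_idx2 is just an illustrative name for the reversed variant:
-- Join column first, then the range column, as recommended above
CREATE INDEX dmsserver_status_log_idx1
    ON dmsserver_status_log (customer_id, `timestamp`);

-- If EXPLAIN shows the optimizer ignoring it, try the reverse order instead:
-- CREATE INDEX dmsserver_status_log_idx2
--     ON dmsserver_status_log (`timestamp`, customer_id);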
Note that the message column, which also appears in your WHERE clause, will not benefit from being in the compound index with the particular query as it stands, because you have a wildcard before the search string.
You may want to consider removing the existing individual indices on customer_id and timestamp if you only care about good performance on this particular JOIN query. Maintaining a lot of indexes on a table means slower INSERTs, and they also take up some memory.
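A hedged sketch of that cleanup, assuming the compound index from above has already been created (the customer_id foreign key still needs an index starting with customer_id, which the compound index provides; dropping the single-column timestamp key will slow down the original query that filters on timestamp alone):
-- Only run this after dmsserver_status_log_idx1 (customer_id, `timestamp`) exists
ALTER TABLE dmsserver_status_log
    DROP INDEX customer_id,
    DROP INDEX `timestamp`;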
For reference, here's an article about building indices for MariaDB/MySQL: Building the best INDEX for a given SELECT
Martin,
The EXPLAIN indicates that customer was the first table used; you likely know that you want the log table to be read first to get only the rows with timestamp > 2020年03月01日. You should be able to make that happen with SELECT STRAIGHT_JOIN ...
and change
JOIN customer ON customer.id = log.customer_id
to
JOIN customer ON log.customer_id = customer.id
Please post the new time required with this STRAIGHT_JOIN version.
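For clarity, a sketch of how the suggestion could be applied; STRAIGHT_JOIN forces the tables to be joined in the order they are listed, so the log table is read first (whether this actually helps is debated in the comments below):
SELECT STRAIGHT_JOIN
       log.`timestamp`, log.`level`, log.logger_name, log.message, customer.`number`
FROM dmsserver_status_log AS log
JOIN customer ON log.customer_id = customer.id
WHERE log.`timestamp` > '2020-03-01 00:00:00'
  AND log.message LIKE '%fehlgeschlagen%'
ORDER BY log.`timestamp` ASC;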
- I doubt seriously whether x=y is optimized any differently than y=x. – Rick James, Apr 1, 2020 at 4:39
- The major problem with STRAIGHT_JOIN is that if later a level column is added to the customer table, that's all of a sudden a join criterion. It should never be used except in very extreme cases like identical tables. – danblack, Apr 1, 2020 at 5:22
- @martin-fernau, please post your current query and EXPLAIN EXTENDED test results so we can see what the optimizer did with your query. Thanks. – Wilson Hauck, Apr 1, 2020 at 11:22