I have the following tables:
CREATE TABLE `events` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` varchar(255) COLLATE utf8_unicode_ci DEFAULT 'draft',
`publish_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_events_on_status` (`status`),
KEY `index_events_on_status_and_publish_at` (`status`,`publish_at`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=32092 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `event_dates` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`start_date` date NOT NULL,
`end_date` date DEFAULT NULL,
`event_id` int(11) NOT NULL,
`venue_id` int(11) DEFAULT NULL,
`alt_venue_id` int(11) DEFAULT NULL,
`allday` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `event_dates_event_id_fk` (`event_id`),
KEY `event_dates_venue_id_fk` (`venue_id`),
KEY `event_dates_alt_venue_id_fk` (`alt_venue_id`),
KEY `index_event_dates_on_start_date` (`start_date`),
KEY `index_event_dates_on_end_date` (`end_date`),
KEY `index_event_dates_on_start_date_and_end_date` (`start_date`,`end_date`),
CONSTRAINT `event_dates_alt_venue_id_fk` FOREIGN KEY (`alt_venue_id`) REFERENCES `venues` (`id`) ON DELETE SET NULL,
CONSTRAINT `event_dates_event_id_fk` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`) ON DELETE CASCADE,
CONSTRAINT `event_dates_venue_id_fk` FOREIGN KEY (`venue_id`) REFERENCES `venues` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=152634 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `event_times` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`time_start` time NOT NULL,
`time_end` time DEFAULT NULL,
`date_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `event_times_date_id_fk` (`date_id`),
CONSTRAINT `event_times_date_id_fk` FOREIGN KEY (`date_id`) REFERENCES `event_dates` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=66391 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `venues` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`city_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `venues_city_id_fk` (`city_id`),
KEY `venues_to_cities` (`city_id`),
CONSTRAINT `venues_to_cities` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3083 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Events can happen multiple times (even more than once per day) and in different venues (the table event_dates is used to define this concept).
The basic use case in my application is to get the list of events that are going to happen after a certain date (e.g. show me all events from today onwards).
I use the following query to get the ids of the matching records.
More precisely, this query outputs all event occurrences happening from a given date onwards, where an occurrence is either just an event_date (if the event takes place only once on that date) or a tuple [event_date, event_time] (if the event takes place multiple times on the same event_date).
SELECT
events.id AS event_id,
CONCAT(event_dates.id, '::', IF(event_times.id IS NULL, '', event_times.id)) AS occurrence_id,
venues.id AS venue_id, alt_venues.id AS alt_venue_id,
event_dates.start_date AS event_dates_start_date, event_dates.allday AS event_dates_allday, event_times.time_start AS event_times_time_start
FROM `event_dates`
INNER JOIN `events` ON `events`.`id` = `event_dates`.`event_id`
LEFT JOIN event_times ON event_times.date_id = event_dates.id
INNER JOIN venues ON event_dates.venue_id = venues.id
LEFT JOIN venues AS alt_venues ON event_dates.alt_venue_id = alt_venues.id
WHERE
(events.status = 'published' AND (events.publish_at IS NULL OR events.publish_at <= NOW()))
AND (1 = 1) AND (1 = 1) AND
(
('2015-06-27' <= event_dates.start_date)
OR
(event_dates.end_date is NOT NULL AND '2015-06-27' <= event_dates.end_date)
)
LIMIT 0, 100000
MySQL takes ~40ms to return 8.4K rows.
The output of EXPLAIN EXTENDED is:
+----+-------------+-------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+---------+-----------------------------------------+------+----------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+---------+-----------------------------------------+------+----------+------------------------------------+
| 1 | SIMPLE | events | ref | PRIMARY,index_events_on_status,index_events_on_status_and_publish_at | index_events_on_status | 768 | const | 7467 | 100.00 | Using index condition; Using where |
| 1 | SIMPLE | event_dates | ref | event_dates_event_id_fk,event_dates_venue_id_fk,index_event_dates_on_start_date,index_event_dates_on_end_date,index_event_dates_on_start_date_and_end_date | event_dates_event_id_fk | 4 | agenda_staging.events.id | 1 | 100.00 | Using where |
| 1 | SIMPLE | venues | eq_ref | PRIMARY | PRIMARY | 4 | agenda_staging.event_dates.venue_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | alt_venues | eq_ref | PRIMARY | PRIMARY | 4 | agenda_staging.event_dates.alt_venue_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | event_times | ref | event_times_date_id_fk | event_times_date_id_fk | 4 | agenda_staging.event_dates.id | 1 | 100.00 | NULL |
+----+-------------+-------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+---------+-----------------------------------------+------+----------+------------------------------------+
I'm not a DB expert at all, but looking at the type and Extra columns I understand that MySQL is using my indexes properly.
However, according to all the performance-related material I found while googling, 8K rows is a very small result set.
Given that I'm running on a 4-core, SSD-based MacBook Pro, I believe this should be an order of magnitude faster.
40ms alone would not be a problem.
However, in the application I need data from various other tables for every event. I fetch that data with the following query, which uses the one above as a subquery.
SELECT
dative.occurrence_id, dative.event_id, dative.venue_id, dative.alt_venue_id, images.id AS image_id,
GROUP_CONCAT(DISTINCT event_categories_events.event_category_id ORDER BY event_category_id) AS event_category_ids,
COUNT(DISTINCT events_promoters.promoter_id) AS promoters_count,
GROUP_CONCAT(DISTINCT event_texts.id) AS event_text_ids,
GROUP_CONCAT(DISTINCT communications.id ORDER BY communications.id) AS communication_ids,
GROUP_CONCAT(DISTINCT events_tags.tag_id ORDER BY tag_id) AS tag_ids,
GROUP_CONCAT(DISTINCT collection_eds_events.collection_edition_id ORDER BY collection_edition_id) AS collection_edition_ids
FROM (
SELECT
events.id AS event_id,
CONCAT(event_dates.id, '::', IF(event_times.id IS NULL, '', event_times.id)) AS occurrence_id,
venues.id AS venue_id, alt_venues.id AS alt_venue_id
, event_dates.start_date AS event_dates_start_date, event_dates.allday AS event_dates_allday, event_times.time_start AS event_times_time_start
FROM `event_dates`
INNER JOIN `events` ON `events`.`id` = `event_dates`.`event_id`
LEFT JOIN event_times ON event_times.date_id = event_dates.id
INNER JOIN venues ON event_dates.venue_id = venues.id
LEFT JOIN venues AS alt_venues ON event_dates.alt_venue_id = alt_venues.id
WHERE
(events.status = 'published' AND (events.publish_at IS NULL OR events.publish_at <= NOW()))
AND (1 = 1) AND (1 = 1) AND
(
('2015-06-27' <= event_dates.start_date)
OR
(event_dates.end_date is NOT NULL AND '2015-06-27' <= event_dates.end_date)
)
)
AS dative
INNER JOIN events ON events.id = dative.event_id
INNER JOIN `product_eds_events` ON `product_eds_events`.`event_id` = `events`.`id`
INNER JOIN `product_editions` ON `product_editions`.`id` = `product_eds_events`.`product_edition_id`
INNER JOIN `images` ON `images`.`event_id` = `events`.`id`
INNER JOIN event_texts ON event_texts.event_id = events.id AND event_texts.language = 'it'
LEFT JOIN collection_eds_events ON collection_eds_events.event_id = events.id
LEFT JOIN collection_editions ON collection_editions.id = collection_eds_events.collection_edition_id
LEFT JOIN communications ON communications.event_id = events.id AND communications.status = "published" AND communications.publish_at <= NOW() AND communications.expire_at >= CURDATE()
INNER JOIN event_categories_events ON event_categories_events.event_id = events.id
INNER JOIN events_promoters ON events_promoters.event_id = events.id
LEFT JOIN events_tags ON events_tags.event_id = events.id
WHERE `product_editions`.`product_id` = 1
GROUP BY dative.occurrence_id
ORDER BY event_dates_start_date ASC, event_dates_allday ASC, event_times_time_start ASC
LIMIT 0, 100000
This query takes 340ms to return 8.2K rows.
And this is the EXPLAIN EXTENDED output:
+----+-------------+-------------------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+---------+------------------------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+---------+------------------------------------------------------------+------+----------+----------------------------------------------+
| 1 | PRIMARY | product_editions | ref | PRIMARY,product_editions_product_id_fk | product_editions_product_id_fk | 5 | const | 4 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | product_eds_events | ref | index_unique_product_editions_events,product_eds_events_event_id_fk | index_unique_product_editions_events | 5 | agenda_staging.product_editions.id | 1059 | 100.00 | Using where; Using index |
| 1 | PRIMARY | event_texts | ref | index_event_texts_on_event_id_and_language | index_event_texts_on_event_id_and_language | 773 | agenda_staging.product_eds_events.event_id,const | 1 | 100.00 | Using where; Using index |
| 1 | PRIMARY | events | eq_ref | PRIMARY | PRIMARY | 4 | agenda_staging.product_eds_events.event_id | 1 | 100.00 | Using index |
| 1 | PRIMARY | event_categories_events | ref | event_categories_events_event_id_fk | event_categories_events_event_id_fk | 5 | agenda_staging.product_eds_events.event_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | images | ref | images_event_id_fk | images_event_id_fk | 5 | agenda_staging.product_eds_events.event_id | 1 | 100.00 | Using index |
| 1 | PRIMARY | events_promoters | ref | index_events_promoters_on_event_id_and_promoter_id | index_events_promoters_on_event_id_and_promoter_id | 4 | agenda_staging.product_eds_events.event_id | 1 | 100.00 | Using index |
| 1 | PRIMARY | collection_eds_events | ref | collection_eds_events_event_id_fk | collection_eds_events_event_id_fk | 5 | agenda_staging.product_eds_events.event_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | collection_editions | eq_ref | PRIMARY | PRIMARY | 4 | agenda_staging.collection_eds_events.collection_edition_id | 1 | 100.00 | Using index |
| 1 | PRIMARY | communications | ref | communications_event_id_fk | communications_event_id_fk | 5 | agenda_staging.product_eds_events.event_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | events_tags | ref | index_events_tags_on_event_id_and_tag_id | index_events_tags_on_event_id_and_tag_id | 4 | agenda_staging.product_eds_events.event_id | 1 | 100.00 | Using index |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | agenda_staging.product_eds_events.event_id | 10 | 100.00 | NULL |
| 2 | DERIVED | events | ref | PRIMARY,index_events_on_status,index_events_on_status_and_publish_at | index_events_on_status | 768 | const | 7467 | 100.00 | Using index condition; Using where |
| 2 | DERIVED | event_dates | ref | event_dates_event_id_fk,event_dates_venue_id_fk,index_event_dates_on_start_date,index_event_dates_on_end_date,index_event_dates_on_start_date_and_end_date | event_dates_event_id_fk | 4 | agenda_staging.events.id | 1 | 100.00 | Using where |
| 2 | DERIVED | venues | eq_ref | PRIMARY | PRIMARY | 4 | agenda_staging.event_dates.venue_id | 1 | 100.00 | Using index |
| 2 | DERIVED | alt_venues | eq_ref | PRIMARY | PRIMARY | 4 | agenda_staging.event_dates.alt_venue_id | 1 | 100.00 | Using index |
| 2 | DERIVED | event_times | ref | event_times_date_id_fk | event_times_date_id_fk | 4 | agenda_staging.event_dates.id | 1 | 100.00 | NULL |
+----+-------------+-------------------------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------+---------+------------------------------------------------------------+------+----------+----------------------------------------------+
Again, it seems to me that the indexes are being used.
Am I missing some obvious issue here? Or are these timings to be expected?
All tables are InnoDB with indexed foreign keys.
I am sorry for the very long question.
I did not include the DDL for the tables in the second query for brevity's sake.
If it would be of help, I will edit my question with a link to a pastebin containing the complete DDL.
I need help solving this, because otherwise I won't get acceptable response times for my web app.
- Why have LIMIT 0, 100000?
- Can status be turned into an ENUM (1 byte) instead of a potentially large VARCHAR(255)? (A sketch of this and the next suggestion follows this list.)
- It would probably be better to combine date and time. Joining to the separate event_times table seems to add to the complexity and hurt performance.
- It is essentially impossible to deal with (that is, use an index for) a start-end date (or datetime) range. Consider having one row per day, instead of one row for a range of days. Sure, that increases the number of rows in the table, but it will probably greatly decrease the time taken for certain queries.
- What does NULL mean for time_end? For end_date?
- Perhaps publish_at IS NOT NULL is equivalent to status = 'published'?
- product_eds_events has a large value for rows; please tell us what that table is about, and provide SHOW CREATE TABLE for it.
- It may be better to change certain JOINs into subqueries; see below.
- Some of the tables smell like "many-to-many mapping" tables. See this for several optimization tips.
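To make the ENUM and combined date/time suggestions concrete, here is a minimal sketch. The ENUM value list and the new column names are assumptions, not taken from the schema above:

-- Shrink status from VARCHAR(255) to a 1-byte ENUM.
-- The value list here is an assumption; use whatever statuses the application actually writes.
ALTER TABLE events
  MODIFY COLUMN status ENUM('draft', 'published') DEFAULT 'draft';

-- Store a full DATETIME per occurrence so the join to event_times can eventually go away.
-- starts_at / ends_at are hypothetical column names.
ALTER TABLE event_dates
  ADD COLUMN starts_at DATETIME NULL,
  ADD COLUMN ends_at DATETIME NULL;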
Expanding on the "change certain JOINs into subqueries" point:
SELECT ...,
GROUP_CONCAT(DISTINCT event_categories_events.event_category_id
ORDER BY event_category_id) AS event_category_ids,
...
INNER JOIN event_categories_events
ON event_categories_events.event_id = events.id
...
The reason for this is that you have an "explode-implode" pattern caused by the JOINs plus the grouping. First, the JOINs explode the number of rows and build a temporary table to hold them; then the grouping (GROUP_CONCAT, in this case) implodes them again. The better (I hope) approach would be:
SELECT ...,
( SELECT GROUP_CONCAT(
DISTINCT event_category_id
ORDER BY event_category_id)
FROM event_categories_events
WHERE event_id = events.id
) AS event_category_ids,
...
-- no JOIN with event_categories_events
...
That particular subquery needs INDEX(event_id, event_category_id), in this order.
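If no such composite index exists yet (the EXPLAIN above only lists event_categories_events_event_id_fk, on event_id alone), it could be created along these lines; the index name is made up:

ALTER TABLE event_categories_events
  ADD INDEX index_on_event_id_and_category_id (event_id, event_category_id);

The same rewrite pattern could be applied to the other GROUP_CONCAT joins (events_tags, collection_eds_events, and so on), each backed by its own (event_id, ...) composite index.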
(Apply most of my suggestions, then start a fresh question with fresh CREATE TABLEs, EXPLAINs, etc. Augmenting this question could get messy and hard to follow.)
Comment: Why is the GROUP BY needed? Perhaps the subquery (i.e. the first query) does not return unique events or unique event_dates, whatever its purpose is?
Reply from the asker: GROUP BY is there for the second query, because the event can have multiple promoters or multiple communications or be part of multiple collections, etc. Can you elaborate on why GROUP BY could be problematic?