I have a query that I feel should go much faster then it does. Right now it'll execute around 1,5xx secs each time. I don't have any advanced knowledge of looking at the EXPLAIN and draw conclusions. Maybe someone can help?
The query has all its ID-fields indexed and unsigned.
"reservations" contains around 100,000 records and "reservation_nights" around 180,000 records. All other tables have under 1000 records each.
"profiles" is LEFT JOIN:ed because not all reservations has profiles.
Here's the query:
SELECT DISTINCT(reservations.id), `reservations`.`referenceId`, `reservations`.`referenceStatus`, `reservations`.`status`, `reservations`.`firstname`, `reservations`.`lastname`, `reservations`.`dateArrival`, `reservations`.`daysToCheckIn`, `reservations`.`rateCode`, `reservations`.`roomCode`, `reservations`.`totalNights`, `reservations`.`rateRevenue`, `reservations`.`dailyRate`, `reservations`.`totalRevenue`, `reservations`.`foodBeverageRevenue`, `reservations`.`additionalRevenue`, `channels`.`name` AS `channelName`, `channelgroups`.`name` AS `channelGroupName`, `markets`.`name` AS `marketName`, `profiles`.`totalReservations`
FROM `reservation_nights`
INNER JOIN `reservations` ON reservations.id = reservation_nights.reservationId
INNER JOIN `channels` ON channels.id = reservations.channelId
INNER JOIN `channelgroups` ON channelgroups.id = channels.channelgroupId
INNER JOIN `markets` ON markets.id = reservations.marketId
LEFT JOIN `profiles` ON profiles.id = reservations.profileId
ORDER BY `reservation_nights`.`dateStayed` DESC
LIMIT 100
Attached is the EXPLAIN from PHPMYADMIN. Where should I start optimizing this query?
enter image description here
1 Answer 1
First, please remove the parentheses from
DISTINCT(reservations.id)
.
DISTINCT
is not a function.About performance, why do you join
reservation_nights
? You only seem to use that table in theORDER BY
. Assuming that a reservation can span several nights, you probably mean to use the last date (from the many that a reservation has) for the ordering and nothing else. This however, requires a join onreservation_nights
and thus many rows for each reservation are created in the result set. So, you had to useDISTINCT
to get rid of the duplicate rows.DISTINCT
is often inefficient, especially when performed for big results sets and not in small/indexed tables.
Try this rewriting, that gets rid of the DISTINCT
and pushes the ORDER BY / LIMIT 100
in a derived table where only base table is used. This technique which I call "LIMIT
first, JOIN
later" can be more efficient as the order by is done on less data (single table vs a large join) and then the joins are done on less data as well (max 100 rows from each of the joined tables):
SELECT
r.id,
r.referenceId, r.referenceStatus, r.status,
r.firstname, r.lastname,
r.dateArrival, r.daysToCheckIn,
r.rateCode, r.roomCode,
r.totalNights, r.rateRevenue,
r.dailyRate, r.totalRevenue,
r.foodBeverageRevenue, r.additionalRevenue,
c.name AS channelName,
cg.name AS channelGroupName,
m.name AS marketName,
p.totalReservations
FROM
( SELECT reservationId, MAX(dateStayed) AS lastDateStayed
FROM reservation_nights
GROUP BY reservationId
ORDER BY MAX(dateStayed) DESC
LIMIT 100
) AS rn
INNER JOIN reservations AS r ON r.id = rn.reservationId
INNER JOIN channels AS c ON c.id = r.channelId
INNER JOIN channelgroups AS cg ON cg.id = c.channelgroupId
INNER JOIN markets AS m ON m.id = r.marketId
LEFT JOIN profiles AS p ON p.id = r.profileId
ORDER BY
rn.lastDateStayed DESC ;
- If you don't already have one, add an index on
reservation_nights (reservationId, dateStayed)
to improve the efficiency of the subquery.
-
Wow, that's a great technique. I'll try this first thing tomorrow and report back with the result! Reason for the "reservation_nights" is as said the reservation itself has multiple nights and the user can select to search for reservations within a date-range (a WHERE statement I'll just add to the sub-query I assume to first fetch the records in my LIMIT?). Great stuff!naboovalley– naboovalley2015年02月12日 16:43:40 +00:00Commented Feb 12, 2015 at 16:43
-
100 total, Query took 0.3591 sec. Great improvement. I have an index on "reservation_nights.reservationId". I added another to the "dateStayed" column but the query took almost exactly the same time. Is there anything specific to gain to add these to in the same index instead of two separate (like I have now)?naboovalley– naboovalley2015年02月12日 16:50:51 +00:00Commented Feb 12, 2015 at 16:50
-
I meant a composite index, on:
(reservationId, dateStayed)
Try that. It's different than 2 indexes.ypercubeᵀᴹ– ypercubeᵀᴹ2015年02月12日 17:44:10 +00:00Commented Feb 12, 2015 at 17:44 -
Hey @ypercube since I need to run WHERE-statements on columns in the "reservations" table should I add another JOIN to the "reservation_nights" LIMIT-query within the FROM subquery?naboovalley– naboovalley2015年02月13日 07:50:44 +00:00Commented Feb 13, 2015 at 7:50
-
Yes, but it won't be very good for performance. I'll add some more notes about this when I find time.ypercubeᵀᴹ– ypercubeᵀᴹ2015年02月13日 10:39:47 +00:00Commented Feb 13, 2015 at 10:39
Explore related questions
See similar questions with these tags.
DISTINCT(reservations.id),
.DISTINCT
is not a function.