I have this table:
CREATE TABLE `property_ads_history` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `property_ad_id` int(10) unsigned NOT NULL, `advertiser_type` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `agency_reference_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `average_sale_price` double(8,2) NOT NULL DEFAULT '-1.00', `bathrooms` double(8,2) NOT NULL DEFAULT '-1.00', `bedrooms` double(8,2) NOT NULL DEFAULT '-1.00', `carports` double(8,2) NOT NULL DEFAULT '-1.00', `DELETE_country` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `created_reason` enum('Geocoded','Sanitized Parking','Sanitized Representation','Sanitized Address','Scraped','URL Inserted','QA Sanitized Address','QA Sanitized Representation','QA Sanitized Parkings') COLLATE utf8_unicode_ci DEFAULT NULL, `description` longtext COLLATE utf8_unicode_ci NOT NULL, `ensuite_bathrooms` double(8,2) NOT NULL DEFAULT '-1.00', `DELETE_ad_expired_at` datetime NOT NULL, `floor_area` double(8,2) NOT NULL DEFAULT '-1.00', `formatted_address` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `garages` double(8,2) NOT NULL DEFAULT '-1.00', `geocode_status` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `is_represented` tinyint(1) DEFAULT NULL, `land_area` double(8,2) NOT NULL DEFAULT '-1.00', `latitude` double(10,6) NOT NULL, `location_id` int(10) unsigned DEFAULT NULL, `longitude` double(10,6) NOT NULL, `off_street_parkings` double(8,2) NOT NULL DEFAULT '-1.00', `official_property_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `page_url` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `parking` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `posted_at` datetime NOT NULL, `posted_at_string` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `postal_code` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `price` double(10,2) NOT NULL DEFAULT '-1.00', `primary_image` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `DELETE_property_ad_created_at` datetime NOT NULL, `property_type` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `rateable_value` double(10,2) NOT NULL DEFAULT '-1.00', `recent_sale_1` double(10,2) NOT NULL DEFAULT '-1.00', `recent_sale_2` double(10,2) NOT NULL DEFAULT '-1.00', `recent_sale_3` double(10,2) NOT NULL DEFAULT '-1.00', `reviewer_comments` longtext COLLATE utf8_unicode_ci NOT NULL, `route` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `source_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `street_number` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `user_id` int(10) unsigned DEFAULT NULL, `user_provided_address` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `deleted_at` timestamp NULL DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `unique_index` (`created_at`,`created_reason`,`source_id`), KEY `property_ads_history_property_ad_id_foreign` (`property_ad_id`), KEY `property_ads_history_location_id_foreign` (`location_id`), KEY `created_reason` (`created_reason`) );
This is my SQL query:
SELECT * FROM `property_ads_history` `t1`
WHERE `t1`.`created_at` >= '2016-04-13 00:00:00'
AND `t1`.`created_reason` = 'Scraped'
AND (`t1`.`price` > -1 OR `t1`.`price` <> 999999.99)
AND (
SELECT `t2`.`price` FROM `property_ads_history` `t2`
WHERE `t2`.`property_ad_id` = `t1`.`property_ad_id`
AND `t2`.`created_at` < `t1`.`created_at`
AND (`t2`.`price` > -1 OR `t2`.`price` <> 999999.99) ORDER BY
DATE(`t2`.`created_at`) DESC LIMIT 1
) <> `t1`.`price` GROUP BY `t1`.`property_ad_id`;
All I want in that query are:
I want to get the records that meet certain criteria, particularly that the
created_at
field is within the last 24 hoursI need to get the record that immediately precedes the records in #1
further filter results in #1 to records whose price column has had a different value in a record's history, other than -1
The problem I'm facing is that the query is too slow. I have a million records. It would take more or less 2 mins for the query to execute. I'm guessing GROUP BY makes the query slow. I think I need to make a composite index but I'm not sure how to do it.
Explain result:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | t1 | ref | unique_index,created_reason | created_reason | 2 | const | 590030 | Using where; Using temporary; Using filesort
2 | DEPENDENT SUBQUERY | t2 | ref | unique_index,property_ads_history_property_ad_id_foreign | property_ads_history_property_ad_id_foreign | 4 | reis_real_estate.t1.property_ad_id | 7 | Using where; Using filesort
2 Answers 2
ORDER BY
in the sub query is what's making it slow.
ORDER BY
makes query slow since you need to order (sort) all the objects in the query results. It is in order of N Square if you don't have an index, and N log N if you do.
And you don't even need to order by all of it.
What you need is only to filter out the minimum, so instead of:
ORDER BY
DATE(`t2`.`created_at`) DESC LIMIT 1
I would add another condition in the WHERE
clause saying:
WHERE DATE(t2.'created at') <> (SELECT MAX (DATE ('created at') FROM 't2'))
Since it needs to look at all the dates in the query, but not necessarily to order them.
Google "Create Composite Index". MySQL is not my variant of SQL. It is always good practice to check subjects like indexes and Primary Keys outside of Q&A forums (like this one) since they are very broad and you need to read at least 5 articles in order to decide which composite index you need, on top of how to write it (but since you query that field quite often in that query, I'd suggest you should create one).
-
2The
ORDER BY
is not always a problem. It might be worth checking but there are many other things worth checking in the query. TheORDER BY DATE(column)
is probably one reason - of the many - for the slow query.ypercubeᵀᴹ– ypercubeᵀᴹ2016年04月28日 11:21:58 +00:00Commented Apr 28, 2016 at 11:21 -
2I'm not sure what you mean by your WHERE clause suggestion.
ORDER BY A DESC LIMIT 1
would give you the last item (or at least one of the last items), whereasWHERE A <> MAX(A)
basically means "all except the last item(s)". Perhaps it should be=
instead of<>
– still not an exact equivalent of the ORDER BY but certainly closer.Andriy M– Andriy M2016年04月28日 14:25:57 +00:00Commented Apr 28, 2016 at 14:25 -
1
<> SELECT MAX (DATE ('created at') FROM 't2')
is not valid syntax. Perhaps you meant:<> (SELECT MAX(DATE(created at)) FROM t2)
ypercubeᵀᴹ– ypercubeᵀᴹ2016年04月28日 15:25:34 +00:00Commented Apr 28, 2016 at 15:25 -
ypercube, correct, updated.Hila DG– Hila DG2016年04月28日 22:54:03 +00:00Commented Apr 28, 2016 at 22:54
You'll speed-up the query significantly if you create an index (ideally a unique index) on property_ad_id
and created_at
, and another on just created_at
:
CREATE INDEX ON property_ads_history (property_ad_id, created_at);
CREATE INDEX ON property_ads_history (created_at); # if not already there!
Note that the condition t1.price > -1 OR t1.price <> 999999.99
will always be true. You may want to either remove it or fix its logic (you likely wanted an AND instead of an OR).
Maybe using a temporary table can increase the speed of the query even further:
CREATE TEMPORARY TABLE myads (
pos int NOT NULL AUTO_INCREMENT PRIMARY KEY,
property_ad_id int(10) unsigned NOT NULL,
id int(10) unsigned not null,
UNIQUE (property_ad_id, pos));
INSERT INTO myads (id)
SELECT id
FROM property_ads_history
WHERE property_ad_id IN (
SELECT property_ad_id
FROM property_ads_history
WHERE created_at >= '2016-04-13 00:00:00' AND
created_reason = 'Scraped' AND
t1.price > -1 AND t1.price <> 999999.99)
ORDER BY property_ad_id, created_at;
SELECT ch.*
FROM myads cur
JOIN property_ads_history ch ON (ch.id = cur.id)
LEFT JOIN myads pre ON (pre.property_ad_id = cur.property_ad_id AND pre.pos = cur.pos-1) # this is the previous ad
LEFT JOIN property_ads_history ph ON (ph.id = pre.id)
WHERE ch.created_at >= '2016-04-13 00:00:00' AND
ch.price <> ph.price;
DROP TABLE myads;
Explore related questions
See similar questions with these tags.
(price > -1 OR price <> 999999.99)
condition which appears twice? Seems useless.