This query takes ~90 seconds:
CREATE TABLE filtered
(
SELECT DISTINCT
f2.cityCode,
f2.productCode,
f2.productDesc,
f2.inventoryDate,
f2.inventoryLevel
FROM
etm
INNER JOIN f2avail AS f2 ON
etm.cityCode = f2.cityCode AND
etm.tvlDate = f2.inventoryDate
);
but the subquery on its own takes ~0.07 seconds:
-- CREATE TABLE filtered
-- (
SELECT DISTINCT
f2.cityCode,
f2.productCode,
f2.productDesc,
f2.inventoryDate,
f2.inventoryLevel
FROM
etm
INNER JOIN f2avail AS f2 ON
etm.cityCode = f2.cityCode AND
etm.tvlDate = f2.inventoryDate
-- );
Why is that?
Following this solution, I ran OPTIMIZE TABLE f2avail (the table with ~246,000 rows), and the CREATE query ran about 25% faster, but it was still slow.
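For reference, this is the optimize step described above (the command itself is not shown in the original post); on InnoDB it rebuilds the table and refreshes the index statistics:
-- InnoDB maps OPTIMIZE TABLE to a table rebuild plus an analyze of the indexes
OPTIMIZE TABLE f2avail;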
How can I speed up saving the subquery results? This is just one step in a series of filtering stages in an internal DB for reporting. Both tables are InnoDB.
Subquery EXPLAIN results: [screenshot]
Update
The solution was to use composite indices on both tables.
-- Add the composite indices
ALTER TABLE `etm` ADD INDEX (`cityCode`, `tvlDate`);
ALTER TABLE `f2avail` ADD INDEX (`cityCode`, `inventoryDate`);
The query went from ~90s to about 6s.
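Not part of the original post, but one way to confirm the new indexes are picked up is to run EXPLAIN on the inner SELECT; a sketch using the same query:
-- The key column shows which index (if any) each table uses for the join
EXPLAIN
SELECT DISTINCT
    f2.cityCode, f2.productCode, f2.productDesc,
    f2.inventoryDate, f2.inventoryLevel
FROM etm
INNER JOIN f2avail AS f2
    ON  etm.cityCode = f2.cityCode
    AND etm.tvlDate  = f2.inventoryDate;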
2 Answers
Your screen images leave out details that would show up in SHOW CREATE TABLE. But I will make some guesses.
You need, but don't have, a "composite" INDEX(cityCode, tvlDate) on etm and/or a similar index on the other table.
Other issues:
- Comparing a DATE to a DATETIME may not work as you expect. If the DATETIME has any time other than 00:00:00, it will not match any DATE. (See the sketch after this list.)
- Why mention etm when all the data comes from the other table?
- Be consistent with definitions -- I'm seeing inconsistent lengths on cityCode.
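A minimal sketch of the DATE vs. DATETIME point, with made-up literals: MySQL coerces the DATE to midnight before comparing, so any non-zero time component prevents a match.
-- First comparison returns 1 (the DATE matches a DATETIME at exactly 00:00:00),
-- second returns 0 (the same DATE does not match a DATETIME later that day)
SELECT
    CAST('2017-02-23' AS DATE) = CAST('2017-02-23 00:00:00' AS DATETIME) AS matches_midnight,
    CAST('2017-02-23' AS DATE) = CAST('2017-02-23 08:15:00' AS DATETIME) AS matches_morning;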
Back to the question... Perhaps nothing was cached in RAM when the 90s run was done. Try each query twice.
- Yes! I forgot to write about DATE and DATETIME. – a_vlad, Feb 23, 2017 at 2:28
- Thank you for replying. Let me address your good points. Only the etm table was created "by design". The other table - f2avail - was itself the result of a similar CREATE ... SELECT query, which was in turn the result of yet another query on even bigger tables. Seven other similar chained queries take only about ~2s; it's only this one that is slow. All these tables will be deleted - in fact, they will be temporary tables once the speed issue is better addressed. The ~90s is consistent. When I run the optimize query, it then becomes ~60s consistently. These numbers are for the OP query only. – Drakes, Feb 23, 2017 at 2:32
- Good point about DATE and DATETIME. The schema information you see in the screenshot is the result of a previous CREATE ... SELECT query similar to the one in my question. I have not manually defined the columns, as these are to be temporary tables. – Drakes, Feb 23, 2017 at 2:34
- I ALTERed the tables so both are using DATE, and the column widths are consistent. I also increased SQL memory limits just to be safe. Only a few seconds were shaved off. – Drakes, Feb 23, 2017 at 6:12
- Your comment stating "you need a composite index" on both tables rang a bell. Once I added those to both tables, the query ran in about 6s. Thanks for your help, friend. – Drakes, Feb 23, 2017 at 6:13
Looking at your pictures, I can guess that you also ran the subquery in MySQL Workbench. Most client software does not return the whole result set to you; by default it caps the output at up to 1000 records (sometimes fewer).
So the SELECT without ORDER BY starts working and stops immediately after the first XXX rows.
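A sketch of one way to time the bare SELECT without the client-side cap getting in the way; the 300000 figure is arbitrary and only needs to exceed the expected row count (the asker does exactly this in the comments below):
SELECT DISTINCT
    f2.cityCode, f2.productCode, f2.productDesc,
    f2.inventoryDate, f2.inventoryLevel
FROM etm
INNER JOIN f2avail AS f2
    ON  etm.cityCode = f2.cityCode
    AND etm.tvlDate  = f2.inventoryDate
LIMIT 300000; -- an explicit LIMIT stops Workbench from adding its own LIMIT 1000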
In the case of CREATE TABLE:
- The subquery must return all rows.
- An insert operation always takes longer than a SELECT ... much, much longer.
Can the speed of the insert be increased? That depends on your current server settings and configuration - maybe yes, maybe not.
It also depends on what else the server is doing:
- A SELECT is a non-locking operation; it starts working immediately.
- INSERT ... SELECT (the other form of CREATE ... SELECT) locks the data it touches, so if other transactions are running at the same time, it spends some time acquiring locks (see the sketch after this list).
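For reference, a sketch of the INSERT ... SELECT form mentioned above (hypothetical here, since the question uses the CREATE ... SELECT form); per this answer it performs the same writes and takes the same locks:
-- Assumes a table `filtered` with matching columns already exists
INSERT INTO filtered
SELECT DISTINCT
    f2.cityCode, f2.productCode, f2.productDesc,
    f2.inventoryDate, f2.inventoryLevel
FROM etm
INNER JOIN f2avail AS f2
    ON  etm.cityCode = f2.cityCode
    AND etm.tvlDate  = f2.inventoryDate;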
You can also test the query with composite indexes on both tables: (cityCode, tvlDate) and (cityCode, inventoryDate).
- Thank you for your reply. You are right about how Workbench limits results to 1000 rows. Good catch. I added LIMIT 300000 to return all the results, and the SELECT query took ~10s instead of the ~0.07s reported before. – Drakes, Feb 23, 2017 at 2:39
- 10s vs 90s might be the difference between writing and not writing. – Rick James, Feb 23, 2017 at 2:45
- You are right about the composite indices on both tables. That was the key factor. Thank you for your help. I appreciate it as well. – Drakes, Feb 23, 2017 at 6:14
- Does filtered end up with about ~246,000 rows?
- What does EXPLAIN EXTENDED show? With the solution above it shows 100.0 under filtered.