2

Could you please help me to optimize this query - currently it takes ~4 minutes to run this query. It looks like keys: ip and timestamp are non-unique so shall I add some indexes?

mysql> EXPLAIN
 SELECT max(url) as url, 
 max(title) as title, 
 keyword as keyword,
 COALESCE(max(domain), 'example.com') as domain 
 FROM `test_url` as tu 
 JOIN `test_log` AS tl 
 ON tl.shorturl = tu.keyword 
 AND tl.click_id > 
 (SELECT option_value 
 from `test_options` 
 WHERE option_name = 'click_id' ) 
 WHERE 1 = 1 
 GROUP BY keyword 
 HAVING count( keyword ) > 50 
 ORDER BY count( keyword ) DESC 
 LIMIT 10;
+----+-------------+--------------+--------+------------------+-------------+---------+-----------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+------------------+-------------+---------+-----------------------+---------+----------------------------------------------+
| 1 | PRIMARY | tl | range | PRIMARY,shorturl | PRIMARY | 4 | NULL | 5748586 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | tu | eq_ref | PRIMARY | PRIMARY | 602 | test_urls.tl.shorturl | 1 | |
| 2 | SUBQUERY | test_options | ref | option_name | option_name | 194 | | 1 | Using where |
+----+-------------+--------------+--------+------------------+-------------+---------+-----------------------+---------+----------------------------------------------+
3 rows in set (0.00 sec)

-

mysql> SHOW CREATE TABLE test_url\G
*************************** 1. row ***************************
 Table: test_url
Create Table: CREATE TABLE `test_url` (
 `keyword` varchar(200) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
 `url` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
 `title` text,
 `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `ip` varchar(41) NOT NULL,
 `clicks` int(10) unsigned NOT NULL,
 `username` varchar(255) NOT NULL DEFAULT '',
 `company_id` int(11) NOT NULL DEFAULT '0',
 `domain` varchar(31) NOT NULL DEFAULT '',
 PRIMARY KEY (`keyword`),
 KEY `timestamp` (`timestamp`),
 KEY `ip` (`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

-

mysql> SHOW INDEXES FROM test_url;
+----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test_url | 0 | PRIMARY | 1 | keyword | A | 176798 | NULL | NULL | | BTREE | | |
| test_url | 1 | timestamp | 1 | timestamp | A | 176798 | NULL | NULL | | BTREE | | |
| test_url | 1 | ip | 1 | ip | A | 11 | NULL | NULL | | BTREE | | |
+----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.08 sec)

Edit: Tue Jun 17 15:17:36 BST 2014

mysql> SHOW CREATE TABLE test_log\G
*************************** 1. row ***************************
 Table: test_log
Create Table: CREATE TABLE `test_log` (
 `click_id` int(11) NOT NULL AUTO_INCREMENT,
 `click_time` datetime NOT NULL,
 `shorturl` varchar(200) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
 `referrer` varchar(200) NOT NULL,
 `user_agent` varchar(255) NOT NULL,
 `ip_address` varchar(41) NOT NULL,
 `country_code` char(2) NOT NULL,
 PRIMARY KEY (`click_id`),
 KEY `shorturl` (`shorturl`)
) ENGINE=InnoDB AUTO_INCREMENT=51636348 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

-

mysql> SHOW CREATE TABLE test_options\G
*************************** 1. row ***************************
 Table: test_options
Create Table: CREATE TABLE `test_options` (
 `option_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
 `option_name` varchar(64) NOT NULL DEFAULT '',
 `option_value` longtext NOT NULL,
 PRIMARY KEY (`option_id`,`option_name`),
 KEY `option_name` (`option_name`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Edit: Fri Jun 20 10:40:35 BST 2014

Database size is 4975MB (3965MB data size + 1010MB index size)

MySQL veriosn: 5.5.35 OS: Ubuntu 12.04

This is basically this application: http://yourls.org/

The main problem is that this query is causing a high load on the server using temporary tables created on disk (State: Copying to tmp table)

ypercubeTM
99.7k13 gold badges217 silver badges306 bronze badges
asked Jun 17, 2014 at 12:36
11
  • Add the SHOW CREATE TABLE test_log; and test_options output. Commented Jun 17, 2014 at 13:22
  • Please see my edit: Edit: Tue Jun 17 15:17:36 BST 2014 Commented Jun 17, 2014 at 14:20
  • 2
    Probably unrelated to the query's efficiency but you have a tl.click_id > option_value comparison between an INT and a LONGTEXT. Commented Jun 17, 2014 at 14:24
  • how big are the tables? Commented Jun 19, 2014 at 13:50
  • 4
    Why do you have WHERE 1 = 1 ? Commented Jun 19, 2014 at 15:12

4 Answers 4

2

I bear bad news. I've studied your tables and your query and looked at many variations on them, ran them against a synthetic data set approximately the same size as yours. And I'm sorry to report that I'm 98% sure that your query is nearly optimal and that you already have the right indexes in place. I welcome anyone who can disprove this and come up with additional indexes + rewritten sql that can significantly improve performance.

Aggregation != Searching. You're doing OLAP. Your query seems to be doing aggregation over about 5 million rows. Normally if I see a query doing a full table scan or a wide index range scan, I would suspect that indexes might not be optimal unless (big unless here) you're doing an aggregation (such as COUNT, SUM, AVG, etc), then there's no way around accessing all those rows since you need to compute on them.

I don't believe you are going to get significantly quicker results simply by adding indexes and rewriting the SQL alone. You'll need to incorporate some other approaches:

  • Pre-aggregating data. You could have some background job that runs and pre-aggregates your data by time intervals. This could be by 1-day intervals, 1-hour, 5-minutes, 1-minute ... whatever makes sense for your application in terms of how much latency you can tolerate.

  • Partitioning your data could help. I was thinking you might get some improvement by hash partitioning test_log over HASH(shorturl) in 32 hash partitions. But you'd need to redefine your PK and then maintaining the lifecycle of the data could become more difficult (not sure whether you're purging data already, but you probably should, or at least archive it).

Bigger picture, it looks like you're using a RDBMS (MySQL database) to do real-time keyword/click analysis. However, I don't believe an RDBMS is ideally suited for this task. Big data and stream processing are really becoming hot on this exact problem. The following search might get you in the right direction: http://www.google.com/search?q=stream+computing+real-time+click+analysis

answered Jun 19, 2014 at 18:41
1

Since the short_url is a keyword, maybe refactoring the query could be a game changer

Start with getting the top 10 keywords whose count> 50

Then, join the 10 words to the other tables

SELECT B.url,B.title,B.keyword,B.domain
FROM
(
 SELECT keyword FROM
 (
 SELECT COUNT(1) keyword_count,shorturl keyword
 FROM test_log AA INNER JOIN test_options BB
 ON AA.click_id = BB.option_value
 WHERE BB.option_name <> 'click_id'
 GROUP BY shorturl HAVING COUNT(1) > 50
 ) KW
 ORDER BY keyword_count,keyword DESC LIMIT 10
) A
LEFT JOIN test_url B USING (keyword);

The worse part is the KW subquery. If KW performs fast, the rest of the query will perform fast

Give it a Try !!! (Hope it works)

answered Jun 19, 2014 at 19:11
3
  • @ypercube I'm not optimistic either, so I took a shot anyway Commented Jun 19, 2014 at 19:46
  • Yes, we have similar approaches. Commented Jun 19, 2014 at 19:49
  • Me three. I had queries like this in the dozen or so variants I tried. Short of schema redesign or precomputation, I don't have much hope for improving this based on query refactoring and additional indexing alone. Commented Jun 20, 2014 at 21:49
1

I'm not very optimistic, but I'd try these two rewrite variations:

SELECT 
 tu.url, 
 tu.title, 
 tu.keyword,
 tu.domain
FROM
 ( SELECT tl.shorturl, COUNT(*) AS cnt
 FROM
 ( SELECT CAST(option_value AS UNSIGNED) AS option_value 
 FROM test_options 
 WHERE option_name = 'click_id'
 ) AS op 
 JOIN
 test_log AS tl
 ON tl.click_id > op.option_value 
 GROUP BY tl.shorturl 
 HAVING COUNT(*) > 50
 ORDER BY cnt DESC 
 LIMIT 10
 ) AS lim
 JOIN 
 test_url as tu 
 ON lim.shorturl = tu.keyword 
ORDER BY 
 lim.cnt DESC ;

Tested at SQL-Fiddle (with much less data of course).

Variation 2 may return slightly different results but it will use the index on test_log (shorturl) instead of a full table scan so it might be more efficient:

SELECT 
 tu.url, 
 tu.title, 
 tu.keyword,
 tu.domain
FROM
 ( SELECT tl.shorturl, 
 COUNT(CASE WHEN tl.click_id > op.option_value THEN 1 END) AS cnt
 FROM
 ( SELECT CAST(option_value AS UNSIGNED) AS option_value 
 FROM test_options 
 WHERE option_name = 'click_id'
 ) AS op 
 CROSS JOIN
 test_log AS tl
 GROUP BY tl.shorturl 
 HAVING cnt >= 0
 ORDER BY cnt DESC 
 LIMIT 10
 ) AS lim
 JOIN 
 test_url as tu 
 ON lim.shorturl = tu.keyword 
ORDER BY 
 lim.cnt DESC ;
answered Jun 19, 2014 at 19:22
1

test_options need not be a subquery in the WHERE clause since it will always return the same single value. The table can be in the FROM clause.

For example:

SELECT
 MAX(tu.url) url,
 MAX(tu.title) title,
 tu.keyword,
 COALESCE
 (
 MAX(tu.domain),
 'example.com'
 ) domain
FROM
 `test_url` tu,
 `test_log` t1,
 `test_options` to
WHERE
 tl.shorturl = tu.keyword
 AND to.option_value <= tl.click_id
GROUP BY
 tu.keyword 
HAVING
 COUNT(tu.keyword) > 50 
ORDER BY
 COUNT(tu.keyword) DESC
LIMIT
 10;
answered Jun 20, 2014 at 15:29

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.