MySQL documentation states that:
Subqueries in the FROM clause cannot be correlated subqueries. They are materialized in whole (evaluated to produce a result set) during query execution, so they cannot be evaluated per row of the outer query.
Okay, fine, I understand why this is. However, what if you want to repeat a specific query on multiple values?
SELECT COUNT( returning_visitors.per_ip ) AS count, AVG( returning_visitors.per_ip ) AS num_visits
FROM (
SELECT COUNT( * ) AS per_ip
FROM site_hits_unique
WHERE site_hits_unique.site_id = ___INPUT___
AND site_hits_unique.date >= CURDATE( ) - INTERVAL 30 DAY
GROUP BY site_hits_unique.site_id, site_hits_unique.ip
HAVING per_ip >1
) AS returning_visitors
Let's say I want to retrieve the result of this query on multiple values in the form of a result table, using the values from SELECT id FROM sites
. Is there a way to do this in a single query without using a stored procedure? Something like:
SELECT id, COUNT( returning_visitors.per_ip ) as readers, AVG( returning_visitors.per_ip ) as avg_visits_pr
FROM sites
SUBQUERY-PER-ROW (
SELECT COUNT( * ) AS per_ip
FROM site_hits_unique
WHERE site_hits_unique.site_id = sites.id
AND site_hits_unique.date >= CURDATE( ) - INTERVAL 30 DAY
GROUP BY site_hits_unique.site_id, site_hits_unique.ip
HAVING per_ip > 1
) AS returning_visitors
WHERE sites.id IN (162888, 42705, 11412)
SUBQUERY-PER-ROW
here is pseudo-code. JOIN
will not work because of the above quoted restriction.
EDIT:
Another attempt at pseudo-code using info from http://dev.mysql.com/doc/refman/5.7/en/row-subqueries.html:
SELECT id, (
SELECT COUNT( per_ip ) AS count, AVG( per_ip ) AS num_visits
FROM (
SELECT COUNT( * ) AS per_ip
FROM site_hits_unique
WHERE site_hits_unique.site_id = sites.id
AND site_hits_unique.date >= CURDATE( ) - INTERVAL 30 DAY
GROUP BY site_hits_unique.site_id, site_hits_unique.ip
HAVING per_ip >1
) AS returning_visitors
) as (readers, avg_visits_pr)
FROM sites
WHERE sites.id IN (162888, 42705, 11412)
(it doesn't work, this isn't actual valid MySQL syntax, just a pseudo-code example)
2 Answers 2
So, I discovered a way to accomplish what I wanted to accomplish using stored procedures. I'm hoping this isn't the only solution and that there is a way to do this with a single query, but here's what I've got.
Stored procedure to get returning readers on a single site_id:
DELIMITER $$
CREATE PROCEDURE get_site_readers(
IN input_site_id INT,
OUT readers INT,
OUT avg_visits_pr DOUBLE)
BEGIN
SELECT COUNT( per_ip ), AVG( per_ip )
INTO readers, avg_visits_pr
FROM (
SELECT COUNT( * ) AS per_ip
FROM site_hits_unique
WHERE site_id = input_site_id
AND date >= CURDATE( ) - INTERVAL 30 DAY
GROUP BY site_id, ip
HAVING per_ip > 1
) AS returning_visitors;
END
Stored procedure to run get_site_readers(...) on every sites.id:
DELIMITER $$
CREATE PROCEDURE update_site_readers() BEGIN
DECLARE done BOOLEAN DEFAULT FALSE;
DECLARE _id BIGINT UNSIGNED;
DECLARE cur CURSOR FOR SELECT id FROM sites;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;
OPEN cur;
eachSite: LOOP
FETCH cur INTO _id;
IF done THEN
LEAVE eachSite;
END IF;
CALL get_site_readers(_id, @readers, @avg_visits_pr);
UPDATE sites SET readers = @readers, avg_visits_pr = @avg_visits_pr WHERE id=_id;
END LOOP eachSite;
CLOSE cur;
END
Notice the query UPDATE sites SET readers = @readers, avg_visits_pr = @avg_visits_pr WHERE id=_id;
- this was my intention from the beginning, to update the sites table with the aggregated data. However, this does not exactly solve the question as defined since originally I was asking about doing a SELECT
.
Finally, to call the stored procedure:
CALL update_site_readers();
Is this what you are looking for?
SELECT s.id,
COUNT( rv.per_ip ) as readers,
AVG( rv.per_ip ) as avg_visits_pr
FROM sites AS s
JOIN
(
SELECT u.site_id,
COUNT( * ) AS per_ip
FROM site_hits_unique AS u
WHERE u.date >= CURDATE( ) - INTERVAL 30 DAY
GROUP BY u.site_id, u.ip
HAVING per_ip > 1
) AS rv ON rv.site_id = s.id
WHERE s.id IN (162888, 42705, 11412)
Indexes:
site_hits_unique: INDEX(site_id, date, ip)
Or maybe this:
SELECT site_id,
COUNT(DISTINCT ip) as readers,
COUNT(*) / COUNT(DISTINCT ip) as avg_visits_pr
FROM
(
SELECT site_id, ip, COUNT(*) AS per_ip
FROM site_hits_unique
WHERE date >= CURDATE( ) - INTERVAL 30 DAY
AND site_id IN (162888, 42705, 11412)
GROUP BY site_id, ip
HAVING per_ip > 1
) AS rv
GROUP BY site_id;
-
Both technically work, but due to the size of site_hits_unique for performance reasons I'd like to restrict the subquery to only focus on one site_id at a time. The temporary table generated by ANY site_id is humongous. Am I correct in thinking that the subquery dealing with one site_id at a time will consume less system resources and perform faster?Dan– Dan2016年09月04日 00:58:52 +00:00Commented Sep 4, 2016 at 0:58
-
I would suggest Uniques Summary Table , but
HAVING per_ip > 1
prevents use of it.Rick James– Rick James2016年09月05日 22:01:43 +00:00Commented Sep 5, 2016 at 22:01