MySQL Correlated Subquery (with multiple columns) per row

Question 1

MySQL documentation states that:

Subqueries in the FROM clause cannot be correlated subqueries. They are materialized in whole (evaluated to produce a result set) during query execution, so they cannot be evaluated per row of the outer query.

Okay, fine, I understand why this is. However, what if you want to repeat a specific query on multiple values?

SELECT COUNT( returning_visitors.per_ip ) AS count, AVG( returning_visitors.per_ip ) AS num_visits
FROM (
 SELECT COUNT( * ) AS per_ip
 FROM site_hits_unique
 WHERE site_hits_unique.site_id = ___INPUT___
 AND site_hits_unique.date >= CURDATE( ) - INTERVAL 30 DAY
 GROUP BY site_hits_unique.site_id, site_hits_unique.ip
 HAVING per_ip >1
) AS returning_visitors

Let's say I want to retrieve the result of this query on multiple values in the form of a result table, using the values from SELECT id FROM sites. Is there a way to do this in a single query without using a stored procedure? Something like:

SELECT id, COUNT( returning_visitors.per_ip ) as readers, AVG( returning_visitors.per_ip ) as avg_visits_pr
FROM sites
SUBQUERY-PER-ROW (
 SELECT COUNT( * ) AS per_ip
 FROM site_hits_unique
 WHERE site_hits_unique.site_id = sites.id
 AND site_hits_unique.date >= CURDATE( ) - INTERVAL 30 DAY
 GROUP BY site_hits_unique.site_id, site_hits_unique.ip
 HAVING per_ip > 1
) AS returning_visitors
WHERE sites.id IN (162888, 42705, 11412)

SUBQUERY-PER-ROW here is pseudo-code. JOIN will not work because of the above quoted restriction.

EDIT:

Another attempt at pseudo-code using info from http://dev.mysql.com/doc/refman/5.7/en/row-subqueries.html:

SELECT id, (
 SELECT COUNT( per_ip ) AS count, AVG( per_ip ) AS num_visits
 FROM (
 SELECT COUNT( * ) AS per_ip
 FROM site_hits_unique
 WHERE site_hits_unique.site_id = sites.id
 AND site_hits_unique.date >= CURDATE( ) - INTERVAL 30 DAY
 GROUP BY site_hits_unique.site_id, site_hits_unique.ip
 HAVING per_ip >1
 ) AS returning_visitors
) as (readers, avg_visits_pr)
FROM sites
WHERE sites.id IN (162888, 42705, 11412)

(This doesn't work; it isn't valid MySQL syntax, just a pseudo-code example.)

Answer 1 · Dan · 2016-09-04 01:18:20Z

I found a way to do what I wanted using stored procedures. I'm hoping this isn't the only solution and that a single query can do it, but here's what I've got.

Stored procedure to get returning readers on a single site_id:

DELIMITER $$
CREATE PROCEDURE get_site_readers(
    IN input_site_id INT,
    OUT readers INT,
    OUT avg_visits_pr DOUBLE)
BEGIN
    -- Count the visitors (IPs) with more than one hit in the last 30 days
    -- and average their hit counts.
    SELECT COUNT( per_ip ), AVG( per_ip )
    INTO readers, avg_visits_pr
    FROM (
        SELECT COUNT( * ) AS per_ip
        FROM site_hits_unique
        WHERE site_id = input_site_id
        AND date >= CURDATE( ) - INTERVAL 30 DAY
        GROUP BY site_id, ip
        HAVING per_ip > 1
    ) AS returning_visitors;
END$$
DELIMITER ;

Stored procedure to run get_site_readers(...) on every sites.id:

DELIMITER $$
CREATE PROCEDURE update_site_readers()
BEGIN
    DECLARE done BOOLEAN DEFAULT FALSE;
    DECLARE _id BIGINT UNSIGNED;
    DECLARE cur CURSOR FOR SELECT id FROM sites;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;
    OPEN cur;
    eachSite: LOOP
        FETCH cur INTO _id;
        IF done THEN
            LEAVE eachSite;
        END IF;
        -- Compute the aggregates for this site, then store them on its row.
        CALL get_site_readers(_id, @readers, @avg_visits_pr);
        UPDATE sites SET readers = @readers, avg_visits_pr = @avg_visits_pr WHERE id = _id;
    END LOOP eachSite;
    CLOSE cur;
END$$
DELIMITER ;

Note the statement UPDATE sites SET readers = @readers, avg_visits_pr = @avg_visits_pr WHERE id = _id; my intention from the beginning was to update the sites table with the aggregated data. However, this does not strictly answer the question as asked, since I originally asked about doing a SELECT.

Finally, to call the stored procedure:

CALL update_site_readers();
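
For comparison, the same update can probably be written as one set-based statement instead of a cursor loop. The following is only a sketch, assuming the tables and columns used above (sites.readers, sites.avg_visits_pr, site_hits_unique.site_id, ip, date); the aliases per_visitor and agg are made up for illustration. It relies on MySQL's multi-table UPDATE joined to a derived table:

```sql
-- Sketch: set-based alternative to the cursor loop (aliases are illustrative).
UPDATE sites AS s
LEFT JOIN (
    -- One row per site: number of returning visitors and their average hit count.
    SELECT per_visitor.site_id,
           COUNT(*)                AS readers,
           AVG(per_visitor.per_ip) AS avg_visits_pr
    FROM (
        -- One row per (site, ip) with more than one hit in the last 30 days.
        SELECT site_id, ip, COUNT(*) AS per_ip
        FROM site_hits_unique
        WHERE date >= CURDATE() - INTERVAL 30 DAY
        GROUP BY site_id, ip
        HAVING per_ip > 1
    ) AS per_visitor
    GROUP BY per_visitor.site_id
) AS agg ON agg.site_id = s.id
-- Sites without returning visitors get 0 readers and a NULL average,
-- which matches what COUNT() and AVG() return in get_site_readers.
SET s.readers       = COALESCE(agg.readers, 0),
    s.avg_visits_pr = agg.avg_visits_pr;
```

Note that this aggregates the hits for every site in one pass, so it trades the per-site loop for one larger temporary result; which form is cheaper depends on the data and would need to be measured.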

Answer 2 · Rick James · 2016-09-04 00:52:16Z

Is this what you are looking for?

SELECT s.id,
       COUNT( rv.per_ip ) AS readers,
       AVG( rv.per_ip ) AS avg_visits_pr
    FROM sites AS s
    JOIN
    (
        -- one row per (site, ip) with more than one hit in the last 30 days
        SELECT u.site_id,
               COUNT( * ) AS per_ip
            FROM site_hits_unique AS u
            WHERE u.date >= CURDATE( ) - INTERVAL 30 DAY
            GROUP BY u.site_id, u.ip
            HAVING per_ip > 1
    ) AS rv ON rv.site_id = s.id
    WHERE s.id IN (162888, 42705, 11412)
    GROUP BY s.id;  -- needed to get one result row per site

Indexes:

site_hits_unique: INDEX(site_id, date, ip)
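
As a rough sketch, the suggested index could be created and checked like this. The index name idx_site_date_ip is an assumption for illustration, and the site id is just one of the values from the question:

```sql
-- Composite index suggested above; the name is hypothetical.
ALTER TABLE site_hits_unique
    ADD INDEX idx_site_date_ip (site_id, date, ip);

-- Inspect the plan for the per-site aggregation.
-- Every column the query touches is in the index, so the optimizer
-- may be able to answer it from the index alone.
EXPLAIN
SELECT ip, COUNT(*) AS per_ip
FROM site_hits_unique
WHERE site_id = 162888
  AND date >= CURDATE() - INTERVAL 30 DAY
GROUP BY ip
HAVING per_ip > 1;
```

The column order matters here: the equality on site_id comes first, then the range on date, with ip last so the index can cover the query.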

Or maybe this:

SELECT site_id,
       COUNT(*) AS readers,
       AVG(per_ip) AS avg_visits_pr
    FROM
    (
        SELECT site_id, ip, COUNT(*) AS per_ip
            FROM site_hits_unique
            WHERE date >= CURDATE( ) - INTERVAL 30 DAY
            AND site_id IN (162888, 42705, 11412)
            GROUP BY site_id, ip
            HAVING per_ip > 1
    ) AS rv  -- one row per (site_id, ip), so COUNT(*) counts returning visitors
    GROUP BY site_id;

Both technically work, but because site_hits_unique is so large, I'd like to restrict the subquery to one site_id at a time for performance reasons. The temporary table generated by ANY site_id is humongous. Am I correct in thinking that a subquery dealing with one site_id at a time will consume fewer system resources and run faster?

I would suggest a Uniques Summary Table, but HAVING per_ip > 1 prevents its use.
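
On newer servers there is a closer match to the SUBQUERY-PER-ROW pseudo-code from the question: MySQL 8.0.14 and later support lateral derived tables, which may reference columns of preceding tables in the same FROM clause. This is not available in the 5.7 documentation quoted in the question. A minimal sketch, assuming the same tables and columns as above:

```sql
-- Requires MySQL 8.0.14 or later (LATERAL is not available in 5.7).
SELECT s.id,
       COUNT(rv.per_ip) AS readers,        -- 0 when a site has no returning visitors
       AVG(rv.per_ip)   AS avg_visits_pr   -- NULL in that case
FROM sites AS s
LEFT JOIN LATERAL (
    -- Evaluated per row of sites: one row per ip with more than one hit
    -- on that site in the last 30 days.
    SELECT COUNT(*) AS per_ip
    FROM site_hits_unique AS u
    WHERE u.site_id = s.id
      AND u.date >= CURDATE() - INTERVAL 30 DAY
    GROUP BY u.ip
    HAVING per_ip > 1
) AS rv ON TRUE
WHERE s.id IN (162888, 42705, 11412)
GROUP BY s.id;
```

Whether this actually uses fewer resources than the joined derived table is not guaranteed; comparing EXPLAIN output and timings on real data would be the way to find out.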
