1

MySQL documentation states that:

Subqueries in the FROM clause cannot be correlated subqueries. They are materialized in whole (evaluated to produce a result set) during query execution, so they cannot be evaluated per row of the outer query.

Okay, fine, I understand why this is. However, what if you want to repeat a specific query on multiple values?

SELECT COUNT( returning_visitors.per_ip ) AS count, AVG( returning_visitors.per_ip ) AS num_visits
FROM (
 SELECT COUNT( * ) AS per_ip
 FROM site_hits_unique
 WHERE site_hits_unique.site_id = ___INPUT___
 AND site_hits_unique.date >= CURDATE( ) - INTERVAL 30 DAY
 GROUP BY site_hits_unique.site_id, site_hits_unique.ip
 HAVING per_ip >1
) AS returning_visitors

Let's say I want to retrieve the result of this query on multiple values in the form of a result table, using the values from SELECT id FROM sites. Is there a way to do this in a single query without using a stored procedure? Something like:

SELECT id, COUNT( returning_visitors.per_ip ) as readers, AVG( returning_visitors.per_ip ) as avg_visits_pr
FROM sites
SUBQUERY-PER-ROW (
 SELECT COUNT( * ) AS per_ip
 FROM site_hits_unique
 WHERE site_hits_unique.site_id = sites.id
 AND site_hits_unique.date >= CURDATE( ) - INTERVAL 30 DAY
 GROUP BY site_hits_unique.site_id, site_hits_unique.ip
 HAVING per_ip > 1
) AS returning_visitors
WHERE sites.id IN (162888, 42705, 11412)

SUBQUERY-PER-ROW here is pseudo-code. JOIN will not work because of the above quoted restriction.

EDIT:

Another attempt at pseudo-code using info from http://dev.mysql.com/doc/refman/5.7/en/row-subqueries.html:

SELECT id, (
 SELECT COUNT( per_ip ) AS count, AVG( per_ip ) AS num_visits
 FROM (
 SELECT COUNT( * ) AS per_ip
 FROM site_hits_unique
 WHERE site_hits_unique.site_id = sites.id
 AND site_hits_unique.date >= CURDATE( ) - INTERVAL 30 DAY
 GROUP BY site_hits_unique.site_id, site_hits_unique.ip
 HAVING per_ip >1
 ) AS returning_visitors
) as (readers, avg_visits_pr)
FROM sites
WHERE sites.id IN (162888, 42705, 11412)

(it doesn't work, this isn't actual valid MySQL syntax, just a pseudo-code example)

asked Sep 3, 2016 at 22:03
0

2 Answers 2

1

So, I discovered a way to accomplish what I wanted to accomplish using stored procedures. I'm hoping this isn't the only solution and that there is a way to do this with a single query, but here's what I've got.

Stored procedure to get returning readers on a single site_id:

DELIMITER $$
CREATE PROCEDURE get_site_readers(
 IN input_site_id INT,
 OUT readers INT,
 OUT avg_visits_pr DOUBLE)
BEGIN
 SELECT COUNT( per_ip ), AVG( per_ip )
 INTO readers, avg_visits_pr
 FROM (
 SELECT COUNT( * ) AS per_ip
 FROM site_hits_unique
 WHERE site_id = input_site_id
 AND date >= CURDATE( ) - INTERVAL 30 DAY
 GROUP BY site_id, ip
 HAVING per_ip > 1
 ) AS returning_visitors;
END

Stored procedure to run get_site_readers(...) on every sites.id:

DELIMITER $$
CREATE PROCEDURE update_site_readers() BEGIN
 DECLARE done BOOLEAN DEFAULT FALSE;
 DECLARE _id BIGINT UNSIGNED;
 DECLARE cur CURSOR FOR SELECT id FROM sites;
 DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;
 OPEN cur;
 eachSite: LOOP
 FETCH cur INTO _id;
 IF done THEN
 LEAVE eachSite;
 END IF;
 CALL get_site_readers(_id, @readers, @avg_visits_pr);
 UPDATE sites SET readers = @readers, avg_visits_pr = @avg_visits_pr WHERE id=_id;
 END LOOP eachSite;
 CLOSE cur;
END

Notice the query UPDATE sites SET readers = @readers, avg_visits_pr = @avg_visits_pr WHERE id=_id; - this was my intention from the beginning, to update the sites table with the aggregated data. However, this does not exactly solve the question as defined since originally I was asking about doing a SELECT.

Finally, to call the stored procedure:

CALL update_site_readers();
answered Sep 4, 2016 at 1:18
0

Is this what you are looking for?

SELECT s.id,
 COUNT( rv.per_ip ) as readers,
 AVG( rv.per_ip ) as avg_visits_pr
 FROM sites AS s
 JOIN 
 (
 SELECT u.site_id,
 COUNT( * ) AS per_ip
 FROM site_hits_unique AS u
 WHERE u.date >= CURDATE( ) - INTERVAL 30 DAY
 GROUP BY u.site_id, u.ip
 HAVING per_ip > 1 
 ) AS rv ON rv.site_id = s.id
 WHERE s.id IN (162888, 42705, 11412)

Indexes:

site_hits_unique: INDEX(site_id, date, ip)

Or maybe this:

SELECT site_id,
 COUNT(DISTINCT ip) as readers,
 COUNT(*) / COUNT(DISTINCT ip) as avg_visits_pr
 FROM 
 (
 SELECT site_id, ip, COUNT(*) AS per_ip
 FROM site_hits_unique
 WHERE date >= CURDATE( ) - INTERVAL 30 DAY
 AND site_id IN (162888, 42705, 11412)
 GROUP BY site_id, ip
 HAVING per_ip > 1 
 ) AS rv
 GROUP BY site_id;
answered Sep 4, 2016 at 0:52
2
  • Both technically work, but due to the size of site_hits_unique for performance reasons I'd like to restrict the subquery to only focus on one site_id at a time. The temporary table generated by ANY site_id is humongous. Am I correct in thinking that the subquery dealing with one site_id at a time will consume less system resources and perform faster? Commented Sep 4, 2016 at 0:58
  • I would suggest Uniques Summary Table , but HAVING per_ip > 1 prevents use of it. Commented Sep 5, 2016 at 22:01

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.