I am trying to write a query to generate a report of sorts and I'm stuck on the logic of it. The summary of it is:
I need to select the count of activities
, where post_type
is 'invoice' AND event
= 'new', grouping by user_id
that occurred between two dates based on the ts
column. However, I need to make sure that when I'm counting, I only count the FIRST instance of each contact_id
(Which comes from the invoices
table, which is joined on activities
.post_id
= invoices
.id
).
In layman's terms: I need to figure out how many new invoices were created by each user_id in a given date-range. But I need to only count the FIRST instance of each contact_id.
So, given the following data:
activities
+----+---------+------------+-----------+-------+---------------------+
| id | user_id | post_id | post_type | event | ts |
+----+---------+------------+-----------+-------+---------------------+
| 1 | 190 | 1000 | invoice | new | 2018年06月19日 14:01:18 |
| 2 | 190 | 1001 | invoice | new | 2018年06月19日 14:01:19 |
| 3 | 190 | 1002 | invoice | new | 2018年06月19日 14:01:20 |
| 4 | 230 | 1003 | invoice | new | 2018年06月19日 14:01:21 |
| 5 | 230 | 1004 | invoice | new | 2018年06月19日 14:01:22 |
| 6 | 230 | 1005 | invoice | new | 2018年06月19日 14:01:23 |
| 7 | 190 | 1006 | invoice | new | 2018年06月20日 14:01:24 |
| 8 | 230 | 1007 | invoice | new | 2018年06月20日 14:01:25 |
| 9 | 190 | 1008 | invoice | new | 2018年06月20日 14:01:26 |
| 10 | 230 | 1009 | invoice | new | 2018年06月20日 14:01:27 |
+----+---------+------------+-----------+-------+---------------------+
invoices
+------+------------+
| id | contact_id |
+------+------------+
| 1000 | 500 |
| 1001 | 600 |
| 1002 | 700 |
| 1003 | 800 |
| 1004 | 500 |
| 1005 | 500 |
| 1006 | 500 |
| 1007 | 600 |
| 1008 | 700 |
| 1009 | 800 |
+------+------------+
If I ran the query to find the count between 2018年06月19日 and 2018年06月20日, I should get the following output:
+---------+-------+
| user_id | count |
+---------+-------+
| 190 | 3 |
| 230 | 1 |
+---------+-------+
user_id 190 should get counted for id: 1, 2, 3 because id 7 and 9 have duplicate contact_ids.
user_id 230 should get counted for id 4 because all of his other rows were counted for user_id 190 already because they were the first instances of this contact_id.
I have an SQL fiddle of the data at https://www.db-fiddle.com/f/9smEW6TwA6UUk7Fh1anAN6/1
-
Only the lowest activities.id value for each invoices.contact_id should be counted, if that makes sense. Basically the first invoice created for each contact within this date range is what we're counting.Kevin– Kevin2018年12月14日 22:04:52 +00:00Commented Dec 14, 2018 at 22:04
1 Answer 1
It takes 2 steps:
SELECT user_id, COUNT(DISTINCT contact_id) AS 'count'
FROM
(
SELECT i.contact_id, MIN(a.user_id) AS user_id
FROM activities AS a
JOIN invoices AS i ON i.id = a.post_id
GROUP BY i.contact_id
) x
GROUP BY x.user_id;
(Possibly the DISTINCT
can be removed.)