0

I am trying to write a query to generate a report of sorts and I'm stuck on the logic of it. The summary of it is:

I need to select the count of activities, where post_type is 'invoice' AND event = 'new', grouping by user_id that occurred between two dates based on the ts column. However, I need to make sure that when I'm counting, I only count the FIRST instance of each contact_id (Which comes from the invoices table, which is joined on activities.post_id = invoices.id).

In layman's terms: I need to figure out how many new invoices were created by each user_id in a given date-range. But I need to only count the FIRST instance of each contact_id.

So, given the following data:

activities

+----+---------+------------+-----------+-------+---------------------+
| id | user_id | post_id | post_type | event | ts |
+----+---------+------------+-----------+-------+---------------------+
| 1 | 190 | 1000 | invoice | new | 2018年06月19日 14:01:18 |
| 2 | 190 | 1001 | invoice | new | 2018年06月19日 14:01:19 |
| 3 | 190 | 1002 | invoice | new | 2018年06月19日 14:01:20 |
| 4 | 230 | 1003 | invoice | new | 2018年06月19日 14:01:21 |
| 5 | 230 | 1004 | invoice | new | 2018年06月19日 14:01:22 |
| 6 | 230 | 1005 | invoice | new | 2018年06月19日 14:01:23 |
| 7 | 190 | 1006 | invoice | new | 2018年06月20日 14:01:24 |
| 8 | 230 | 1007 | invoice | new | 2018年06月20日 14:01:25 |
| 9 | 190 | 1008 | invoice | new | 2018年06月20日 14:01:26 |
| 10 | 230 | 1009 | invoice | new | 2018年06月20日 14:01:27 |
+----+---------+------------+-----------+-------+---------------------+

invoices

+------+------------+
| id | contact_id |
+------+------------+
| 1000 | 500 |
| 1001 | 600 |
| 1002 | 700 |
| 1003 | 800 |
| 1004 | 500 |
| 1005 | 500 |
| 1006 | 500 |
| 1007 | 600 |
| 1008 | 700 |
| 1009 | 800 |
+------+------------+

If I ran the query to find the count between 2018年06月19日 and 2018年06月20日, I should get the following output:

+---------+-------+
| user_id | count |
+---------+-------+
| 190 | 3 |
| 230 | 1 |
+---------+-------+

user_id 190 should get counted for id: 1, 2, 3 because id 7 and 9 have duplicate contact_ids.

user_id 230 should get counted for id 4 because all of his other rows were counted for user_id 190 already because they were the first instances of this contact_id.

I have an SQL fiddle of the data at https://www.db-fiddle.com/f/9smEW6TwA6UUk7Fh1anAN6/1

asked Dec 14, 2018 at 14:47
1
  • Only the lowest activities.id value for each invoices.contact_id should be counted, if that makes sense. Basically the first invoice created for each contact within this date range is what we're counting. Commented Dec 14, 2018 at 22:04

1 Answer 1

-1

It takes 2 steps:

SELECT user_id, COUNT(DISTINCT contact_id) AS 'count'
 FROM 
 (
 SELECT i.contact_id, MIN(a.user_id) AS user_id
 FROM activities AS a
 JOIN invoices AS i ON i.id = a.post_id
 GROUP BY i.contact_id 
 ) x
 GROUP BY x.user_id;

(Possibly the DISTINCT can be removed.)

answered Dec 14, 2018 at 19:06

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.