I have a query for getting data from a certain table by date range and grouping by week. My CROSS JOIN intends to fill in a default value for each week where there are no results for the date range.
I can then execute this query.
SELECT
SUM(invoice.amount) AS "invoice.amount",
CONCAT(DATE_FORMAT(invoice.updated_at, '%b %d'), ' - ', DATE_FORMAT(DATE_ADD(invoice.updated_at, INTERVAL 7 DAY), '%b %d')) AS "invoice.updated_at"
FROM invoice
CROSS JOIN (
SELECT selected_date
FROM (
SELECT ADDDATE('1970-01-01',t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date
FROM
(SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4
) v
WHERE selected_date BETWEEN '2018-01-01' AND '2018-01-31'
GROUP BY selected_date, YEAR(selected_date), WEEK(selected_date)
) calendar
WHERE invoice.deleted_at IS NULL
AND invoice.updated_at >= '2018-01-01'
AND invoice.updated_at <= '2018-01-31'
AND invoice.status = "PAID"
GROUP BY calendar.selected_date, invoice.id, invoice.amount, YEAR(invoice.updated_at), WEEK(invoice.updated_at)
Assume I have these records in the database:
+----+------------+------------+------------+
| id | amount | status | updated_at |
+----+------------+------------+------------+
| 1 | 1000 | PAID | 2018年01月01日 |
| 2 | 2000 | PAID | 2018年01月01日 |
| 3 | 100 | PAID | 2018年01月07日 |
| 4 | 50 | PAID | 2018年01月11日 |
+----+------------+------------+------------+
I expect to see these results, one record for every week of January:
+--------+-------------------+
| amount | updated_at |
+--------+-------------------+
| 3100 | Jan 1 - Jan 7 |
| 50 | Jan 8 - Jan 15 |
| 0 | Jan 16 - Jan 22 |
| 0 | Jan 23 - Jan 30 |
| 0 | Jan 31 - Jan 31 |
+--------+-------------------+
However, I get like 50 of these random duplicated results, the least of which contains the joined filler weeks since there are no 0 amounts:
+--------+----------------+
| amount | updated_at |
+--------+----------------+
| 1000 | Jan 1 - Jan 7 |
| 2000 | Jan 1 - Jan 7 |
| 100 | Jan 1 - Jan 7 |
| 50 | Jan 8 - Jan 15 |
| 1000 | Jan 1 - Jan 7 |
| 1000 | Jan 1 - Jan 7 |
| 2000 | Jan 1 - Jan 7 |
| 2000 | Jan 1 - Jan 7 |
| 100 | Jan 1 - Jan 7 |
| 50 | Jan 8 - Jan 15 |
| 100 | Jan 1 - Jan 7 |
| 50 | Jan 8 - Jan 15 |
| ... | ... |
| ... | ... |
| ... | ... |
+--------+----------------+
What gives?
Update:
Relevant pieces of the DDL
CREATE TABLE invoice (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
status VARCHAR(255) NOT NULL INDEX,
amount INT(10) UNSIGNED NOT NULL,
--
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP INDEX,
deleted_at TIMESTAMP NULL DEFAULT NULL,
--
PRIMARY KEY (id)
);
Update 2:
This is my data:
+--------+--------+---------------------+
| amount | status | updated_at |
+--------+--------+---------------------+
| 100 | PAID | 2018年12月23日 23:05:02 |
| 100 | PAID | 2018年12月26日 18:00:04 |
| 150 | PAID | 2018年12月27日 08:43:03 |
| 200 | PAID | 2018年12月29日 08:30:03 |
+--------+--------+---------------------+
This is my full query (no redactions):
SELECT
SUM(invoice.amount) AS "invoice.amount",
CONCAT(DATE_FORMAT(invoice.updated_at, '%b %d'), ' - ', DATE_FORMAT(DATE_ADD(invoice.updated_at, INTERVAL 7 DAY), '%b %d')) AS "invoice.updated_at"
FROM invoice
JOIN account_invoice
ON account_invoice.account_id = 1
AND account_invoice.invoice_id = invoice.id
CROSS JOIN (
SELECT selected_date
FROM (
SELECT ADDDATE('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date
FROM
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) v
WHERE v.selected_date >= "2018-12-05 00:00:00" AND v.selected_date < "2019-01-04 23:59:59"
GROUP BY FLOOR(TO_DAYS(v.selected_date) - 2) / 7
) calendar
WHERE invoice.deleted_at IS NULL
AND invoice.updated_at >= "2018-12-05 00:00:00"
AND invoice.updated_at < "2019-01-04 23:59:59"
AND invoice.status = "PAID"
GROUP BY FLOOR(TO_DAYS(invoice.updated_at) - 2) / 7
My result set:
{{3000} {Dec 23 - Dec 30}}
{{3000} {Dec 26 - Jan 02}}
{{4500} {Dec 27 - Jan 03}}
{{6000} {Dec 29 - Jan 05}}
I expect to see these results instead:
{{0} {Dec 05 - Dec 11}}
{{0} {Dec 12 - Jan 18}}
{{100} {Dec 19 - Dec 25}}
{{350} {Dec 26 - Jan 01}}
{{0} {Jan 02 - Jan 04}}
My result set is not accurate. Those numbers are basically the sum of the numbers of the duplicates I was getting before, crunched down into just 4 rows. Also, the week dates are off; as you can see, they start at the first day of the first record found (Dec 23), and end at Jan 05, 1 day after the date range date.>.<
Update 3:
Running this query now:
SELECT
SUM(invoice.amount) AS "invoice.amount",
CONCAT(DATE_FORMAT(invoice.updated_at, '%b %d'), ' - ', DATE_FORMAT(DATE_ADD(invoice.updated_at, INTERVAL 7 DAY), '%b %d')) AS "invoice.updated_at"
FROM invoice
JOIN account_invoice
ON account_invoice.account_id = 1
AND account_invoice.invoice_id = invoice.id
JOIN (
SELECT selected_date
FROM (
SELECT ADDDATE('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date
FROM
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) v
WHERE v.selected_date >= "2018-12-05 00:00:00" AND v.selected_date < "2019-01-04 23:59:59"
) calendar
ON calendar.selected_date = DATE(invoice.updated_at)
WHERE invoice.deleted_at IS NULL
AND invoice.updated_at >= "2018-12-05 00:00:00"
AND invoice.updated_at < "2019-01-04 23:59:59"
AND invoice.status = "PAID"
GROUP BY FLOOR(TO_DAYS(invoice.updated_at) - 2) / 7
New reuslt set is getting better, looks like this:
{{100} {Dec 23 - Dec 30}}
{{100} {Dec 26 - Jan 02}}
{{150} {Dec 27 - Jan 03}}
{{200} {Dec 29 - Jan 05}}
So the dates are still off, and the SUM function doesn't seem to be working, but now we're getting better numbers at least.
Update 4:
Got my result set to work properly now, however, the start date is offset by 1 day, and the ending is not quite correct cause it overlaps the end date.
Here is my query:
SELECT
IF(invoice.amount IS NULL, 0, SUM(invoice.amount)) AS "invoice.amount",
CONCAT(DATE_FORMAT(calendar.selected_date, '%b %e'), ' - ', DATE_FORMAT(DATE_ADD(calendar.selected_date, INTERVAL 6 DAY), '%b %e')) AS "invoice.updated_at"
FROM (
SELECT selected_date
FROM (
SELECT ADDDATE('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date
FROM
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) v
WHERE v.selected_date >= ? AND v.selected_date < ?
) calendar
LEFT JOIN invoice
ON invoice.deleted_at IS NULL
AND DATE(invoice.updated_at) = calendar.selected_date
AND invoice.status = "PAID"
AND EXISTS (SELECT 1 FROM account_invoice WHERE account_id = 1 AND invoice_id = invoice.id)
GROUP BY YEAR(calendar.selected_date), WEEK(calendar.selected_date)
And here is my result:
{{0} {Dec 6 - Dec 12}}
{{0} {Dec 9 - Dec 15}}
{{0} {Dec 16 - Dec 22}}
{{550} {Dec 23 - Dec 29}}
{{0} {Dec 30 - Jan 5}}
{{0} {Jan 1 - Jan 7}}
Very close...but there are two problems:
- The list of results should start at the first day of the date range, but it starts one day after the first day
- The list of results should end at the last day of the date range, even if it hasn't been fully a week yet at the end. But it still adds an entire week (likely because of my GROUP CONCAT)
This is what I want to see:
{{0} {Dec 5 - Dec 11}}
{{0} {Dec 12 - Dec 18}}
{{0} {Dec 19 - Dec 25}}
{{550} {Dec 26 - Jan 1}}
{{0} {Jan 1 - Jan 4}}
1 Answer 1
First, a bug: When crossing a year boundary, YEAR
+ WEEK
will split into two "weeks". That is easily fixed by revamping the grouping:
FLOOR(TO_DAYS(...)/7)
provides a "week number" that avoids the problem. (You probably need FLOOR(TO_DAYS(...) + n)/7)
, where n
is picked to shift to Sunday or Monday, as needed.)
That expression would be used for the GROUP BY
.
On to the bug... The only thing that should be in the GROUP BY
is the "week", not the rest of the columns; they are causing the duplication.
More...
- Get rid of the inner `GROUP BY.
- Don't do a
CROSS JOIN
; joinON
v.selected_date = DATE(invoice.updated_at)`
(See if that gets closer to what you want. There are several optimizations beyond that.)
- Do the
GROUP BY
based onFLOOR(TO_DAYS(v.selected_date) - 2) / 7
- Consider changing the
GROUP_CONCAT
to useMIN(v.selected_date)
andMAX(v.selected_date)
so that you get the week range, not random updated_at values.
Change
JOIN account_invoice
ON account_invoice.account_id = 1
AND account_invoice.invoice_id = invoice.id
to
AND EXISTS ( SELECT 1 FROM account_invoice
WHERE account_id = 1
AND invoice_id = invoice.id )
-
Thanks for your answer, Rick. Made a 2nd update to my question. I'm getting some really weird results.Lansana– Lansana2019年01月04日 03:29:26 +00:00Commented Jan 4, 2019 at 3:29
-
@Lansana - I added some more.Rick James– Rick James2019年01月04日 03:51:46 +00:00Commented Jan 4, 2019 at 3:51
-
Made a third update in my question. We're getting closer...Lansana– Lansana2019年01月04日 04:04:58 +00:00Commented Jan 4, 2019 at 4:04
-
-
Hmm that doesn't seem to change much. If I move the group by into the subquery, I get a single record,
{{550} {Dec 23 - Dec 29}}
. That is technically accurate, however there are missing weeks, so that's not right.Lansana– Lansana2019年01月04日 04:43:10 +00:00Commented Jan 4, 2019 at 4:43
seq_0_to_99999
would replace that inner CrossJoin.)