CROSS JOIN of list of dates and another table

Question 1

I have a query for getting data from a certain table by date range and grouping by week. My CROSS JOIN intends to fill in a default value for each week where there are no results for the date range.

I can then execute this query.

 SELECT 
 SUM(invoice.amount) AS "invoice.amount", 
 CONCAT(DATE_FORMAT(invoice.updated_at, '%b %d'), ' - ', DATE_FORMAT(DATE_ADD(invoice.updated_at, INTERVAL 7 DAY), '%b %d')) AS "invoice.updated_at"
 FROM invoice
 CROSS JOIN (
 SELECT selected_date 
 FROM (
 SELECT ADDDATE('1970-01-01',t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date 
 FROM
 (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
 (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
 (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
 (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
 (SELECT 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4
 ) v
 WHERE selected_date BETWEEN '2018-01-01' AND '2018-01-31'
 GROUP BY selected_date, YEAR(selected_date), WEEK(selected_date)
 ) calendar
 WHERE invoice.deleted_at IS NULL
 AND invoice.updated_at >= '2018-01-01'
 AND invoice.updated_at <= '2018-01-31'
 AND invoice.status = "PAID"
 GROUP BY calendar.selected_date, invoice.id, invoice.amount, YEAR(invoice.updated_at), WEEK(invoice.updated_at)

Assume I have these records in the database:

+----+------------+------------+------------+
| id | amount | status | updated_at | 
+----+------------+------------+------------+
| 1 | 1000 | PAID | 2018年01月01日 |
| 2 | 2000 | PAID | 2018年01月01日 |
| 3 | 100 | PAID | 2018年01月07日 |
| 4 | 50 | PAID | 2018年01月11日 |
+----+------------+------------+------------+

I expect to see these results, one record for every week of January:

+--------+-------------------+
| amount | updated_at | 
+--------+-------------------+
| 3100 | Jan 1 - Jan 7 |
| 50 | Jan 8 - Jan 15 |
| 0 | Jan 16 - Jan 22 |
| 0 | Jan 23 - Jan 30 |
| 0 | Jan 31 - Jan 31 |
+--------+-------------------+

However, I get like 50 of these random duplicated results, the least of which contains the joined filler weeks since there are no 0 amounts:

+--------+----------------+
| amount | updated_at | 
+--------+----------------+
| 1000 | Jan 1 - Jan 7 |
| 2000 | Jan 1 - Jan 7 |
| 100 | Jan 1 - Jan 7 |
| 50 | Jan 8 - Jan 15 |
| 1000 | Jan 1 - Jan 7 |
| 1000 | Jan 1 - Jan 7 |
| 2000 | Jan 1 - Jan 7 |
| 2000 | Jan 1 - Jan 7 |
| 100 | Jan 1 - Jan 7 |
| 50 | Jan 8 - Jan 15 |
| 100 | Jan 1 - Jan 7 |
| 50 | Jan 8 - Jan 15 |
| ... | ... |
| ... | ... |
| ... | ... |
+--------+----------------+

What gives?

Update:

Relevant pieces of the DDL

CREATE TABLE invoice (
 id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
 status VARCHAR(255) NOT NULL INDEX,
 amount INT(10) UNSIGNED NOT NULL,
 --
 created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
 updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP INDEX,
 deleted_at TIMESTAMP NULL DEFAULT NULL,
 --
 PRIMARY KEY (id)
);

Update 2:

This is my data:

+--------+--------+---------------------+
| amount | status | updated_at |
+--------+--------+---------------------+
| 100 | PAID | 2018年12月23日 23:05:02 |
| 100 | PAID | 2018年12月26日 18:00:04 |
| 150 | PAID | 2018年12月27日 08:43:03 |
| 200 | PAID | 2018年12月29日 08:30:03 |
+--------+--------+---------------------+

This is my full query (no redactions):

 SELECT 
 SUM(invoice.amount) AS "invoice.amount", 
 CONCAT(DATE_FORMAT(invoice.updated_at, '%b %d'), ' - ', DATE_FORMAT(DATE_ADD(invoice.updated_at, INTERVAL 7 DAY), '%b %d')) AS "invoice.updated_at"
 FROM invoice
 JOIN account_invoice
 ON account_invoice.account_id = 1
 AND account_invoice.invoice_id = invoice.id
 CROSS JOIN (
 SELECT selected_date 
 FROM (
 SELECT ADDDATE('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date 
 FROM
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
 ) v
 WHERE v.selected_date >= "2018-12-05 00:00:00" AND v.selected_date < "2019-01-04 23:59:59"
 GROUP BY FLOOR(TO_DAYS(v.selected_date) - 2) / 7
 ) calendar
 WHERE invoice.deleted_at IS NULL
 AND invoice.updated_at >= "2018-12-05 00:00:00"
 AND invoice.updated_at < "2019-01-04 23:59:59"
 AND invoice.status = "PAID"
 GROUP BY FLOOR(TO_DAYS(invoice.updated_at) - 2) / 7

My result set:

{{3000} {Dec 23 - Dec 30}}
{{3000} {Dec 26 - Jan 02}}
{{4500} {Dec 27 - Jan 03}}
{{6000} {Dec 29 - Jan 05}}

I expect to see these results instead:

{{0} {Dec 05 - Dec 11}}
{{0} {Dec 12 - Jan 18}}
{{100} {Dec 19 - Dec 25}}
{{350} {Dec 26 - Jan 01}}
{{0} {Jan 02 - Jan 04}}

My result set is not accurate. Those numbers are basically the sum of the numbers of the duplicates I was getting before, crunched down into just 4 rows. Also, the week dates are off; as you can see, they start at the first day of the first record found (Dec 23), and end at Jan 05, 1 day after the date range date.>.<

Update 3:

Running this query now:

 SELECT 
 SUM(invoice.amount) AS "invoice.amount", 
 CONCAT(DATE_FORMAT(invoice.updated_at, '%b %d'), ' - ', DATE_FORMAT(DATE_ADD(invoice.updated_at, INTERVAL 7 DAY), '%b %d')) AS "invoice.updated_at"
 FROM invoice
 JOIN account_invoice
 ON account_invoice.account_id = 1
 AND account_invoice.invoice_id = invoice.id
 JOIN (
 SELECT selected_date 
 FROM (
 SELECT ADDDATE('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date 
 FROM
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
 ) v
 WHERE v.selected_date >= "2018-12-05 00:00:00" AND v.selected_date < "2019-01-04 23:59:59"
 ) calendar
 ON calendar.selected_date = DATE(invoice.updated_at)
 WHERE invoice.deleted_at IS NULL
 AND invoice.updated_at >= "2018-12-05 00:00:00"
 AND invoice.updated_at < "2019-01-04 23:59:59"
 AND invoice.status = "PAID"
 GROUP BY FLOOR(TO_DAYS(invoice.updated_at) - 2) / 7

New reuslt set is getting better, looks like this:

{{100} {Dec 23 - Dec 30}}
{{100} {Dec 26 - Jan 02}}
{{150} {Dec 27 - Jan 03}}
{{200} {Dec 29 - Jan 05}}

So the dates are still off, and the SUM function doesn't seem to be working, but now we're getting better numbers at least.

Update 4:

Got my result set to work properly now, however, the start date is offset by 1 day, and the ending is not quite correct cause it overlaps the end date.

Here is my query:

 SELECT 
 IF(invoice.amount IS NULL, 0, SUM(invoice.amount)) AS "invoice.amount", 
 CONCAT(DATE_FORMAT(calendar.selected_date, '%b %e'), ' - ', DATE_FORMAT(DATE_ADD(calendar.selected_date, INTERVAL 6 DAY), '%b %e')) AS "invoice.updated_at"
 FROM (
 SELECT selected_date 
 FROM (
 SELECT ADDDATE('1970-01-01', t4.i*10000 + t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date 
 FROM
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
 (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
 ) v
 WHERE v.selected_date >= ? AND v.selected_date < ?
 ) calendar
 LEFT JOIN invoice
 ON invoice.deleted_at IS NULL
 AND DATE(invoice.updated_at) = calendar.selected_date
 AND invoice.status = "PAID"
 AND EXISTS (SELECT 1 FROM account_invoice WHERE account_id = 1 AND invoice_id = invoice.id)
 GROUP BY YEAR(calendar.selected_date), WEEK(calendar.selected_date)

And here is my result:

{{0} {Dec 6 - Dec 12}}
{{0} {Dec 9 - Dec 15}}
{{0} {Dec 16 - Dec 22}}
{{550} {Dec 23 - Dec 29}}
{{0} {Dec 30 - Jan 5}}
{{0} {Jan 1 - Jan 7}}

Very close...but there are two problems:

The list of results should start at the first day of the date range, but it starts one day after the first day
The list of results should end at the last day of the date range, even if it hasn't been fully a week yet at the end. But it still adds an entire week (likely because of my GROUP CONCAT)

This is what I want to see:

{{0} {Dec 5 - Dec 11}}
{{0} {Dec 12 - Dec 18}}
{{0} {Dec 19 - Dec 25}}
{{550} {Dec 26 - Jan 1}}
{{0} {Jan 1 - Jan 4}}

Question 2

Please, include the DDL & Data insert scripts

Question 3

(One place where MariaDB shines: seq_0_to_99999 would replace that inner CrossJoin.)

Question 4

First, a bug: When crossing a year boundary, YEAR + WEEK will split into two "weeks". That is easily fixed by revamping the grouping:

FLOOR(TO_DAYS(...)/7)

provides a "week number" that avoids the problem. (You probably need FLOOR(TO_DAYS(...) + n)/7), where n is picked to shift to Sunday or Monday, as needed.)

That expression would be used for the GROUP BY.

On to the bug... The only thing that should be in the GROUP BY is the "week", not the rest of the columns; they are causing the duplication.

More...

Get rid of the inner `GROUP BY.
Don't do a CROSS JOIN; join ONv.selected_date = DATE(invoice.updated_at)`

(See if that gets closer to what you want. There are several optimizations beyond that.)

Do the GROUP BY based on FLOOR(TO_DAYS(v.selected_date) - 2) / 7
Consider changing the GROUP_CONCAT to use MIN(v.selected_date) and MAX(v.selected_date) so that you get the week range, not random updated_at values.

Change

JOIN account_invoice
 ON account_invoice.account_id = 1
 AND account_invoice.invoice_id = invoice.id

to

AND EXISTS ( SELECT 1 FROM account_invoice
 WHERE account_id = 1
 AND invoice_id = invoice.id )

Question 5

Thanks for your answer, Rick. Made a 2nd update to my question. I'm getting some really weird results.

Question 6

@Lansana - I added some more.

Question 7

Made a third update in my question. We're getting closer...

Question 8

@Lansana - more

Question 9

Hmm that doesn't seem to change much. If I move the group by into the subquery, I get a single record, {{550} {Dec 23 - Dec 29}}. That is technically accurate, however there are missing weeks, so that's not right.

Rick James Rick James 80.7k5 gold badges52 silver badges119 bronze badges · Accepted Answer · 2019-01-03 16:48:56Z

First, a bug: When crossing a year boundary, YEAR + WEEK will split into two "weeks". That is easily fixed by revamping the grouping:

FLOOR(TO_DAYS(...)/7)

provides a "week number" that avoids the problem. (You probably need FLOOR(TO_DAYS(...) + n)/7), where n is picked to shift to Sunday or Monday, as needed.)

That expression would be used for the GROUP BY.

On to the bug... The only thing that should be in the GROUP BY is the "week", not the rest of the columns; they are causing the duplication.

More...

Get rid of the inner `GROUP BY.
Don't do a CROSS JOIN; join ONv.selected_date = DATE(invoice.updated_at)`

(See if that gets closer to what you want. There are several optimizations beyond that.)

Do the GROUP BY based on FLOOR(TO_DAYS(v.selected_date) - 2) / 7
Consider changing the GROUP_CONCAT to use MIN(v.selected_date) and MAX(v.selected_date) so that you get the week range, not random updated_at values.

Change

JOIN account_invoice
 ON account_invoice.account_id = 1
 AND account_invoice.invoice_id = invoice.id

to

AND EXISTS ( SELECT 1 FROM account_invoice
 WHERE account_id = 1
 AND invoice_id = invoice.id )

Thanks for your answer, Rick. Made a 2nd update to my question. I'm getting some really weird results.
Hmm that doesn't seem to change much. If I move the group by into the subquery, I get a single record, {{550} {Dec 23 - Dec 29}}. That is technically accurate, however there are missing weeks, so that's not right.

Stack Exchange Network

CROSS JOIN of list of dates and another table

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CROSS JOIN of list of dates and another table

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions