I need to generate data to build some charts.
The current results have missing data points, and I'd like to fill them with 0s.

Data is stored in MySQL 8. Simplified data sample & query fiddle here.

The query I currently have is

SELECT
    currency,
    GROUP_CONCAT(volume) AS volume
FROM (
    SELECT
        DATE(t.created_at) AS created_at,
        t.currency AS currency,
        SUM(t.amount) AS volume
    FROM
        transactions AS t
    WHERE (t.created_at BETWEEN @start AND @end)
    GROUP BY
        created_at,
        currency
    ORDER BY
        created_at,
        currency) r
GROUP BY
    currency

which creates this result set:

currency volume
AUD 27553.52,13395.20,18349.51,3773.29,...
BRL 272.45,...
CAD 14738.08,7372.58,5926.08,7877.14,...
CHF 320.00,27.00,47.00,27.00,...
EUR 888.62,2806.27,4445.30,805.93,...
GBP 48588.64,37266.79,27275.01,13981.08,...
MXN 10.00,16298.00,1900.00,...
SEK 497.00,497.00,1491.00,...
USD 374660.85,347793.84,523608.81,839710.22,...

Where I need help:

  • How can I fill the missing data points with 0?
  • Let's assume the worst and at some point there are no transactions for any of the currencies for a day (or multiple days). How can I fill those missing data points?

I've read quite a few posts about WITH RECURSIVE and calendar tables but I can't wrap my head around it.

I'd appreciate any help/pointers. Thank you!

Update 1

@Akina's answer basically does what I asked for (thank you!) but: the query takes ages to complete.

The transactions table currently holds ~4 million rows. A monthly result set averages roughly 270k rows. Amongst others, there are indexes on currency and created_at, and a compound index on (created_at, currency).

Update 2

Something is off with my indexes. If I
LEFT JOIN transactions AS t FORCE INDEX(created_at) ...
then the query completes in ~15s, regardless of whether I set the date range to a month or 6 months.

asked Jul 30, 2021 at 4:42
  • ORDER BY in a subquery/CTE without LIMIT makes no sense: the outer query may ignore it. Commented Jul 30, 2021 at 7:10

3 Answers

WITH RECURSIVE  -- the RECURSIVE keyword is required for the calendar CTE
-- generate the calendar; DATE() additionally validates the parameter
calendar AS ( SELECT DATE(@start) AS created_at
              UNION ALL
              SELECT created_at + INTERVAL 1 DAY
              FROM calendar
              WHERE created_at < DATE(@end) ),
-- collect the list of currencies
currencies AS ( SELECT DISTINCT currency
                FROM transactions ),
-- gather daily data for all dates and all currencies,
-- replacing NULLs with zeros for the dates that have no data
daily AS ( SELECT ca.created_at AS created_at,
                  cu.currency AS currency,
                  COALESCE(SUM(t.amount), 0) AS volume
           FROM calendar ca
           CROSS JOIN currencies cu
           LEFT JOIN transactions AS t ON ca.created_at = DATE(t.created_at)
                                      AND cu.currency = t.currency
           GROUP BY ca.created_at, cu.currency )
-- get the final data; the aggregated values are sorted by date
SELECT currency,
       GROUP_CONCAT(volume ORDER BY created_at) AS volume
FROM daily
GROUP BY currency;

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=96b9434f665ea819c1fc2be225c403d4

answered Jul 30, 2021 at 7:09
  • Thank you. Unfortunately this results in the same data set. Additionally the query now takes 10+ seconds instead of ~1. Commented Jul 30, 2021 at 7:28
  • @MadSputnik Fixed. Check. Additionally - do not forget about the setting which limits GROUP_CONCAT() output length... Commented Jul 30, 2021 at 8:21
  • The query does give the correct results BUT: querying a single day takes ~50s (~7,500 rows), querying a month takes ~970s (~175,000 rows). Can this be improved? (my query above takes ~70ms for a day, ~2.5s for a month) Commented Jul 30, 2021 at 9:04
  • @MadSputnik Can this be improved? Of course. 1) Create the currencies list as a separate table and use it instead of the CTE. 2) Generate the dates list as a separate (temporary) table and use it instead of the CTE. 3) Add a generated column created_at_date DATE AS (DATE(created_at)) VIRTUAL and use it for joining (index on created_at_date, currency, amount). Commented Jul 30, 2021 at 9:07
  • Still takes way too long. :( Commented Jul 30, 2021 at 9:35
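
One possible way to keep the zero-filling while avoiding the join on DATE(t.created_at) is to aggregate the transactions first, with a plain range filter on created_at as in the question's original query, and only then LEFT JOIN the small aggregated result to the calendar × currency grid. The following is an untested sketch, not measured against the asker's data. It assumes that @end is midnight of the day after the last wanted day, so the range is half-open (unlike the inclusive calendar above), and the alias dt is made up for illustration.

```sql
-- Sketch only: assumes the `transactions` table from the question
-- and a half-open range [@start, @end).
SET @start = '2021-07-01 00:00:00';
SET @end   = '2021-08-01 00:00:00';

-- GROUP_CONCAT() output is truncated at group_concat_max_len
-- (1024 bytes by default), so raise it for long series.
SET SESSION group_concat_max_len = 1000000;

WITH RECURSIVE
-- one row per day in the range; recursion stops after
-- cte_max_recursion_depth iterations (1000 by default)
calendar AS (
    SELECT DATE(@start) AS dt
    UNION ALL
    SELECT dt + INTERVAL 1 DAY
    FROM calendar
    WHERE dt + INTERVAL 1 DAY < DATE(@end)
),
-- every currency that appears anywhere in the table
currencies AS (
    SELECT DISTINCT currency
    FROM transactions
),
-- aggregate first; the filter is on the bare column, so an index
-- on created_at can serve the range scan
daily AS (
    SELECT DATE(created_at) AS dt,
           currency,
           SUM(amount) AS volume
    FROM transactions
    WHERE created_at >= @start
      AND created_at <  @end
    GROUP BY DATE(created_at), currency
)
-- join the grid to the (much smaller) daily totals and fill gaps with 0
SELECT cu.currency,
       GROUP_CONCAT(COALESCE(d.volume, 0) ORDER BY ca.dt) AS volume
FROM calendar AS ca
CROSS JOIN currencies AS cu
LEFT JOIN daily AS d
       ON d.dt = ca.dt
      AND d.currency = cu.currency
GROUP BY cu.currency;
```

The idea is that the expensive part (scanning transactions) happens once with a sargable predicate, while the zero-filling join touches only one row per day and currency. Whether the optimizer actually picks the intended index would still need to be checked with EXPLAIN.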

A calendar table is just a table holding all the desired values. For your example it would look like:

CREATE TABLE calendar (Currency char(3));
INSERT INTO calendar (Currency) VALUES ('AUD');
INSERT INTO calendar (Currency) VALUES ('BRL');
INSERT INTO calendar (Currency) VALUES ('CAD');
INSERT INTO calendar (Currency) VALUES ('CHF');
INSERT INTO calendar (Currency) VALUES ('EUR');
INSERT INTO calendar (Currency) VALUES ('GBP');
INSERT INTO calendar (Currency) VALUES ('MXN');
INSERT INTO calendar (Currency) VALUES ('SEK');
INSERT INTO calendar (Currency) VALUES ('USD');

Then it's just a matter of joining those values correctly, so your updated query would look like:


-- Query
SET @start = '2021-07-01 00:00:00';
SET @end = '2021-08-01 00:00:00';
SELECT
 c.Currency,
 COALESCE(GROUP_CONCAT(volume), 0) AS volume
FROM (
 SELECT
 DATE(t.created_at) AS created_at,
 t.currency AS currency,
 SUM(t.amount) AS volume
 FROM
 transactions AS t
 WHERE (t.created_at BETWEEN @start AND @end)
 GROUP BY
 created_at,
 currency
 ORDER BY
 created_at,
 currency) r
RIGHT JOIN calendar c on r.currency = c.Currency
GROUP BY
 c.Currency;

NOTE: you might want to drop the calendar table afterwards, or just use a temporary table :)
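
For comparison, a calendar table in the usual sense holds one row per date rather than one row per currency. A minimal sketch, assuming MySQL 8; the table name calendar_dates, the column name dt and the ten-year span are hypothetical choices for illustration only:

```sql
-- Hypothetical helper table: one row per calendar day.
CREATE TABLE calendar_dates (
    dt DATE NOT NULL PRIMARY KEY
);

-- The recursion limit defaults to 1000 iterations; the span below
-- covers more days than that, so raise it for this session.
SET SESSION cte_max_recursion_depth = 5000;

-- Fill the table once for a fixed span.
INSERT INTO calendar_dates (dt)
WITH RECURSIVE seq AS (
    SELECT DATE('2020-01-01') AS dt
    UNION ALL
    SELECT dt + INTERVAL 1 DAY
    FROM seq
    WHERE dt < DATE('2029-12-31')
)
SELECT dt FROM seq;

-- The days of a reporting range can then be read directly,
-- here for a half-open range [@start, @end):
SELECT dt
FROM calendar_dates
WHERE dt >= DATE(@start)
  AND dt <  DATE(@end);
```

Cross joining these dates with the list of currencies gives the full grid of (date, currency) pairs to which the daily totals can be LEFT JOINed.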

Here's fiddle

answered Jul 30, 2021 at 8:25
  • See the sample data: no row contains NULL. Commented Jul 30, 2021 at 8:33
  • @Akina The data shown is the RESULT, not the data the OP is talking about. Commented Jul 30, 2021 at 8:53
  • Please look at the 3rd line of the question text ("Data is stored in MySQL 8. Simplified data sample & query fiddle here.") and follow the link. Commented Jul 30, 2021 at 8:55
  • @Akina Oh, thank you so much for that! I've rewritten my answer. :) Commented Jul 30, 2021 at 9:36
  • Thanks. Your answer doesn't work though. Commented Jul 30, 2021 at 11:21

To get the "missing" dates, you have to generate a date table with all dates and then join it to the original table.

There are many ways to do that; I chose one that also works in MySQL 5.x.

When you have more data, there will be fewer 0.00 values.

CREATE TABLE `transactions` (
 `id` bigint unsigned NOT NULL AUTO_INCREMENT,
 `currency` char(3) NOT NULL,
 `amount` float(10,2) NOT NULL,
 `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
 PRIMARY KEY (`id`),
 KEY `created_at` (`created_at`)
) ENGINE=InnoDB;
-- Sample data
INSERT INTO `transactions` (`id`, `currency`, `amount`, `created_at`) VALUES
(1, 'AUD', 42.00, '2021-07-25 07:47:54'),
(2, 'CHF', 43.00, '2021-07-25 07:47:54'),
(3, 'BRL', 82.00, '2021-07-25 07:47:54'),
(4, 'AUD', 89.00, '2021-07-26 07:47:54'),
(5, 'CHF', 99.00, '2021-07-26 07:47:54'),
(6, 'BRL', 40.00, '2021-07-26 07:47:54'),
(7, 'SEK', 74.00, '2021-07-26 07:47:54'),
(8, 'AUD', 61.00, '2021-07-27 07:47:54'),
(9, 'CHF', 75.00, '2021-07-27 07:47:54'),
(10, 'BRL', 90.00, '2021-07-27 07:47:54'),
(11, 'MXN', 35.00, '2021-07-27 07:47:54'),
(12, 'AUD', 76.00, '2021-07-28 07:47:54'),
(13, 'CHF', 86.00, '2021-07-28 07:47:54'),
(14, 'BRL', 14.00, '2021-07-28 07:47:54'),
(15, 'USD', 70.00, '2021-07-28 07:47:54');
-- Query
SET @start = '2021-07-01 00:00:00';
SET @end = '2021-08-01 00:00:00';
SELECT
 currency,
 GROUP_CONCAT(volume) AS volume
FROM (
 SELECT
 DATE(t.created_at) AS created_at,
 t.currency AS currency,
 SUM(t.amount) AS volume
 FROM
 transactions AS t
 WHERE (t.created_at BETWEEN @start AND @end)
 GROUP BY
 created_at,
 currency
 ORDER BY
 created_at,
 currency) r
GROUP BY
 currency;
currency | volume 
:------- | :----------------------
AUD | 42.00,89.00,61.00,76.00
BRL | 82.00,40.00,90.00,14.00
CHF | 43.00,99.00,75.00,86.00
MXN | 35.00 
SEK | 74.00 
USD | 70.00 
SELECT
 currency,
 GROUP_CONCAT(volume ORDER BY gen_date ASC) AS volume
FROM (
SELECT t3.gen_date,t3.currency,IFNULL(volume,0) volume
FROM
 (SELECT
 DATE(t.created_at) AS created_at,
 t.currency AS currency,
 SUM(t.amount) AS volume
 FROM
 transactions AS t
 WHERE (t.created_at BETWEEN @start AND @end)
 GROUP BY
 created_at,
 currency
 ORDER BY
 created_at,
 currency) t1
 RIGHT JOIN
 (SELECT `currency`, gen_date
FROM
(SELECT DISTINCT `currency` FROM transactions) t1 CROSS JOIN 
(select gen_date from 
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date from
 (select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
 (select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
 (select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
 (select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
 (select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
where gen_date between @start and @end) t2) t3 ON t3.gen_date = t1.created_at and t3.currency = t1.currency) t4
GROUP BY currency
currency | volume 
:------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------
AUD | 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,42.00,89.00,61.00,76.00,0.00,0.00,0.00,0.00
BRL | 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,82.00,40.00,90.00,14.00,0.00,0.00,0.00,0.00
CHF | 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,43.00,99.00,75.00,86.00,0.00,0.00,0.00,0.00
MXN | 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,35.00,0.00,0.00,0.00,0.00,0.00 
SEK | 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,74.00,0.00,0.00,0.00,0.00,0.00,0.00 
USD | 0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,70.00,0.00,0.00,0.00,0.00 

db<>fiddle here

answered Jul 30, 2021 at 10:17
  • Thank you. I prefer @Akina's WITH RECURSIVE approach. Commented Jul 30, 2021 at 11:22
  • I didn't see MySQL 8, or at least I don't remember it; maybe it is still helpful. Commented Jul 30, 2021 at 11:23
  • It is! Always good to see different approaches to solve a problem! Commented Jul 30, 2021 at 11:29
