Given a table containing a date for every day over a period:
```sql
CREATE TABLE `tbl_calendar` (
  `date` date NOT NULL,
  PRIMARY KEY (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO `tbl_calendar` (`date`)
VALUES
('2016-12-10'),
('2016-12-09'),
('2016-12-08'),
('2016-12-07'),
('2016-12-06'),
('2016-12-05'),
('2016-12-04'),
('2016-12-03'),
('2016-12-02'),
('2016-12-01')
;
```
And a table containing values of different types, with values missing on random days where they have not been populated:
```sql
CREATE TABLE `tbl_values` (
  `value_id` int(11) NOT NULL AUTO_INCREMENT,
  `type_id` int(11) NOT NULL DEFAULT '0',
  `date` date DEFAULT NULL,
  `value` double(15,2) DEFAULT '0.00',
  PRIMARY KEY (`value_id`),
  KEY `type_id_date` (`type_id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO `tbl_values` (`type_id`, `date`, `value`)
VALUES
(100, '2016-12-02', 1),
(100, '2016-12-04', 2),
(100, '2016-12-06', 3),
(100, '2016-12-08', 4),
(100, '2016-12-10', 5)
;
```
How can the values for the missing days be returned in a SELECT, using the most recent previous record for that type? Here is what I have so far:
```sql
SELECT
  v1.type_id,
  c.date,
  v1.value
FROM
  tbl_calendar c
  LEFT JOIN tbl_values v1 ON (
    v1.type_id = 100
    AND v1.date <= c.date
  )
  LEFT JOIN tbl_values v2 ON (
    v2.type_id = 100
    AND v2.date < c.date
    AND v2.date > v1.date
  )
WHERE
  v1.date = c.date
  OR v2.date IS NULL;
```
The problem with this query is that, for a date that has its own value, it returns two rows: one with the value of the most recent previous record and one with the correct value.
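The duplicate appears to come from the strict bound `v2.date < c.date`. On a day that has its own value, the older row in `v1` finds no newer row *strictly* before that day, so it survives next to the correct row. The sketch below, which is not taken from either answer, runs the question's query (with the `v1.type`/`o1.date` slips corrected to `v1.type_id`/`v1.date`) against the sample data, once with the strict bound and once with `v2.date <= c.date`. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL. The helper `run` and its `v2_upper_op` parameter are illustrative names only.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; ISO date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_calendar (date TEXT PRIMARY KEY);
    CREATE TABLE tbl_values (
        value_id INTEGER PRIMARY KEY AUTOINCREMENT,
        type_id  INTEGER NOT NULL DEFAULT 0,
        date     TEXT,
        value    REAL DEFAULT 0.0
    );
""")
conn.executemany("INSERT INTO tbl_calendar (date) VALUES (?)",
                 [(f"2016-12-{d:02d}",) for d in range(1, 11)])
conn.executemany("INSERT INTO tbl_values (type_id, date, value) VALUES (100, ?, ?)",
                 [("2016-12-02", 1), ("2016-12-04", 2), ("2016-12-06", 3),
                  ("2016-12-08", 4), ("2016-12-10", 5)])


def run(v2_upper_op):
    # The question's query. v2 looks for a newer type-100 row between v1.date
    # and c.date; if one exists, v1 is not the latest row and is filtered out.
    # v2_upper_op is always one of the two literals below, never user input.
    sql = f"""
        SELECT v1.type_id, c.date, v1.value
        FROM tbl_calendar c
        LEFT JOIN tbl_values v1 ON (v1.type_id = 100 AND v1.date <= c.date)
        LEFT JOIN tbl_values v2 ON (v2.type_id = 100
                                    AND v2.date {v2_upper_op} c.date
                                    AND v2.date > v1.date)
        WHERE v1.date = c.date OR v2.date IS NULL
        ORDER BY c.date, v1.value
    """
    return conn.execute(sql).fetchall()


original = run("<")   # strict bound: a row dated exactly c.date never disqualifies older rows
fixed = run("<=")     # inclusive bound: only the latest row on or before c.date survives

for row in fixed:
    print(row)
```

With the inclusive bound each calendar date appears once; 2016-12-01 comes back with NULLs because no earlier value exists for type 100.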
Expected Output
a_vlad's answer returns the correct results but, as expected, performs poorly:
```sql
SELECT
  t1.date,
  (SELECT v1.type_id FROM tbl_values v1 WHERE v1.date <= t1.date ORDER BY v1.date DESC LIMIT 1) AS `type`,
  (SELECT v1.`value` FROM tbl_values v1 WHERE v1.date <= t1.date ORDER BY v1.date DESC LIMIT 1) AS `value`
FROM tbl_calendar t1
HAVING `type` IS NOT NULL
```
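On the performance point, one technique that neither answer uses is a window function. A running `MAX() OVER (ORDER BY date)` finds, in one pass over the calendar, the latest date on or before each day that has a value, and a single join then fetches that value instead of running two correlated subqueries per calendar row. This is a sketch under the assumption of MySQL 8.0 or later (MySQL 5.7 has no window functions). It is run here against SQLite 3.25+ through Python's `sqlite3`, the aliases `d`, `v0` and `last_date` are illustrative, and it has not been benchmarked against the real data.

```python
import sqlite3

# Requires SQLite 3.25+ for window functions; MySQL 8.0+ accepts the same SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_calendar (date TEXT PRIMARY KEY);
    CREATE TABLE tbl_values (
        value_id INTEGER PRIMARY KEY AUTOINCREMENT,
        type_id  INTEGER NOT NULL DEFAULT 0,
        date     TEXT,
        value    REAL DEFAULT 0.0
    );
""")
conn.executemany("INSERT INTO tbl_calendar (date) VALUES (?)",
                 [(f"2016-12-{d:02d}",) for d in range(1, 11)])
conn.executemany("INSERT INTO tbl_values (type_id, date, value) VALUES (100, ?, ?)",
                 [("2016-12-02", 1), ("2016-12-04", 2), ("2016-12-06", 3),
                  ("2016-12-08", 4), ("2016-12-10", 5)])

# The inner query's MAX(...) OVER (ORDER BY c.date) is a running maximum: for each
# calendar day, the latest date so far on which type 100 has a value. MAX skips the
# NULLs that the LEFT JOIN produces on days without a value. The outer join then
# fetches the value for that date; days before the first value (12-01) drop out.
FILL_FORWARD = """
    SELECT d.date, v.type_id, v.value
    FROM (
        SELECT c.date,
               MAX(v0.date) OVER (ORDER BY c.date) AS last_date
        FROM tbl_calendar c
        LEFT JOIN tbl_values v0 ON v0.date = c.date AND v0.type_id = 100
    ) AS d
    JOIN tbl_values v ON v.date = d.last_date AND v.type_id = 100
    ORDER BY d.date
"""
rows = conn.execute(FILL_FORWARD).fetchall()
for row in rows:
    print(*row)
```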
In the end I used a_vlad's query to build a summary table. However, this did not improve the system's performance: previously the missing values were filled in by a PHP loop, which turned out to be just as fast.
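The PHP loop itself is not shown in the question. The sketch below is a Python stand-in for the same carry-forward idea, assuming type 100's rows have already been fetched (for example with `SELECT date, value FROM tbl_values WHERE type_id = 100`) and keyed by date. `fill_forward` is a hypothetical name. It visits each day once, so its cost grows linearly with the length of the period, which may be why it kept pace with the SQL versions.

```python
from datetime import date, timedelta


def fill_forward(values_by_date, start, end):
    """Return (day, value) for every day in [start, end], carrying the most
    recent earlier value forward. Days before the first value are skipped."""
    filled = []
    last = None
    day = start
    while day <= end:
        if day in values_by_date:
            last = values_by_date[day]      # a real value for this day
        if last is not None:
            filled.append((day, last))      # real or carried-forward value
        day += timedelta(days=1)
    return filled


# Type 100's rows from the question's tbl_values.
values = {
    date(2016, 12, 2): 1.0,
    date(2016, 12, 4): 2.0,
    date(2016, 12, 6): 3.0,
    date(2016, 12, 8): 4.0,
    date(2016, 12, 10): 5.0,
}
result = fill_forward(values, date(2016, 12, 1), date(2016, 12, 10))
for day, value in result:
    print(day.isoformat(), value)
```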
A correct form of the query would be:
```sql
SELECT
  t1.date,
  (SELECT v1.type_id FROM tbl_values v1 WHERE v1.type_id = 100 AND v1.date <= t1.date ORDER BY v1.date DESC LIMIT 1) AS `type`,
  (SELECT v1.`value` FROM tbl_values v1 WHERE v1.type_id = 100 AND v1.date <= t1.date ORDER BY v1.date DESC LIMIT 1) AS `value`
FROM tbl_calendar t1
HAVING `type` IS NOT NULL
```
which gives the following result:
```
date        type  value
2016-12-02  100   1.00
2016-12-03  100   1.00
2016-12-04  100   2.00
2016-12-05  100   2.00
2016-12-06  100   3.00
2016-12-07  100   3.00
2016-12-08  100   4.00
2016-12-09  100   4.00
2016-12-10  100   5.00
```
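To reproduce this result, the sketch below loads the question's sample data into an in-memory SQLite database through Python's `sqlite3` module (an assumption, since the answer targets MySQL). SQLite rejects `HAVING` on a query without aggregation, which MySQL accepts, so the `type IS NOT NULL` filter is moved into an outer query. The filter condition is the same, so the rows are too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_calendar (date TEXT PRIMARY KEY);
    CREATE TABLE tbl_values (
        value_id INTEGER PRIMARY KEY AUTOINCREMENT,
        type_id  INTEGER NOT NULL DEFAULT 0,
        date     TEXT,
        value    REAL DEFAULT 0.0
    );
""")
conn.executemany("INSERT INTO tbl_calendar (date) VALUES (?)",
                 [(f"2016-12-{d:02d}",) for d in range(1, 11)])
conn.executemany("INSERT INTO tbl_values (type_id, date, value) VALUES (100, ?, ?)",
                 [("2016-12-02", 1), ("2016-12-04", 2), ("2016-12-06", 3),
                  ("2016-12-08", 4), ("2016-12-10", 5)])

# a_vlad's correlated subqueries: for each calendar day, take the latest
# type-100 row on or before that day. The NULL filter sits in an outer query.
A_VLAD = """
    SELECT * FROM (
        SELECT
            t1.date,
            (SELECT v1.type_id FROM tbl_values v1
              WHERE v1.type_id = 100 AND v1.date <= t1.date
              ORDER BY v1.date DESC LIMIT 1) AS "type",
            (SELECT v1.value FROM tbl_values v1
              WHERE v1.type_id = 100 AND v1.date <= t1.date
              ORDER BY v1.date DESC LIMIT 1) AS "value"
        FROM tbl_calendar t1
    ) AS t
    WHERE "type" IS NOT NULL
    ORDER BY t.date
"""
rows = conn.execute(A_VLAD).fetchall()
for row in rows:
    print(*row)
```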
But again, please look at your query and your expected result.
What do you want in the result (type, date, value) from:
```sql
SELECT
  c.date,
  o1.value_id AS "o1.value_id",
  o2.value_id AS "o2.value_id",
  o1.type_id AS "o1.type_id",
  o2.type_id AS "o2.type_id",
  o1.date AS "o1.date",
  o2.date AS "o2.date"
```
If I understand the question correctly, you want the lagged value where there is no direct value.
Permit me a little OCD: steer well clear of any table or field names that are reserved words in the ANSI or ODBC standards or in MySQL. I would ditch the tbl_ prefix as well.
I've renamed your `date` field to `PeriodDate` and your `value` field to `my_value`.
Try this:
```sql
SELECT V2.type_id, DT.PeriodDate, V2.my_value
FROM (
    -- DT: each calendar date paired with the latest value date on or before it
    SELECT C.PeriodDate, MAX(V1.PeriodDate) AS PrecedingDate
    FROM tbl_calendar AS C
    LEFT JOIN tbl_values AS V1
        ON C.PeriodDate >= V1.PeriodDate
        AND V1.type_id = 100
    GROUP BY C.PeriodDate
) AS DT
INNER JOIN tbl_values AS V2
    ON DT.PrecedingDate = V2.PeriodDate
    AND V2.type_id = 100
ORDER BY DT.PeriodDate;
```
The derived table pairs each date in the calendar table with the highest date in the values table that is less than or equal to it. If there is no value for that exact date, the highest such date is the most recent preceding one. The inner join back to the values table then fetches the value recorded on that date.
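The sketch below shows the derived table's intermediate pairs and then the full query, run against SQLite through Python's `sqlite3` module as a stand-in for MySQL. It assumes the renamed columns `PeriodDate` and `my_value` used in the answer; the variable names `DERIVED`, `FULL`, `pairs` and `rows` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_calendar (PeriodDate TEXT PRIMARY KEY);
    CREATE TABLE tbl_values (
        value_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        type_id    INTEGER NOT NULL DEFAULT 0,
        PeriodDate TEXT,
        my_value   REAL DEFAULT 0.0
    );
""")
conn.executemany("INSERT INTO tbl_calendar (PeriodDate) VALUES (?)",
                 [(f"2016-12-{d:02d}",) for d in range(1, 11)])
conn.executemany("INSERT INTO tbl_values (type_id, PeriodDate, my_value) VALUES (100, ?, ?)",
                 [("2016-12-02", 1), ("2016-12-04", 2), ("2016-12-06", 3),
                  ("2016-12-08", 4), ("2016-12-10", 5)])

# Derived table: each calendar date with the latest type-100 date on or before it.
DERIVED = """
    SELECT C.PeriodDate, MAX(V1.PeriodDate) AS PrecedingDate
    FROM tbl_calendar AS C
    LEFT JOIN tbl_values AS V1
      ON C.PeriodDate >= V1.PeriodDate AND V1.type_id = 100
    GROUP BY C.PeriodDate
"""
pairs = conn.execute(DERIVED + " ORDER BY C.PeriodDate").fetchall()
for calendar_date, preceding in pairs:
    print(calendar_date, "->", preceding)   # 2016-12-01 has no preceding value: None

# Full query: join back to fetch the value on PrecedingDate. The inner join
# drops calendar dates whose PrecedingDate is NULL.
FULL = f"""
    SELECT V2.type_id, DT.PeriodDate, V2.my_value
    FROM ({DERIVED}) AS DT
    INNER JOIN tbl_values AS V2
      ON DT.PrecedingDate = V2.PeriodDate AND V2.type_id = 100
    ORDER BY DT.PeriodDate
"""
rows = conn.execute(FULL).fetchall()
for row in rows:
    print(*row)
```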
On the subject of indexes, make sure that your most selective column comes first, i.e. the PeriodDate column, followed by type_id.
What does the auto_increment column give you apart from a surrogate key? If the combination of PeriodDate and type_id should be unique, make it your primary key.
I don't know whether the MySQL query optimiser does anything special with unique keys, but the optimisers on other database platforms certainly do.
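A minimal sketch of the suggested key, assuming the `auto_increment` column is dropped and the renamed `PeriodDate` and `my_value` columns are used. With `(PeriodDate, type_id)` as the primary key, a second row for the same date and type is rejected, while another type on the same date is accepted. SQLite via Python's `sqlite3` stands in for MySQL; `try_insert` and type `200` are hypothetical. The column order follows the answer's suggestion, and `EXPLAIN` on the real query would show whether it serves that query better than the question's existing `(type_id, date)` key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical redesign: no surrogate key; (PeriodDate, type_id) identifies a row.
conn.execute("""
    CREATE TABLE tbl_values (
        PeriodDate TEXT    NOT NULL,
        type_id    INTEGER NOT NULL,
        my_value   REAL DEFAULT 0.0,
        PRIMARY KEY (PeriodDate, type_id)
    )
""")
conn.execute("INSERT INTO tbl_values VALUES ('2016-12-02', 100, 1)")


def try_insert(period_date, type_id, value):
    """Insert one row; return False if that (PeriodDate, type_id) already exists."""
    try:
        conn.execute("INSERT INTO tbl_values VALUES (?, ?, ?)",
                     (period_date, type_id, value))
        return True
    except sqlite3.IntegrityError:
        return False


dup_ok = try_insert("2016-12-02", 100, 9)         # same date and type: rejected
other_type_ok = try_insert("2016-12-02", 200, 9)  # same date, other type: accepted
print(dup_ok, other_type_ok)
```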