This question is an extension of a question I asked previously, which was overly simplified. A more accurate example is in this SQLFiddle, where I show a working (but slow) solution, followed by my attempt to adapt the previous answer to the actual problem.
The complication is that the two tables contain events for multiple timelines.
CREATE TABLE foo (ts int, id text, foo text);
INSERT INTO foo (ts, id, foo)
VALUES
(1, 'A', 'Lorem'),
(1, 'B', 'ipsum'),
(4, 'B', 'dolor'),
(5, 'A', 'sit'),
(8, 'A', 'amet'),
(8, 'B', 'consectetur');
CREATE TABLE bar (ts int, id text, bar text);
INSERT INTO bar (ts, id, bar)
VALUES
(1, 'A', 'adipiscing'),
(5, 'B', 'elit'),
(6, 'A', 'sed'),
(9, 'B', 'do ');
Each table has events for timelines 'A' and 'B'. The goal is to combine the results into a single result set showing the "state" of each timeline. The two timelines are orthogonal.
ts  id  foo          bar
1   A   Lorem        adipiscing
5   A   sit          adipiscing
6   A   sit          sed
8   A   amet         sed
1   B   ipsum        (null)
4   B   dolor        (null)
5   B   dolor        elit
8   B   consectetur  elit
9   B   consectetur  do
2 Answers
In addition to the solution of the simple case, add a PARTITION BY clause to the window functions in the inner query to get group numbers per partition (per "timeline"). Then combine the group numbers with the respective timeline (id in your example) to keep partitions separate in the second step:
SELECT id, ts
, min(foo) OVER (PARTITION BY id, foo_grp) AS foo
, min(bar) OVER (PARTITION BY id, bar_grp) AS bar
FROM (
SELECT id, ts, f.foo, b.bar
, count(f.foo) OVER (PARTITION BY id ORDER BY ts) AS foo_grp
, count(b.bar) OVER (PARTITION BY id ORDER BY ts) AS bar_grp
FROM foo f
FULL JOIN bar b USING (id, ts)
) sub
ORDER BY 1, 2;
Result as requested (except with id first).
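To see why the grouping works, it can help to run the inner query on its own. The following is a minimal sketch against the sample tables from the question; the group numbers in the comments assume exactly the rows inserted there, with no duplicate (id, ts) pairs within a table.

```sql
-- Inner query only: count() ignores NULLs, so the running count increases
-- only on rows that carry a value and stays flat on the gaps after it.
SELECT id, ts, f.foo, b.bar
     , count(f.foo) OVER (PARTITION BY id ORDER BY ts) AS foo_grp
     , count(b.bar) OVER (PARTITION BY id ORDER BY ts) AS bar_grp
FROM   foo f
FULL   JOIN bar b USING (id, ts)
ORDER  BY id, ts;

-- For timeline 'B' in the sample data:
--  ts | foo         | bar  | foo_grp | bar_grp
--   1 | ipsum       | NULL |       1 |       0
--   4 | dolor       | NULL |       2 |       0
--   5 | NULL        | elit |       2 |       1
--   8 | consectetur | NULL |       3 |       1
--   9 | NULL        | do   |       3 |       2
```

Each (id, foo_grp) group holds at most one non-null foo, so min(foo) over that group simply picks it. Group 0 has no value yet and yields NULL.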
SQL Fiddle
Your attempt to adapt the previous solution was very close. It didn't work because of PARTITION BY f.id / PARTITION BY b.id instead of PARTITION BY id. You really want the combined id to include missing rows in the result: that is where the last non-null value has to be filled in for the missing (NULL) value.
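One way to see the difference is to select the merged id next to the table-qualified ids from the same join. This is only an illustrative sketch against the sample tables; the aliases f_id and b_id are made up for the example.

```sql
-- With USING (id, ts), the unqualified id is the merged join column,
-- i.e. it behaves like COALESCE(f.id, b.id) and is never NULL here.
-- f.id and b.id are NULL on rows that exist in only one of the tables.
SELECT id, ts, f.id AS f_id, b.id AS b_id
FROM   foo f
FULL   JOIN bar b USING (id, ts)
ORDER  BY id, ts;

-- e.g. the row for (A, 6) exists only in bar:
--  id | ts | f_id | b_id
--  A  |  6 | NULL | A
```

Partitioning by f.id would put such rows into a separate NULL partition instead of the timeline they belong to, so no earlier value could be carried forward into them.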
If performance is your paramount requirement, consider a server-side function like the one demonstrated in the previous answer.
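The function from the previous answer is not reproduced here. As a rough idea of what such a server-side function could look like for the multi-timeline case, here is a PL/pgSQL sketch. The function name merge_timelines_sketch and the o_-prefixed output columns are assumptions chosen for illustration (the prefix avoids name clashes with the table columns), and it has not been benchmarked.

```sql
-- Hypothetical sketch: one ordered pass over the full join, carrying the
-- last non-null foo / bar forward and resetting them when the id changes.
CREATE OR REPLACE FUNCTION merge_timelines_sketch()
  RETURNS TABLE (o_id text, o_ts int, o_foo text, o_bar text)
  LANGUAGE plpgsql AS
$func$
DECLARE
   r record;
BEGIN
   FOR r IN
      SELECT id, ts, f.foo, b.bar
      FROM   foo f
      FULL   JOIN bar b USING (id, ts)
      ORDER  BY id, ts
   LOOP
      -- New timeline: forget values carried over from the previous one.
      IF r.id IS DISTINCT FROM o_id THEN
         o_foo := NULL;
         o_bar := NULL;
      END IF;

      o_id := r.id;
      o_ts := r.ts;

      -- Overwrite only when the current row actually has a value.
      IF r.foo IS NOT NULL THEN o_foo := r.foo; END IF;
      IF r.bar IS NOT NULL THEN o_bar := r.bar; END IF;

      RETURN NEXT;  -- emits the current (o_id, o_ts, o_foo, o_bar)
   END LOOP;
END
$func$;

-- Usage:
SELECT * FROM merge_timelines_sketch();
```

The sketch relies on the output parameters keeping their values between iterations, which is what lets the last non-null value persist across rows.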
- Unless I'm missing something, that looks almost exactly like what I have in the second half of my SQLFiddle, and it returns different results than the first half. – Christopher Currie, Jul 9, 2015 at 22:27
- @ChristopherCurrie: I added some explanation. And no, same result as your first half. See: sqlfiddle.com/#!15/e6ecb/7 – Erwin Brandstetter, Jul 9, 2015 at 22:28
My existing solution is as follows:
SELECT *
FROM (
SELECT ts, id, foo, bar
FROM foo
LEFT JOIN LATERAL (
SELECT distinct on (id) bar
FROM bar
WHERE bar.id = foo.id
AND bar.ts <= foo.ts
ORDER BY id, ts desc
) b ON true
UNION
SELECT ts, id, foo, bar
FROM bar
LEFT JOIN LATERAL (
SELECT distinct on (id) foo
FROM foo
WHERE bar.id = foo.id
AND foo.ts <= bar.ts
ORDER BY id, ts desc
) f ON true
) sub
ORDER BY id, ts;
This query returns the results shown in the question, but the EXPLAIN output is pretty grisly, even with only 300 rows in the 'foo' table and 12k rows in 'bar'.
- Wouldn't UNION ALL result in duplicate rows, if both tables had an event at the same timestamp? – Christopher Currie, Jul 9, 2015 at 22:24
- Right, I missed that you did not use FULL JOIN in this approach. Either way, the alternative should be substantially faster. – Erwin Brandstetter, Jul 9, 2015 at 22:49