Combine two event tables for multiple timelines into a single result set

Question

This question extends a question I asked previously, which was overly simplified. A more accurate example is in this SQLFiddle. It shows a working (but slow) solution, followed by my attempt to adapt the previous answer to the actual problem.

The complication is that the two tables contain events for multiple timelines.

CREATE TABLE foo (ts int, id text, foo text);
INSERT INTO foo (ts, id, foo)
VALUES
 (1, 'A', 'Lorem'),
 (1, 'B', 'ipsum'),
 (4, 'B', 'dolor'),
 (5, 'A', 'sit'),
 (8, 'A', 'amet'),
 (8, 'B', 'consectetur');
CREATE TABLE bar (ts int, id text, bar text);
INSERT INTO bar (ts, id, bar)
VALUES
 (1, 'A', 'adipiscing'),
 (5, 'B', 'elit'),
 (6, 'A', 'sed'),
 (9, 'B', 'do ');

Each table has events for timelines 'A' and 'B'. The goal is to combine the results into a single result set showing the "state" of each timeline at every event. The two timelines are orthogonal (independent of each other).

 ts | id | foo         | bar
----+----+-------------+------------
  1 | A  | Lorem       | adipiscing
  5 | A  | sit         | adipiscing
  6 | A  | sit         | sed
  8 | A  | amet        | sed
  1 | B  | ipsum       | (null)
  4 | B  | dolor       | (null)
  5 | B  | dolor       | elit
  8 | B  | consectetur | elit
  9 | B  | consectetur | do
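The target output is a per-timeline "last observation carried forward". For each id, every timestamp at which either table has an event becomes a row, and each column shows the most recent value seen so far on that timeline. The Python sketch below shows only those semantics; it is not the SQL solution. It assumes the sample rows are loaded as plain tuples, and the function name `replay_state` is hypothetical:

```python
# Sample rows copied from the INSERT statements above: (ts, id, value).
# Note: the last bar value is 'do ' with a trailing space, exactly as inserted.
FOO = [(1, "A", "Lorem"), (1, "B", "ipsum"), (4, "B", "dolor"),
       (5, "A", "sit"), (8, "A", "amet"), (8, "B", "consectetur")]
BAR = [(1, "A", "adipiscing"), (5, "B", "elit"), (6, "A", "sed"), (9, "B", "do ")]


def replay_state(foo, bar):
    """Return (ts, id, foo, bar) rows: one per (id, ts) with an event in either table,
    each column holding the latest value seen so far on that timeline (None if none yet)."""
    # Merge both streams into one dict keyed by (id, ts); an entry may set foo, bar, or both.
    events = {}
    for ts, tl, val in foo:
        events.setdefault((tl, ts), {})["foo"] = val
    for ts, tl, val in bar:
        events.setdefault((tl, ts), {})["bar"] = val

    state = {}  # timeline id -> {"foo": ..., "bar": ...}
    rows = []
    for tl, ts in sorted(events):  # same order as ORDER BY id, ts
        cur = state.setdefault(tl, {"foo": None, "bar": None})
        cur.update(events[(tl, ts)])  # overwrite only the columns that had an event
        rows.append((ts, tl, cur["foo"], cur["bar"]))
    return rows


for row in replay_state(FOO, BAR):
    print(row)
```

Each timeline keeps its own state, which is what makes the timelines independent. The printed rows reproduce the expected result, with None in place of (null).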

Answer

Building on the solution for the simple case, add a PARTITION BY clause to the window functions in the inner query. This gives group numbers per partition (per "timeline"). Then combine the group numbers with their respective timeline (id in your example) to keep partitions separate in the second step:

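SELECT id, ts
     , min(foo) OVER (PARTITION BY id, foo_grp) AS foo  -- the one non-null foo of the group (NULL before the first foo)
     , min(bar) OVER (PARTITION BY id, bar_grp) AS bar  -- same for bar
FROM  (
   SELECT id, ts, f.foo, b.bar  -- id, ts: the merged columns produced by USING
        , count(f.foo) OVER (PARTITION BY id ORDER BY ts) AS foo_grp  -- running count of non-null foo per timeline
        , count(b.bar) OVER (PARTITION BY id ORDER BY ts) AS bar_grp  -- running count of non-null bar per timeline
   FROM   foo f
   FULL   JOIN bar b USING (id, ts)  -- one row per (id, ts) present in either table
   ) sub
ORDER  BY 1, 2;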

Result as requested (except with id first).
SQL Fiddle
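The Python sketch below mimics each step of the two-step query on the sample data, which shows why it works. FULL JOIN ... USING (id, ts) yields one row per (id, ts), with NULL for the missing side. count(x) OVER (PARTITION BY id ORDER BY ts) is a running count of non-null values, so a new group starts at every non-null value. min(x) OVER (PARTITION BY id, grp) then picks the single non-null value of that group. This illustrates the logic only, not how PostgreSQL executes it, and the helper names are hypothetical:

```python
from itertools import groupby

# Sample rows copied from the INSERT statements: (ts, id, value)
FOO = [(1, "A", "Lorem"), (1, "B", "ipsum"), (4, "B", "dolor"),
       (5, "A", "sit"), (8, "A", "amet"), (8, "B", "consectetur")]
BAR = [(1, "A", "adipiscing"), (5, "B", "elit"), (6, "A", "sed"), (9, "B", "do ")]


def full_join(foo, bar):
    """Emulate FULL JOIN ... USING (id, ts): one (id, ts, foo, bar) row per key, None for a missing side."""
    foo_map = {(tl, ts): v for ts, tl, v in foo}
    bar_map = {(tl, ts): v for ts, tl, v in bar}
    keys = sorted(foo_map.keys() | bar_map.keys())  # sorted by (id, ts)
    return [(tl, ts, foo_map.get((tl, ts)), bar_map.get((tl, ts))) for tl, ts in keys]


def fill_by_groups(rows):
    """Step 1: running count of non-null values per id gives the group number.
    Step 2: every row of an (id, grp) group takes that group's only non-null value."""
    out = []
    for tl, part in groupby(rows, key=lambda r: r[0]):  # PARTITION BY id (rows already sorted)
        foo_grp = bar_grp = 0
        tagged = []
        for _, ts, f, b in part:
            if f is not None:  # count(f.foo) OVER (PARTITION BY id ORDER BY ts)
                foo_grp += 1
            if b is not None:  # count(b.bar) OVER (PARTITION BY id ORDER BY ts)
                bar_grp += 1
            tagged.append((ts, f, b, foo_grp, bar_grp))
        # min(...) OVER (PARTITION BY id, grp): group 0 has no non-null value, so it stays None
        foo_of = {fg: f for _, f, _, fg, _ in tagged if f is not None}
        bar_of = {bg: b for _, _, b, _, bg in tagged if b is not None}
        for ts, _, _, fg, bg in tagged:
            out.append((tl, ts, foo_of.get(fg), bar_of.get(bg)))
    return out


for row in fill_by_groups(full_join(FOO, BAR)):
    print(row)
```

Taking "the only non-null value of the group" is equivalent to min() here because a group can never contain more than one non-null value. The running count increases exactly at each non-null row.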

Your attempt to adapt the previous solution was very close. It didn't work because you used PARTITION BY f.id / PARTITION BY b.id instead of PARTITION BY id. You want the combined id from FULL JOIN ... USING (id, ts), because f.id is NULL on rows that exist only in bar (and b.id is NULL on rows that exist only in foo). Those rows are exactly where the last non-null value has to be filled in for the missing (NULL) value, so they must stay in the partition of their own timeline.
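The same kind of sketch reproduces the effect of partitioning by f.id instead of the merged id. After the FULL JOIN, f.id is NULL on bar-only rows, so those rows land in a separate NULL partition that mixes timelines A and B. The last foo of their own timeline can therefore never be carried over to them. The code only shows how rows are assigned to partitions in each case, and the function names are hypothetical:

```python
from collections import defaultdict

# Sample rows copied from the INSERT statements: (ts, id, value)
FOO = [(1, "A", "Lorem"), (1, "B", "ipsum"), (4, "B", "dolor"),
       (5, "A", "sit"), (8, "A", "amet"), (8, "B", "consectetur")]
BAR = [(1, "A", "adipiscing"), (5, "B", "elit"), (6, "A", "sed"), (9, "B", "do ")]


def joined_rows(foo, bar):
    """FULL JOIN keeping f.id, b.id and the merged id (what USING (id, ts) exposes as id)."""
    foo_map = {(tl, ts): v for ts, tl, v in foo}
    bar_map = {(tl, ts): v for ts, tl, v in bar}
    rows = []
    for tl, ts in sorted(foo_map.keys() | bar_map.keys()):
        f_id = tl if (tl, ts) in foo_map else None  # f.id is NULL for bar-only rows
        b_id = tl if (tl, ts) in bar_map else None  # b.id is NULL for foo-only rows
        rows.append({"id": tl, "f_id": f_id, "b_id": b_id, "ts": ts})
    return rows


def partitions(rows, key):
    """Group (ts, id) pairs by the chosen PARTITION BY column."""
    parts = defaultdict(list)
    for r in rows:
        parts[r[key]].append((r["ts"], r["id"]))
    return dict(parts)


rows = joined_rows(FOO, BAR)
print("PARTITION BY id:  ", partitions(rows, "id"))
print("PARTITION BY f.id:", partitions(rows, "f_id"))
```

With f.id, the bar-only rows (6, 'A'), (5, 'B') and (9, 'B') share one NULL partition. With the merged id, every row stays in its own timeline.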

If performance is your paramount requirement, consider a server-side function like the one demonstrated in the previous answer.

Comment

Unless I'm missing something, that looks almost exactly like what I have in the second half of my SQLFiddle, and it returns different results than the first half.

Comment

@ChristopherCurrie: I added some explanation. And no, it returns the same result as your first half. See: sqlfiddle.com/#!15/e6ecb/7

Existing solution

My existing solution is as follows:

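SELECT *
FROM (
   -- every foo event, paired with the latest bar value at or before it on the same timeline
   SELECT ts, id, foo, bar
   FROM   foo
   LEFT   JOIN LATERAL (
      SELECT distinct on (id) bar
      FROM   bar
      WHERE  bar.id = foo.id
      AND    bar.ts <= foo.ts
      ORDER  BY id, ts desc
      ) b ON true
   UNION  -- removes the identical row produced when both tables have an event at the same (id, ts)
   -- every bar event, paired with the latest foo value at or before it on the same timeline
   SELECT ts, id, foo, bar
   FROM   bar
   LEFT   JOIN LATERAL (
      SELECT distinct on (id) foo
      FROM   foo
      WHERE  bar.id = foo.id
      AND    foo.ts <= bar.ts
      ORDER  BY id, ts desc
      ) f ON true
   ) sub
ORDER BY id, ts;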

This query returns the results shown in the question, but the EXPLAIN output is pretty grisly, even with only 300 rows in the foo table and 12k rows in bar.
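The LATERAL version is an "as-of" lookup. Each event of one table fetches the latest value of the other table on the same timeline with ts <= its own ts, and the two halves are then combined. The Python sketch below simulates that logic. It assumes each table's events are pre-sorted per timeline so that a binary search (`bisect_right`) can stand in for the per-row subquery. The function names are hypothetical. The sketch also compares UNION with UNION ALL:

```python
from bisect import bisect_right
from collections import defaultdict

# Sample rows copied from the INSERT statements: (ts, id, value)
FOO = [(1, "A", "Lorem"), (1, "B", "ipsum"), (4, "B", "dolor"),
       (5, "A", "sit"), (8, "A", "amet"), (8, "B", "consectetur")]
BAR = [(1, "A", "adipiscing"), (5, "B", "elit"), (6, "A", "sed"), (9, "B", "do ")]


def by_timeline(events):
    """Index events per timeline as parallel lists of ts and values, sorted by ts."""
    idx = defaultdict(lambda: ([], []))
    for ts, tl, val in sorted(events, key=lambda e: (e[1], e[0])):
        idx[tl][0].append(ts)
        idx[tl][1].append(val)
    return idx


def latest_at_or_before(idx, tl, ts):
    """Latest value on timeline tl with event time <= ts, else None (LEFT JOIN ... ON true)."""
    ts_list, vals = idx.get(tl, ([], []))
    i = bisect_right(ts_list, ts)  # number of events with time <= ts
    return vals[i - 1] if i else None


def lateral_union(foo, bar, union_all=False):
    foo_idx, bar_idx = by_timeline(foo), by_timeline(bar)
    rows = [(ts, tl, v, latest_at_or_before(bar_idx, tl, ts)) for ts, tl, v in foo]
    rows += [(ts, tl, latest_at_or_before(foo_idx, tl, ts), v) for ts, tl, v in bar]
    if not union_all:
        rows = list(set(rows))  # UNION drops duplicate rows, UNION ALL keeps them
    return sorted(rows, key=lambda r: (r[1], r[0]))  # ORDER BY id, ts


print(len(lateral_union(FOO, BAR)))                  # 9
print(len(lateral_union(FOO, BAR, union_all=True)))  # 10
```

In the sample data, timeline A has an event in both tables at ts = 1. Both halves therefore produce (1, 'A', 'Lorem', 'adipiscing'), and only UNION collapses that row to one copy.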

Comment

Wouldn't UNION ALL result in duplicate rows if both tables had an event at the same timestamp?

Comment

Right, I missed that you did not use FULL JOIN in this approach. Either way, the alternative should be substantially faster.

score 4 · Accepted Answer · 2015-07-09 22:11:01Z
