I have a table of the following structure, where all columns except the last are non-null:
ID | receiver | send_time | recv_time |
---|---|---|---|
1 | A | 00:00:00 | 00:00:01 |
2 | A | 00:00:01 | NULL |
3 | A | 00:00:02 | NULL |
4 | A | 00:00:03 | NULL |
5 | A | 00:00:04 | 00:00:05 |
6 | B | 00:00:00 | 00:00:01 |
7 | B | 00:00:01 | NULL |
8 | B | 00:00:02 | 00:00:03 |
9 | A | 00:00:05 | NULL |
10 | B | 00:00:03 | NULL |
11 | A | 00:00:06 | 00:00:07 |
I want to select (and ultimately delete, but I think selecting would be the first step) all rows, where the recv_time
column is NULL
for n or more consecutive rows, partitioned by the receiver
column and ordered by the send_time
column.
So from the above example, if n=2, I want to select/delete the rows with ID 2, 3 and 4, but not rows 7, 9, or 10.
I think a count with partition is needed, but I can't figure out how to "reset" the count of null values as soon as a non-null row is encountered. This is what I've got so far:
SELECT
id, receiver, send_time, recv_time,
COUNT(CASE WHEN recv_time IS NULL THEN 1 END) OVER (PARTITION BY receiver ORDER BY send_time)
FROM test
ORDER BY id
which gives me:
ID | receiver | send_time | recv_time | count |
---|---|---|---|---|
1 | A | 00:00:00 | 00:00:01 | 0 |
2 | A | 00:00:01 | NULL | 1 |
3 | A | 00:00:02 | NULL | 2 |
4 | A | 00:00:03 | NULL | 3 |
5 | A | 00:00:04 | 00:00:05 | 3 |
6 | B | 00:00:00 | 00:00:01 | 0 |
7 | B | 00:00:01 | NULL | 1 |
8 | B | 00:00:02 | 00:00:03 | 1 |
9 | A | 00:00:05 | NULL | 4 |
10 | B | 00:00:03 | NULL | 2 |
11 | A | 00:00:06 | 00:00:07 | 4 |
But I want the count to reset at rows 5, 8, 11. I'm also not sure how I would proceed after that, because I want to capture all rows in the "group" of consecutive NULL rows, but obviously the count for row 2 in this example is smaller than n.
1 Answer 1
To just select:
SELECT *
FROM (
SELECT *, count(*) OVER (PARTITION BY receiver, grp) AS grp_count
FROM (
SELECT *, count(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS grp
FROM test
) sub
) d
WHERE d.grp_count > 2
AND d.recv_time IS NULL;
Basically, the exact opposite of what you tried: we want the count to stop for rows with null
values. Meaning, you were close, with one step in the logic inverted.
The "trick" is to do a running count(recv_time)
. Null values don't count, so the count stalls for those rows. Candidates for removal end up in groups (grp
) with more than 2 members for n = 2
.
To delete:
DELETE FROM test t
USING ( -- same as above
SELECT *, count(*) OVER (PARTITION BY receiver, grp) AS grp_count
FROM (
SELECT *, count(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS grp
FROM test
) sub
) d
WHERE d.grp_count > 2
AND d.recv_time IS NULL
AND t.id = d.id;
Related:
-
Thanks for the detailed explanation. Can you clarify why you select groups where
grp_count > 3
, as opposed to wheregrp_count > 2
? Keeping the example from the original question where I want to find all groups of size equal to n or greater. I understand each group also contains the last row whererecv_time
was not null, but still I think the correct select would be wheregrp_count > n
, and not wheregrp_count > n+1
.Edgxxar– Edgxxar2023年09月19日 11:06:03 +00:00Commented Sep 19, 2023 at 11:06 -
@Edgxxar: Oh, sorry. Your understanding is impeccable. I operated under the diverging assumption
n = 3
. Adapated ton = 2
now to make it agree with the question.Erwin Brandstetter– Erwin Brandstetter2023年09月20日 00:23:24 +00:00Commented Sep 20, 2023 at 0:23
Explore related questions
See similar questions with these tags.