SELECT / DELETE rows where consecutive null value count is greater than N

Question 1

I have a table of the following structure, where all columns except the last are non-null:

ID	receiver	send_time	recv_time
1	A	00:00:00	00:00:01
2	A	00:00:01	NULL
3	A	00:00:02	NULL
4	A	00:00:03	NULL
5	A	00:00:04	00:00:05
6	B	00:00:00	00:00:01
7	B	00:00:01	NULL
8	B	00:00:02	00:00:03
9	A	00:00:05	NULL
10	B	00:00:03	NULL
11	A	00:00:06	00:00:07

I want to select (and ultimately delete, but I think selecting would be the first step) all rows, where the recv_time column is NULL for n or more consecutive rows, partitioned by the receiver column and ordered by the send_time column.

So from the above example, if n=2, I want to select/delete the rows with ID 2, 3 and 4, but not rows 7, 9, or 10.

I think a count with partition is needed, but I can't figure out how to "reset" the count of null values as soon as a non-null row is encountered. This is what I've got so far:

SELECT 
 id, receiver, send_time, recv_time, 
 COUNT(CASE WHEN recv_time IS NULL THEN 1 END) OVER (PARTITION BY receiver ORDER BY send_time)
FROM test 
ORDER BY id

which gives me:

ID	receiver	send_time	recv_time	count
1	A	00:00:00	00:00:01	0
2	A	00:00:01	NULL	1
3	A	00:00:02	NULL	2
4	A	00:00:03	NULL	3
5	A	00:00:04	00:00:05	3
6	B	00:00:00	00:00:01	0
7	B	00:00:01	NULL	1
8	B	00:00:02	00:00:03	1
9	A	00:00:05	NULL	4
10	B	00:00:03	NULL	2
11	A	00:00:06	00:00:07	4

But I want the count to reset at rows 5, 8, 11. I'm also not sure how I would proceed after that, because I want to capture all rows in the "group" of consecutive NULL rows, but obviously the count for row 2 in this example is smaller than n.

Question 2

To just select:

SELECT *
FROM (
 SELECT *, count(*) OVER (PARTITION BY receiver, grp) AS grp_count
 FROM (
 SELECT *, count(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS grp
 FROM test
 ) sub
 ) d
WHERE d.grp_count > 2
AND d.recv_time IS NULL;

Basically, the exact opposite of what you tried: we want the count to stop for rows with null values. Meaning, you were close, with one step in the logic inverted.

The "trick" is to do a running count(recv_time). Null values don't count, so the count stalls for those rows. Candidates for removal end up in groups (grp) with more than 2 members for n = 2.

To delete:

DELETE FROM test t
USING ( -- same as above
 SELECT *, count(*) OVER (PARTITION BY receiver, grp) AS grp_count
 FROM (
 SELECT *, count(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS grp
 FROM test
 ) sub
 ) d
WHERE d.grp_count > 2
AND d.recv_time IS NULL
AND t.id = d.id;

fiddle

Question 3

Thanks for the detailed explanation. Can you clarify why you select groups where grp_count > 3, as opposed to where grp_count > 2? Keeping the example from the original question where I want to find all groups of size equal to n or greater. I understand each group also contains the last row where recv_time was not null, but still I think the correct select would be where grp_count > n, and not where grp_count > n+1.

Question 4

@Edgxxar: Oh, sorry. Your understanding is impeccable. I operated under the diverging assumption n = 3. Adapated to n = 2 now to make it agree with the question.

score 1 · Accepted Answer · 2023-09-17 07:52:02Z

To just select:

SELECT *
FROM (
 SELECT *, count(*) OVER (PARTITION BY receiver, grp) AS grp_count
 FROM (
 SELECT *, count(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS grp
 FROM test
 ) sub
 ) d
WHERE d.grp_count > 2
AND d.recv_time IS NULL;

Basically, the exact opposite of what you tried: we want the count to stop for rows with null values. Meaning, you were close, with one step in the logic inverted.

The "trick" is to do a running count(recv_time). Null values don't count, so the count stalls for those rows. Candidates for removal end up in groups (grp) with more than 2 members for n = 2.

To delete:

DELETE FROM test t
USING ( -- same as above
 SELECT *, count(*) OVER (PARTITION BY receiver, grp) AS grp_count
 FROM (
 SELECT *, count(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS grp
 FROM test
 ) sub
 ) d
WHERE d.grp_count > 2
AND d.recv_time IS NULL
AND t.id = d.id;

fiddle

Thanks for the detailed explanation. Can you clarify why you select groups where grp_count > 3, as opposed to where grp_count > 2? Keeping the example from the original question where I want to find all groups of size equal to n or greater. I understand each group also contains the last row where recv_time was not null, but still I think the correct select would be where grp_count > n, and not where grp_count > n+1.
@Edgxxar: Oh, sorry. Your understanding is impeccable. I operated under the diverging assumption n = 3. Adapated to n = 2 now to make it agree with the question.

Stack Exchange Network

SELECT / DELETE rows where consecutive null value count is greater than N

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

SELECT / DELETE rows where consecutive null value count is greater than N

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions