How can I assign group IDs depending on content?

Question 1

I have the following table with data as shown below; linenrs are monotonically increasing but not necessarily consecutive; when the key field contains an ellipsis ... that indicates an entry that is continued from above:

create table source (
 linenr integer unique not null,
 key text not null,
 value text );
insert into source values
 ( 2, 'tags', 'a' ),
 ( 3, '...', 'b' ),
 ( 4, 'title', 'The Title' ),
 ( 5, 'note', 'this is' ),
 ( 6, '...', 'an EXAMPLE' ),
 ( 8, 'title', 'over' ),
 ( 9, '...', 'three' ),
 ( 10, '...', 'lines' ),
 ( 11, 'about', 'grouping' );

Now I want to have a view where a group number is assigned based on the content of the key field; group numbers do not have to be consecutive but should be distinct for each group of rows that starts with a key other than ... and continues thru all lines where key is ..., like this:

╔════════╤═══════╤═══════╤════════════╗
║ linenr │ group │ key │ value ║
╠════════╪═══════╪═══════╪════════════╣
║ 2 │ 1 │ tags │ a ║
║ 3 │ 1 │ ... │ b ║
║ 4 │ 2 │ title │ The Title ║
║ 5 │ 3 │ note │ this is ║
║ 6 │ 3 │ ... │ an EXAMPLE ║
║ 8 │ 4 │ title │ over ║
║ 9 │ 4 │ ... │ three ║
║ 10 │ 4 │ ... │ lines ║
║ 11 │ 5 │ about │ grouping ║
╚════════╧═══════╧═══════╧════════════╝

I tried to do this with windows / partitions and the tabibitosan pattern, but haven't been able to come up with anything working; also, using at the preceding row with lag() doesn't help where there are multiple consecutive lines with .... In a spreadsheet this is an extremely easy thing to do but in SQL I can seemingly not refer to the previous row of the current query, can I?

Question 2

create table source (
 linenr integer unique not null,
 key text not null,
 value text );
insert into source values
 ( 2, 'tags', 'a' ),
 ( 3, '...', 'b' ),
 ( 4, 'title', 'The Title' ),
 ( 5, 'note', 'this is' ),
 ( 6, '...', 'an EXAMPLE' ),
 ( 8, 'title', 'over' ),
 ( 9, '...', 'three' ),
 ( 10, '...', 'lines' ),
 ( 11, 'about', 'grouping' );

You can get it using a simple CASE.

select linenr, key, value,
 case when key <> '...' then 1 end as rst
from source;

linenr | key | value | rst
-----: | :---- | :--------- | ---:
 2 | tags | a | 1
 3 | ... | b | null
 4 | title | The Title | 1
 5 | note | this is | 1
 6 | ... | an EXAMPLE | null
 8 | title | over | 1
 9 | ... | three | null
 10 | ... | lines | null
 11 | about | grouping | 1

Now SUM(rst) ignores null values and do the trick.

select linenr, key, value,
 sum(rst) over (order by linenr) grp
from (
 select linenr, key, value,
 case when key <> '...' then 1 end as rst
 from source
 ) x;

linenr | key | value | grp
-----: | :---- | :--------- | --:
 2 | tags | a | 1
 3 | ... | b | 1
 4 | title | The Title | 2
 5 | note | this is | 3
 6 | ... | an EXAMPLE | 3
 8 | title | over | 4
 9 | ... | three | 4
 10 | ... | lines | 4
 11 | about | grouping | 5

dbfiddle here

Question 3

I'm glad to be able to help.

Question 4

Cheaper and simpler with a single window function:

SELECT *
 , count(key <> '...' OR NULL) OVER (ORDER BY linenr) AS grp
FROM source;

dbfiddle here

Question 5

!!! this. To be sure, the accepted solution is probably more instructive to the neophyte; OTOH the lazy evaluation part of key != '...' or null is something useful to keep in mind. Problem solved with a crisp, performant one-liner, what's not to like?

McNets McNets 24k11 gold badges51 silver badges90 bronze badges · Accepted Answer · 2018-02-05 21:28:43Z

create table source (
 linenr integer unique not null,
 key text not null,
 value text );
insert into source values
 ( 2, 'tags', 'a' ),
 ( 3, '...', 'b' ),
 ( 4, 'title', 'The Title' ),
 ( 5, 'note', 'this is' ),
 ( 6, '...', 'an EXAMPLE' ),
 ( 8, 'title', 'over' ),
 ( 9, '...', 'three' ),
 ( 10, '...', 'lines' ),
 ( 11, 'about', 'grouping' );

You can get it using a simple CASE.

select linenr, key, value,
 case when key <> '...' then 1 end as rst
from source;

linenr | key | value | rst
-----: | :---- | :--------- | ---:
 2 | tags | a | 1
 3 | ... | b | null
 4 | title | The Title | 1
 5 | note | this is | 1
 6 | ... | an EXAMPLE | null
 8 | title | over | 1
 9 | ... | three | null
 10 | ... | lines | null
 11 | about | grouping | 1

Now SUM(rst) ignores null values and do the trick.

select linenr, key, value,
 sum(rst) over (order by linenr) grp
from (
 select linenr, key, value,
 case when key <> '...' then 1 end as rst
 from source
 ) x;

linenr | key | value | grp
-----: | :---- | :--------- | --:
 2 | tags | a | 1
 3 | ... | b | 1
 4 | title | The Title | 2
 5 | note | this is | 3
 6 | ... | an EXAMPLE | 3
 8 | title | over | 4
 9 | ... | three | 4
 10 | ... | lines | 4
 11 | about | grouping | 5

dbfiddle here

I'm glad to be able to help.

McNets
– McNets

2018年02月05日 21:48:37 +00:00
Commented Feb 5, 2018 at 21:48

Stack Exchange Network

How can I assign group IDs depending on content?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

How can I assign group IDs depending on content?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions