I'm trying to solve a particularly difficult problem. I am storing telemetry data from some sensors in a PostgreSQL table, and I want to know how I can write a query that groups the telemetry data using relational information from two other tables.
I have one table which stores the telemetry data from the sensors. It contains three fields: the timestamp, the sensor ID, and the value of the sensor at that time. The value column is an incrementing count (it only increases and never resets).
Sensor_Telemetry table
timestamp | sensor_id | value |
---|---|---|
2022-01-01 00:00:00 | 5 | 3 |
2022-01-01 00:00:01 | 5 | 5 |
2022-01-01 00:00:02 | 5 | 6 |
... | ... | ... |
2022-01-01 01:00:00 | 5 | 675 |
I have another table which stores the state of the sensor (whether it was stationary or in motion) and the start/end dates of that particular state for that sensor:
Status table
start_date | end_date | status | sensor_id |
---|---|---|---|
2022-01-01 00:00:00 | 2022-01-01 00:20:00 | in_motion | 5 |
2022-01-01 00:20:00 | 2022-01-01 00:40:00 | stationary | 5 |
2022-01-01 00:40:00 | 2022-01-01 01:00:00 | in_motion | 5 |
... | ... | ... | ... |
The sensor is located at a particular location. The Sensor table stores this metadata:
Sensor table
sensor_id | location_id |
---|---|
5 | 16 |
In the final table, I have the shifts that occur at each location. The Shift table is a list of occurrences of all shifts, i.e. in this case Shift A is defined to occur every day between 00:00:00 and 00:30:00, and Shift B every day between 00:30:00 and 01:00:00. So the Shift table would have records like so:
Shift table
shift | location_id | occurrence_id | start_date | end_date |
---|---|---|---|---|
A Shift | 16 | 123 | 2022-01-01 00:00:00 | 2022-01-01 00:30:00 |
B Shift | 16 | 124 | 2022-01-01 00:30:00 | 2022-01-01 01:00:00 |
A Shift | 16 | 123 | 2022-01-02 00:00:00 | 2022-01-02 00:30:00 |
B Shift | 16 | 124 | 2022-01-02 00:30:00 | 2022-01-02 01:00:00 |
... | ... | ... | ... | ... |
I want to write a query that retrieves telemetry data grouped both by the shifts at the location of the sensor and by the status of the sensor, selecting by a given date range:
sensor_id | start_date | end_date | status | shift | value_start | value_end |
---|---|---|---|---|---|---|
5 | 2022-01-01 00:00:00 | 2022-01-01 00:20:00 | in_motion | A Shift | 3 | 250 |
5 | 2022-01-01 00:20:00 | 2022-01-01 00:30:00 | stationary | A Shift | 250 | 325 |
5 | 2022-01-01 00:30:00 | 2022-01-01 00:40:00 | stationary | B Shift | 325 | 490 |
5 | 2022-01-01 00:40:00 | 2022-01-01 01:00:00 | in_motion | B Shift | 490 | 675 |
As you can see, the telemetry data would be grouped both by the information contained in the Shift table and by the Status table. In particular, notice that the sensor was in a stationary status between 2022-01-01 00:20:00 and 2022-01-01 00:40:00; however, as the 2nd and 3rd rows in the above table show, this interval is cut into two rows because the shift changed at 2022-01-01 00:30:00.
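To make the intended splitting concrete, here is a plain-Python sketch of the logic I'm after (just an illustration of the intersection of status intervals with shift occurrences; none of these names exist in my schema):

```python
from datetime import datetime

# Each output row is the intersection of one status interval with one
# shift occurrence; empty intersections are dropped.
def split_by_shift(status_rows, shift_rows):
    out = []
    for s_start, s_end, status in status_rows:
        for sh_start, sh_end, shift in shift_rows:
            start, end = max(s_start, sh_start), min(s_end, sh_end)
            if start < end:  # the two intervals actually overlap
                out.append((start, end, status, shift))
    return sorted(out)

ts = lambda s: datetime.fromisoformat(s)
statuses = [(ts("2022-01-01 00:00"), ts("2022-01-01 00:20"), "in_motion"),
            (ts("2022-01-01 00:20"), ts("2022-01-01 00:40"), "stationary"),
            (ts("2022-01-01 00:40"), ts("2022-01-01 01:00"), "in_motion")]
shifts = [(ts("2022-01-01 00:00"), ts("2022-01-01 00:30"), "A Shift"),
          (ts("2022-01-01 00:30"), ts("2022-01-01 01:00"), "B Shift")]

for row in split_by_shift(statuses, shifts):
    print(row[0].time(), row[1].time(), row[2], row[3])
```

This produces four rows, with the stationary interval split exactly at the 00:30:00 shift boundary, matching the table above.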
Is it possible to write a query that can do this? I would really appreciate it if you have any ideas, thanks!
Link to db-fiddle: https://www.db-fiddle.com/f/sAXudT8qB35WWBk9xREj7p/6
3 Answers
Tested in dbfiddle.uk:
select distinct
    sensor.sensor_id,
    shift.shift,
    shift.occurence_id,
    status.status,
    status.status_id,
    MIN(st.datetime) OVER w as start_date,
    MAX(st.datetime) OVER w as end_date,
    FIRST_VALUE(st.reading) OVER w as reading_start,
    LAST_VALUE(st.reading) OVER w as reading_end,
    MIN(st.reading) OVER w as reading_min,
    MAX(st.reading) OVER w as reading_max
from status
join sensor
    using (sensor_id)
join shift
    on sensor.location_id = shift.location_id
    and (status.start_date, status.end_date) OVERLAPS
        (shift.start_date, shift.end_date)
join sensor_telemetry as st
    on st.sensor_id = sensor.sensor_id
    and tstzrange(
            GREATEST(status.start_date, shift.start_date),
            LEAST(status.end_date, shift.end_date)
        ) @> st.datetime
where
    status.sensor_id = 5
window w as
    ( partition by
          sensor.sensor_id,
          shift.shift,
          shift.occurence_id,
          status.status,
          status.status_id
      order by
          st.datetime
      rows between unbounded preceding
               and unbounded following
    )
order by
    start_date;
And a slightly simpler answer if we only require the MIN/MAX of the readings, not the first/last (or, as the OP says, we know that they always coincide): https://dbfiddle.uk/hbIcLpNn
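As a side note, the `tstzrange(GREATEST(...), LEAST(...)) @> st.datetime` predicate relies on PostgreSQL's default half-open `[)` range bounds, which is what keeps a sample that falls exactly on a boundary from being counted in two groups. A minimal sketch of that semantics (plain Python standing in for the range operator):

```python
from datetime import datetime

# Mimics tstzrange(start, end) @> t with the default '[)' bounds:
# start is inclusive, end is exclusive.
def contains(start, end, t):
    return start <= t < end

shift_a = (datetime(2022, 1, 1, 0, 0), datetime(2022, 1, 1, 0, 30))
shift_b = (datetime(2022, 1, 1, 0, 30), datetime(2022, 1, 1, 1, 0))
boundary = datetime(2022, 1, 1, 0, 30)

print(contains(*shift_a, boundary))  # the boundary sample is not in shift A
print(contains(*shift_b, boundary))  # ...it belongs to shift B only
```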
Here is the solution I've arrived at by making use of the OVERLAPS operator; let me know if you can improve it!
Schema (PostgreSQL v11)
create table sensor_telemetry(
datetime TIMESTAMPTZ,
sensor_id int,
reading bigint
);
insert into sensor_telemetry (datetime, sensor_id, reading) values
('2020-01-01T00:00:00Z', 5, 1),
('2020-01-01T00:00:01Z', 5, 2),
('2020-01-01T00:00:02Z', 5, 3),
('2020-01-01T00:00:03Z', 5, 4),
('2020-01-01T00:00:04Z', 5, 5),
('2020-01-01T00:00:05Z', 5, 6),
('2020-01-01T00:00:06Z', 5, 7),
('2020-01-01T00:00:07Z', 5, 8),
('2020-01-01T00:00:08Z', 5, 9),
('2020-01-01T00:00:09Z', 5, 10),
('2020-01-01T00:00:10Z', 5, 11),
('2020-01-01T00:00:11Z', 5, 12),
('2020-01-01T00:00:12Z', 5, 13),
('2020-01-01T00:00:13Z', 5, 14),
('2020-01-01T00:00:14Z', 5, 15),
('2020-01-01T00:00:15Z', 5, 16),
('2020-01-01T00:00:16Z', 5, 17),
('2020-01-01T00:00:17Z', 5, 18),
('2020-01-01T00:00:18Z', 5, 19),
('2020-01-01T00:00:19Z', 5, 20),
('2020-01-01T00:00:20Z', 5, 21),
('2020-01-01T00:00:21Z', 5, 22),
('2020-01-01T00:00:22Z', 5, 23),
('2020-01-01T00:00:23Z', 5, 24),
('2020-01-01T00:00:24Z', 5, 25),
('2020-01-01T00:00:25Z', 5, 26),
('2020-01-01T00:00:26Z', 5, 27),
('2020-01-01T00:00:27Z', 5, 28),
('2020-01-01T00:00:28Z', 5, 29),
('2020-01-01T00:00:29Z', 5, 30),
('2020-01-01T00:00:30Z', 5, 31),
('2020-01-01T00:00:31Z', 5, 32),
('2020-01-01T00:00:32Z', 5, 33),
('2020-01-01T00:00:33Z', 5, 34),
('2020-01-01T00:00:34Z', 5, 35),
('2020-01-01T00:00:35Z', 5, 36),
('2020-01-01T00:00:36Z', 5, 37),
('2020-01-01T00:00:37Z', 5, 38),
('2020-01-01T00:00:38Z', 5, 39),
('2020-01-01T00:00:39Z', 5, 40),
('2020-01-01T00:00:40Z', 5, 41),
('2020-01-01T00:00:41Z', 5, 42),
('2020-01-01T00:00:42Z', 5, 43),
('2020-01-01T00:00:43Z', 5, 44),
('2020-01-01T00:00:44Z', 5, 45),
('2020-01-01T00:00:45Z', 5, 46),
('2020-01-01T00:00:46Z', 5, 47),
('2020-01-01T00:00:47Z', 5, 48),
('2020-01-01T00:00:48Z', 5, 49),
('2020-01-01T00:00:49Z', 5, 50),
('2020-01-01T00:00:50Z', 5, 51),
('2020-01-01T00:00:51Z', 5, 52),
('2020-01-01T00:00:52Z', 5, 53),
('2020-01-01T00:00:53Z', 5, 54),
('2020-01-01T00:00:54Z', 5, 55),
('2020-01-01T00:00:55Z', 5, 56),
('2020-01-01T00:00:56Z', 5, 57),
('2020-01-01T00:00:57Z', 5, 58),
('2020-01-01T00:00:58Z', 5, 59),
('2020-01-01T00:00:59Z', 5, 60),
('2020-01-01T00:01:00Z', 5, 61),
('2020-01-01T00:01:01Z', 5, 62),
('2020-01-01T00:01:02Z', 5, 63),
('2020-01-01T00:01:03Z', 5, 64),
('2020-01-01T00:01:04Z', 5, 65),
('2020-01-01T00:01:05Z', 5, 66),
('2020-01-01T00:01:06Z', 5, 67),
('2020-01-01T00:01:07Z', 5, 68),
('2020-01-01T00:01:08Z', 5, 69),
('2020-01-01T00:01:09Z', 5, 70),
('2020-01-01T00:01:10Z', 5, 71),
('2020-01-01T00:01:11Z', 5, 72),
('2020-01-01T00:01:12Z', 5, 73),
('2020-01-01T00:01:13Z', 5, 74),
('2020-01-01T00:01:14Z', 5, 75),
('2020-01-01T00:01:15Z', 5, 76),
('2020-01-01T00:01:16Z', 5, 77),
('2020-01-01T00:01:17Z', 5, 78),
('2020-01-01T00:01:18Z', 5, 79),
('2020-01-01T00:01:19Z', 5, 80),
('2020-01-01T00:01:20Z', 5, 81),
('2020-01-01T00:01:21Z', 5, 82),
('2020-01-01T00:01:22Z', 5, 83),
('2020-01-01T00:01:23Z', 5, 84),
('2020-01-01T00:01:24Z', 5, 85),
('2020-01-01T00:01:25Z', 5, 86),
('2020-01-01T00:01:26Z', 5, 87),
('2020-01-01T00:01:27Z', 5, 88),
('2020-01-01T00:01:28Z', 5, 89),
('2020-01-01T00:01:29Z', 5, 90),
('2020-01-01T00:01:30Z', 5, 91),
('2020-01-01T00:01:31Z', 5, 92),
('2020-01-01T00:01:32Z', 5, 93),
('2020-01-01T00:01:33Z', 5, 94),
('2020-01-01T00:01:34Z', 5, 95),
('2020-01-01T00:01:35Z', 5, 96),
('2020-01-01T00:01:36Z', 5, 97),
('2020-01-01T00:01:37Z', 5, 98),
('2020-01-01T00:01:38Z', 5, 99),
('2020-01-01T00:01:39Z', 5, 100),
('2020-01-01T00:01:40Z', 5, 101),
('2020-01-01T00:01:41Z', 5, 102),
('2020-01-01T00:01:42Z', 5, 103),
('2020-01-01T00:01:43Z', 5, 104),
('2020-01-01T00:01:44Z', 5, 105),
('2020-01-01T00:01:45Z', 5, 106),
('2020-01-01T00:01:46Z', 5, 107),
('2020-01-01T00:01:47Z', 5, 108),
('2020-01-01T00:01:48Z', 5, 109),
('2020-01-01T00:01:49Z', 5, 110),
('2020-01-01T00:01:50Z', 5, 111),
('2020-01-01T00:01:51Z', 5, 112),
('2020-01-01T00:01:52Z', 5, 113),
('2020-01-01T00:01:53Z', 5, 114),
('2020-01-01T00:01:54Z', 5, 115),
('2020-01-01T00:01:55Z', 5, 116),
('2020-01-01T00:01:56Z', 5, 117),
('2020-01-01T00:01:57Z', 5, 118),
('2020-01-01T00:01:58Z', 5, 119),
('2020-01-01T00:01:59Z', 5, 120),
('2020-01-01T00:02:00Z', 5, 121)
;
create table status(
sensor_id int,
status_id int,
start_date timestamptz,
end_date timestamptz,
status varchar(15)
);
insert into status (sensor_id, status_id, start_date, end_date, status) values
(5, 8, '2020-01-01T00:00:00Z', '2020-01-01T00:01:30Z', 'stationary'),
(5, 9, '2020-01-01T00:01:30Z', '2020-01-01T00:01:45Z', 'in_motion'),
(5, 10, '2020-01-01T00:01:45Z', '2020-01-01T00:02:00Z', 'stationary');
create table sensor(
sensor_id int,
location_id int
);
insert into sensor (sensor_id, location_id) values
(5, 16);
create table shift(
shift varchar(15),
location_id int,
occurence_id int,
start_date timestamptz,
end_date timestamptz
);
insert into shift (shift, location_id, occurence_id, start_date, end_date) values
('Shift A', 16, 123, '2020-01-01T00:00:00Z', '2020-01-01T00:01:00Z'),
('Shift B', 16, 124, '2020-01-01T00:01:00Z', '2020-01-01T00:02:00Z');
Query #1
with statuses as (
select
stat.sensor_id,
stat.status,
stat.status_id,
stat.start_date,
stat.end_date,
sens.location_id
from status stat
inner join sensor sens on (sens.sensor_id = stat.sensor_id)
where stat.sensor_id=5
),
shift_status as (
select
sa.sensor_id,
s.shift,
s.occurence_id,
s.start_date as shift_start,
s.end_date as shift_end,
sa.status,
sa.status_id,
sa.start_date as status_start,
sa.end_date as status_end
from shift s
inner join statuses sa on
(sa.location_id = s.location_id
and (sa.start_date, sa.end_date) OVERLAPS (s.start_date, s.end_date))
),
phases as (
select
sensor_id,
shift,
occurence_id,
status,
status_id,
GREATEST(shift_start, status_start) as start_date,
LEAST(shift_end, status_end) as end_date
from shift_status
order by start_date
)
select
st.sensor_id,
shift,
occurence_id,
status,
status_id,
MIN(datetime) as start_date,
MAX(datetime) as end_date,
MIN(reading) as reading_start,
MAX(reading) as reading_end
from sensor_telemetry st
inner join phases p on
(st.sensor_id = p.sensor_id
and (p.start_date, p.end_date) overlaps (st.datetime, st.datetime))
group by st.sensor_id, shift, occurence_id, status, status_id
order by start_date;
sensor_id | shift | occurence_id | status | status_id | start_date | end_date | reading_start | reading_end |
---|---|---|---|---|---|---|---|---|
5 | Shift A | 123 | stationary | 8 | 2020-01-01T00:00:00.000Z | 2020-01-01T00:00:59.000Z | 1 | 60 |
5 | Shift B | 124 | stationary | 8 | 2020-01-01T00:01:00.000Z | 2020-01-01T00:01:29.000Z | 61 | 90 |
5 | Shift B | 124 | in_motion | 9 | 2020-01-01T00:01:30.000Z | 2020-01-01T00:01:44.000Z | 91 | 105 |
5 | Shift B | 124 | stationary | 10 | 2020-01-01T00:01:45.000Z | 2020-01-01T00:01:59.000Z | 106 | 120 |
- "I don't think the MAX and MIN(reading) give you the reading start and end, only the max and min in the duration of each group. Getting the first (start) and last (end) reading is going to be a bit more tricky." – ypercubeᵀᴹ, Sep 22, 2022
- "For me it's acceptable in this case - I know for sure that the reading can only increment in time, as it's not possible for it to decrement to a lower value. But you're right, the logic is not exactly correct in the general case." – kk_p, Sep 22, 2022
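The caveat raised in the comments can be shown directly: MIN/MAX only coincide with the first/last readings when the series is monotonically increasing (a plain-Python sketch, not tied to any of the tables above):

```python
# For a monotone, counter-style series, min/max coincide with first/last;
# for a series that can decrease (a gauge), they do not.
monotone = [3, 5, 6, 10]
gauge    = [3, 9, 2, 7]

assert (min(monotone), max(monotone)) == (monotone[0], monotone[-1])
assert (min(gauge), max(gauge)) != (gauge[0], gauge[-1])
```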
I looked at this; quite thought-provoking, +1!
To answer something like this, good data is essential (edge cases &c.), so I did the following, in particular simulating data using PostgreSQL's really powerful GENERATE_SERIES() function (see the manual).
All of the code below (and more) is available in the fiddles within the various answers' sections. I've put in INDEXes and FOREIGN KEYs because I'd like to test (both on db<>fiddle and at home) the performance of the different approaches - in particular to see how the PostgreSQL GiST index (see the manual) works out.
1st answer - a general range-based approach (fiddle):
I've added a location table to the OP's ones - it makes the scenario more realistic (IMHO) and will come into play for testing on multiple locations.
CREATE TABLE location
(
location_id INTEGER NOT NULL PRIMARY KEY,
l_desc TEXT NOT NULL
);
Only 1 record for the moment:
INSERT INTO location VALUES (16, 'Farm');
Following the OP's setup for sensor - the names of the fields have been changed slightly.
CREATE TABLE sensor
(
sensor_id SMALLINT NOT NULL PRIMARY KEY,
location_id SMALLINT NOT NULL,
CONSTRAINT loc_FK FOREIGN KEY (location_id) REFERENCES location (location_id)
);
CREATE INDEX sensor_l_ix ON sensor(location_id);
populate:
INSERT INTO sensor VALUES (5, 16);
We only need one, because the solution(s) (will) take care of the grouping by sensor and location.
CREATE TABLE telemetry
(
t_ts TIMESTAMPTZ NOT NULL,
sensor_id INTEGER NOT NULL,
t_val INTEGER NOT NULL,
CONSTRAINT tel_pk PRIMARY KEY (sensor_id, t_ts), -- can only have one (set of) measurements
-- at the same time - only t_val
CONSTRAINT tel_sens_fk FOREIGN KEY (sensor_id) REFERENCES sensor (sensor_id)
);
then:
INSERT INTO telemetry
SELECT
('2022-01-01 00:00:00'::TIMESTAMPTZ + (INTERVAL '1 SECOND' * (i - 1))), 5, i
FROM
(
SELECT
GENERATE_SERIES(1, 21700) AS i
) AS t;
I used PostgreSQL's GENERATE_SERIES() function to simulate ~21k data points - time incrementing by 1 second, and the value by 1 - which seems to be what the OP requires.
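For readers who want to sanity-check the simulated series outside the database, the same data can be sketched in plain Python (same 1-second cadence and row count as the INSERT above):

```python
from datetime import datetime, timedelta

# Mirror of the GENERATE_SERIES() insert: one reading per second,
# value incrementing by 1, starting at 2022-01-01 00:00:00, sensor 5.
base = datetime(2022, 1, 1)
telemetry = [(base + timedelta(seconds=i - 1), 5, i) for i in range(1, 21701)]

print(len(telemetry))  # 21700 rows
print(telemetry[0])
```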
CREATE TABLE status
(
start_tz TIMESTAMPTZ NOT NULL,
end_tz TIMESTAMPTZ NOT NULL,
ms TEXT NOT NULL,
sensor_id SMALLINT NOT NULL
);
and:
INSERT INTO status
SELECT
GENERATE_SERIES('2022-01-01 00:00:00', '2022-01-01 06:20:00', INTERVAL '20 MINUTES'),
GENERATE_SERIES('2022-01-01 00:20:00', '2022-01-01 06:40:00', INTERVAL '20 MINUTES'),
CASE
WHEN random() > 0.5 THEN 'in_motion'
ELSE 'stationary'
END,
5; -- <<== only one sensor - could have put in others using GENERATE_SERIES()
and now the shifts:
CREATE TABLE shift
(
shift_id TEXT NOT NULL,
location_id SMALLINT NOT NULL,
occurence_id SMALLINT NOT NULL,
start_tz TIMESTAMPTZ NOT NULL,
end_tz TIMESTAMPTZ NOT NULL
);
populate - shifts every thirty minutes, A and B:
WITH cte AS
(
SELECT
16 AS loc,
GENERATE_SERIES(1, 15, 1) AS occur,
GENERATE_SERIES('2022-01-01 00:00:00', '2022-01-01 07:00:00', INTERVAL '30 MINUTE')
AS start_tz,
GENERATE_SERIES('2022-01-01 00:30:00', '2022-01-01 07:30:00', INTERVAL '30 MINUTE')
AS end_tz
)
INSERT INTO shift
SELECT
CASE
WHEN EXTRACT (MINUTE FROM start_tz) = 0 THEN 'Shift A'
ELSE 'Shift B'
END,
loc,
occur,
start_tz,
end_tz
FROM cte;
Now, we use the ROW_NUMBER() function (see the manual) and the % modulo operator! I've included fields that were not strictly necessary in order to demonstrate my train of thought!
SELECT
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) AS rn,
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) % 4 AS mod,
CASE
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) % 4
WHEN 1 THEN TSTZRANGE(st.start_tz, st.end_tz)
WHEN 2 THEN TSTZRANGE(st.start_tz, sh.end_tz) -- alternates - see TIMEs
WHEN 3 THEN TSTZRANGE(sh.start_tz, st.end_tz) -- alternates - see TIMEs
WHEN 0 THEN TSTZRANGE(st.start_tz, st.end_tz)
END AS range_date,
sh.occurence_id,
st.start_tz::TIME AS st_start, st.end_tz::TIME AS st_end, -- ::TIME for legibility
sh.start_tz::TIME AS sh_start, sh.end_tz::TIME AS sh_end
FROM sensor s
JOIN status st USING (sensor_id)
JOIN shift sh USING (location_id)
WHERE (st.start_tz, st.end_tz) OVERLAPS (sh.start_tz, sh.end_tz)
ORDER BY rn, mod, range_date;
Result (snipped for brevity):
rn mod range_date occurence_id st_start st_end sh_start sh_end
1 1 ["2022-01-01 00:00:00+00","2022-01-01 00:20:00+00") 1 00:00:00 00:20:00 00:00:00 00:30:00
2 2 ["2022-01-01 00:20:00+00","2022-01-01 00:30:00+00") 1 00:20:00 00:40:00 00:00:00 00:30:00
3 3 ["2022-01-01 00:30:00+00","2022-01-01 00:40:00+00") 2 00:20:00 00:40:00 00:30:00 01:00:00
4 0 ["2022-01-01 00:40:00+00","2022-01-01 01:00:00+00") 2 00:40:00 01:00:00 00:30:00 01:00:00
5 1 ["2022-01-01 01:00:00+00","2022-01-01 01:20:00+00") 3 01:00:00 01:20:00 01:00:00 01:30:00
Now, we can see that every time the mod field is 1 we have the start of a new hour - so, tidying up and joining to the telemetry table, the final query is:
SELECT
t.sensor_id AS s_id, sub.location_id AS loc_id,
sub.t_val AS "S_val", t.t_val AS "E_val",
sub.range_date, sub.occurence_id AS oc_id, sub.ms -- can have other fields...
FROM
telemetry t
JOIN
(
SELECT
t.sensor_id, t.t_val,
sh.occurence_id, sh.location_id, st.ms,
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) % 4 AS rn,
CASE
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) % 4
WHEN 1 THEN TSTZRANGE(st.start_tz, st.end_tz)
WHEN 2 THEN TSTZRANGE(st.start_tz, sh.end_tz)
WHEN 3 THEN TSTZRANGE(sh.start_tz, st.end_tz)
WHEN 0 THEN TSTZRANGE(st.start_tz, st.end_tz)
END AS range_date
FROM sensor s
JOIN status st USING (sensor_id)
JOIN shift sh USING (location_id)
JOIN telemetry t USING (sensor_id)
WHERE (st.start_tz, st.end_tz) OVERLAPS (sh.start_tz, sh.end_tz)
AND t.t_ts = CASE
WHEN sh.start_tz > st.start_tz THEN sh.start_tz
ELSE st.start_tz
END -- <<=== Note JOIN on CASE!
) AS sub
ON t.t_ts = UPPER(range_date) - INTERVAL '1 SECOND'
ORDER BY range_date;
Result (snipped for brevity):
s_id loc_id S_val E_val range_date oc_id ms
5 16 1 1200 ["2022-01-01 00:00:00+00","2022-01-01 00:20:00+00") 1 stationary
5 16 1201 1800 ["2022-01-01 00:20:00+00","2022-01-01 00:30:00+00") 1 in_motion
5 16 1801 2400 ["2022-01-01 00:30:00+00","2022-01-01 00:40:00+00") 2 in_motion
5 16 2401 3600 ["2022-01-01 00:40:00+00","2022-01-01 01:00:00+00") 2 stationary
5 16 3601 4800 ["2022-01-01 01:00:00+00","2022-01-01 01:20:00+00") 3 stationary
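To see why the ROW_NUMBER() % 4 trick works with this cadence, here's a plain-Python sketch (an illustration under the same 20-minute status / 30-minute shift assumptions, not part of the SQL):

```python
from datetime import datetime, timedelta

# With statuses every 20 minutes and shifts every 30, each hour yields
# exactly 4 overlapping (status, shift) pairs, so the row-number pattern
# repeats mod 4.
base = datetime(2022, 1, 1)
m = lambda n: base + timedelta(minutes=n)

statuses = [(m(i), m(i + 20)) for i in range(0, 120, 20)]  # two hours
shifts   = [(m(i), m(i + 30)) for i in range(0, 120, 30)]

pairs = sorted((st, sh) for st in statuses for sh in shifts
               if st[0] < sh[1] and sh[0] < st[1])         # OVERLAPS

segments = []
for rn, (st, sh) in enumerate(pairs, start=1):
    mod = rn % 4
    if mod in (1, 0):
        segments.append((st[0], st[1]))   # status fully inside one shift
    elif mod == 2:
        segments.append((st[0], sh[1]))   # straddling status: first half
    else:                                 # mod == 3: second half
        segments.append((sh[0], st[1]))

for s, e in segments[:4]:
    print(s.time(), e.time())
```

The segments tile each hour with no gaps, matching the range_date column in the result above.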
2nd answer - an exclusively range-based approach (fiddle)
The setup is similar - except that I used the TSTZRANGE type in my table declarations(*) and used GiST indexes to index those columns - check the fiddle.
(*) Tables status and shift.
The major differences between this answer and the 1st one are that the range syntax considerably enhances the legibility of the final query and (to my mind anyway) makes the problem easier to reason about. The final query is as follows (see fiddle for setup and logic):
SELECT
t.sensor_id AS s_id, sub.location_id AS loc_id,
sub.t_val AS "S_val", t.t_val AS "E_val",
sub.range_date, sub.ms, sub.occurence_id AS oc_id -- can have other fields...
FROM
telemetry t
JOIN
(
SELECT
t.sensor_id, t.t_val,
sh.occurence_id, st.ms, sh.location_id,
CASE
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id ORDER BY st.datran, sh.datran) % 4
WHEN 1 THEN st.datran
WHEN 2 THEN TSTZRANGE(LOWER(st.datran), UPPER(sh.datran))
WHEN 3 THEN TSTZRANGE(LOWER(sh.datran), UPPER(st.datran))
WHEN 0 THEN st.datran
END AS range_date
FROM sensor s
JOIN status st USING (sensor_id)
JOIN shift sh USING (location_id)
JOIN telemetry t USING (sensor_id)
WHERE (st.datran) && (sh.datran) -- Note the && range type OVERLAPS operator!
AND t.t_ts = GREATEST(LOWER(st.datran), LOWER(sh.datran)) -- alternative to CASE
) AS sub
ON t.t_ts = UPPER(range_date) - INTERVAL '1 SECOND'
ORDER BY range_date;
Result - same as for the first query. If it wasn't, then it would be incorrect!
3rd answer - using EXTRACT (fiddle)
This solution is a bit weaker in the sense that it assumes knowledge of the distribution of the data points - i.e. boundaries on the hour and at 20, 30 and 40 minutes past, then the next hour. But the OP told us that, and we can easily change this to suit any configuration.
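The boundary filter at the heart of this approach can be sketched in plain Python (assuming the same 1-second samples and the boundaries just described):

```python
from datetime import datetime, timedelta

# Segment starts sit at minutes 0/20/30/40 (second 0); segment ends sit
# one second before the next start (minutes 19/29/39/59, second 59).
base = datetime(2022, 1, 1)
samples = [base + timedelta(seconds=i) for i in range(3600)]  # one hour

starts = [t for t in samples
          if (t.minute, t.second) in {(0, 0), (20, 0), (30, 0), (40, 0)}]
ends   = [t for t in samples
          if (t.minute, t.second) in {(19, 59), (29, 59), (39, 59), (59, 59)}]

# Pairing the n-th start with the n-th end reproduces the four
# status/shift segments of the hour.
for s, e in zip(starts, ends):
    print(s.time(), e.time())
```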
WITH cte1 AS
(
SELECT
t1.t_ts, t1.t_val, t1.sensor_id,
ROW_NUMBER() OVER (PARTITION BY t1.sensor_id ORDER BY t1.t_ts) AS rn1
FROM
telemetry t1
WHERE
EXTRACT('MINUTE' FROM t1.t_ts) = 0 AND EXTRACT('SECOND' FROM t1.t_ts) = 0
OR EXTRACT('MINUTE' FROM t1.t_ts) = 20 AND EXTRACT('SECOND' FROM t1.t_ts) = 0
OR EXTRACT('MINUTE' FROM t1.t_ts) = 30 AND EXTRACT('SECOND' FROM t1.t_ts) = 0
OR EXTRACT('MINUTE' FROM t1.t_ts) = 40 AND EXTRACT('SECOND' FROM t1.t_ts) = 0
UNION ALL -- because there can't be any duplicates, so no need to sort and filter!
SELECT
t2.t_ts, t2.t_val, t2.sensor_id,
ROW_NUMBER() OVER (PARTITION BY t2.sensor_id ORDER BY t2.t_ts) AS rn2
FROM
telemetry t2
WHERE
EXTRACT('MINUTE' FROM t2.t_ts) = 19 AND EXTRACT('SECOND' FROM t2.t_ts) = 59
OR EXTRACT('MINUTE' FROM t2.t_ts) = 29 AND EXTRACT('SECOND' FROM t2.t_ts) = 59
OR EXTRACT('MINUTE' FROM t2.t_ts) = 39 AND EXTRACT('SECOND' FROM t2.t_ts) = 59
OR EXTRACT('MINUTE' FROM t2.t_ts) = 59 AND EXTRACT('SECOND' FROM t2.t_ts) = 59
ORDER BY t_ts
),
cte2 AS
(
SELECT
cx.rn1, cx.sensor_id AS sid, cx.t_ts AS t1, cy.t_ts AS t2, cx.t_val AS sv, cy.t_val AS ev
FROM
cte1 AS cx
JOIN cte1 AS cy
ON cx.rn1 = cy.rn1 AND cx.t_ts != cy.t_ts AND cx.t_val < cy.t_val
)
SELECT
st.sensor_id AS sid, sh.location_id AS lid, cte2.rn1, -- rn1 not strictly necessary - left in to demonstrate!
CASE rn1 % 2
WHEN 1 THEN sh.start_tz
ELSE st.start_tz
END AS t1,
CASE rn1 % 2
WHEN 1 THEN st.start_tz
ELSE sh.end_tz
END AS t2,
sh.occurence_id AS oc_id, st.ms,
sh.shift_id AS sh_id
FROM
shift sh
JOIN status st
ON (sh.start_tz, sh.end_tz) OVERLAPS (st.start_tz, sh.end_tz)
JOIN cte2
ON st.sensor_id = cte2.sid
WHERE sh.start_tz <= st.start_tz AND sh.end_tz <= st.end_tz
AND (cte2.t1, cte2.t2) OVERLAPS (sh.start_tz, sh.end_tz)
ORDER BY sh.start_tz, sh.end_tz;
Result - same as for the first query. If it wasn't, then it would be incorrect!
Preliminary Performance Analysis: (TBD)
The usual caveats apply! It's impossible to get reliable benchmark figures from:
- a system where I don't have a clue what's running at the same time - this is all the more true for a server that's exposed to the internet!
- a system which is configured with a very small number of records, relatively speaking - I also did this:
SET enable_seqscan = off; SET enable_nestloop = off;
It remains to be seen (coming to a screen near you soon!) what happens when I perform these tests with a large number of records on a beefy home server with nothing else happening and without the "tricks" above.