I'm trying to solve a particularly difficult problem. I am storing telemetry data from some sensors in a PostgreSQL table, and I want to know how I can write a query that groups the telemetry data using relational information from two other tables.
I have one table which stores the telemetry data from the sensors. It contains three fields: the timestamp, the sensor ID, and the value of the sensor at that time. The value column is an incrementing count (it only increases and never resets).
Sensor_Telemetry table
timestamp | sensor_id | value |
---|---|---|
2022-01-01 00:00:00 | 5 | 3 |
2022-01-01 00:00:01 | 5 | 5 |
2022-01-01 00:00:02 | 5 | 6 |
... | ... | ... |
2022-01-01 01:00:00 | 5 | 675 |
I have another table which stores the state of the sensor (whether it was stationary or in motion) and the start/end dates of that particular state for that sensor:
Status table
start_date | end_date | status | sensor_id |
---|---|---|---|
2022-01-01 00:00:00 | 2022-01-01 00:20:00 | in_motion | 5 |
2022-01-01 00:20:00 | 2022-01-01 00:40:00 | stationary | 5 |
2022-01-01 00:40:00 | 2022-01-01 01:00:00 | in_motion | 5 |
... | ... | ... | ... |
The sensor is located at a particular location. The Sensor table stores this metadata:
Sensor table
sensor_id | location_id |
---|---|
5 | 16 |
In the final table, I have the shifts that occur at each location. The Shift table is a list of occurrences of all shifts, i.e. in this case Shift A is defined to occur every day between 00:00:00 and 00:30:00, and Shift B every day between 00:30:00 and 01:00:00. So the Shift table would have records like so:
Shift table
shift | location_id | occurrence_id | start_date | end_date |
---|---|---|---|---|
A Shift | 16 | 123 | 2022-01-01 00:00:00 | 2022-01-01 00:30:00 |
B Shift | 16 | 124 | 2022-01-01 00:30:00 | 2022-01-01 01:00:00 |
A Shift | 16 | 123 | 2022-01-02 00:00:00 | 2022-01-02 00:30:00 |
B Shift | 16 | 124 | 2022-01-02 00:30:00 | 2022-01-02 01:00:00 |
... | ... | ... | ... | ... |
I want to write a query that retrieves telemetry data grouped both by the shifts at the location of the sensor and by the status of the sensor, selecting by a given date range:
sensor_id | start_date | end_date | status | shift | value_start | value_end |
---|---|---|---|---|---|---|
5 | 2022-01-01 00:00:00 | 2022-01-01 00:20:00 | in_motion | A Shift | 3 | 250 |
5 | 2022-01-01 00:20:00 | 2022-01-01 00:30:00 | stationary | A Shift | 250 | 325 |
5 | 2022-01-01 00:30:00 | 2022-01-01 00:40:00 | stationary | B Shift | 325 | 490 |
5 | 2022-01-01 00:40:00 | 2022-01-01 01:00:00 | in_motion | B Shift | 490 | 675 |
As you can see, the telemetry data would be grouped both by the information contained in the Shift table and by the Status table. In particular, notice that the sensor was in a stationary status between 2022-01-01 00:20:00 and 2022-01-01 00:40:00; however, as the 2nd and 3rd rows in the above table show, this interval is cut into two rows because the shift changed at 2022-01-01 00:30:00.
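To make the intended splitting concrete, here is a plain-Python sketch of the logic I'm after (just an illustration of the intersection of status intervals with shift occurrences; none of these names exist in my schema):

```python
from datetime import datetime

# Each output row is the intersection of one status interval with one
# shift occurrence; empty intersections are dropped.
def split_by_shift(status_rows, shift_rows):
    out = []
    for s_start, s_end, status in status_rows:
        for sh_start, sh_end, shift in shift_rows:
            start, end = max(s_start, sh_start), min(s_end, sh_end)
            if start < end:  # the two intervals actually overlap
                out.append((start, end, status, shift))
    return sorted(out)

ts = lambda s: datetime.fromisoformat(s)
statuses = [(ts("2022-01-01 00:00"), ts("2022-01-01 00:20"), "in_motion"),
            (ts("2022-01-01 00:20"), ts("2022-01-01 00:40"), "stationary"),
            (ts("2022-01-01 00:40"), ts("2022-01-01 01:00"), "in_motion")]
shifts = [(ts("2022-01-01 00:00"), ts("2022-01-01 00:30"), "A Shift"),
          (ts("2022-01-01 00:30"), ts("2022-01-01 01:00"), "B Shift")]

for row in split_by_shift(statuses, shifts):
    print(row[0].time(), row[1].time(), row[2], row[3])
```

This produces four rows, with the stationary interval split exactly at the 00:30:00 shift boundary, matching the table above.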
Is it possible to write a query that can do this? I would really appreciate it if you have any ideas, thanks!
Link to db-fiddle: https://www.db-fiddle.com/f/sAXudT8qB35WWBk9xREj7p/6
3 Answers
Tested in dbfiddle.uk:
select distinct
    sensor.sensor_id,
    shift.shift,
    shift.occurence_id,
    status.status,
    status.status_id,
    MIN(st.datetime) OVER w as start_date,
    MAX(st.datetime) OVER w as end_date,
    FIRST_VALUE(st.reading) OVER w as reading_start,
    LAST_VALUE(st.reading) OVER w as reading_end,
    MIN(st.reading) OVER w as reading_min,
    MAX(st.reading) OVER w as reading_max
from status
join sensor
    using (sensor_id)
join shift
    on sensor.location_id = shift.location_id
    and (status.start_date, status.end_date) OVERLAPS
        (shift.start_date, shift.end_date)
join sensor_telemetry as st
    on st.sensor_id = sensor.sensor_id
    and tstzrange(
            GREATEST(status.start_date, shift.start_date),
            LEAST(status.end_date, shift.end_date)
        ) @> st.datetime
where
    status.sensor_id = 5
window w as
    ( partition by
          sensor.sensor_id,
          shift.shift,
          shift.occurence_id,
          status.status,
          status.status_id
      order by
          st.datetime
      rows between unbounded preceding
               and unbounded following
    )
order by
    start_date;
And a slightly simpler answer if we only require the MIN/MAX of the readings, not the first/last (or, as the OP says, we know that they always coincide): https://dbfiddle.uk/hbIcLpNn
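As a side note, the `tstzrange(GREATEST(...), LEAST(...)) @> st.datetime` predicate relies on PostgreSQL's default half-open `[)` range bounds, which is what keeps a sample that falls exactly on a boundary from being counted in two groups. A minimal sketch of that semantics (plain Python standing in for the range operator):

```python
from datetime import datetime

# Mimics tstzrange(start, end) @> t with the default '[)' bounds:
# start is inclusive, end is exclusive.
def contains(start, end, t):
    return start <= t < end

shift_a = (datetime(2022, 1, 1, 0, 0), datetime(2022, 1, 1, 0, 30))
shift_b = (datetime(2022, 1, 1, 0, 30), datetime(2022, 1, 1, 1, 0))
boundary = datetime(2022, 1, 1, 0, 30)

print(contains(*shift_a, boundary))  # the boundary sample is not in shift A
print(contains(*shift_b, boundary))  # ...it belongs to shift B only
```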
Here is the solution I've arrived at by making use of the OVERLAPS operator; let me know if you can improve it!
Schema (PostgreSQL v11)
create table sensor_telemetry(
datetime TIMESTAMPTZ,
sensor_id int,
reading bigint
);
insert into sensor_telemetry (datetime, sensor_id, reading) values
('2020-01-01T00:00:00Z', 5, 1),
('2020-01-01T00:00:01Z', 5, 2),
('2020-01-01T00:00:02Z', 5, 3),
('2020-01-01T00:00:03Z', 5, 4),
('2020-01-01T00:00:04Z', 5, 5),
('2020-01-01T00:00:05Z', 5, 6),
('2020-01-01T00:00:06Z', 5, 7),
('2020-01-01T00:00:07Z', 5, 8),
('2020-01-01T00:00:08Z', 5, 9),
('2020-01-01T00:00:09Z', 5, 10),
('2020-01-01T00:00:10Z', 5, 11),
('2020-01-01T00:00:11Z', 5, 12),
('2020-01-01T00:00:12Z', 5, 13),
('2020-01-01T00:00:13Z', 5, 14),
('2020-01-01T00:00:14Z', 5, 15),
('2020-01-01T00:00:15Z', 5, 16),
('2020-01-01T00:00:16Z', 5, 17),
('2020-01-01T00:00:17Z', 5, 18),
('2020-01-01T00:00:18Z', 5, 19),
('2020-01-01T00:00:19Z', 5, 20),
('2020-01-01T00:00:20Z', 5, 21),
('2020-01-01T00:00:21Z', 5, 22),
('2020-01-01T00:00:22Z', 5, 23),
('2020-01-01T00:00:23Z', 5, 24),
('2020-01-01T00:00:24Z', 5, 25),
('2020-01-01T00:00:25Z', 5, 26),
('2020-01-01T00:00:26Z', 5, 27),
('2020-01-01T00:00:27Z', 5, 28),
('2020-01-01T00:00:28Z', 5, 29),
('2020-01-01T00:00:29Z', 5, 30),
('2020-01-01T00:00:30Z', 5, 31),
('2020-01-01T00:00:31Z', 5, 32),
('2020-01-01T00:00:32Z', 5, 33),
('2020-01-01T00:00:33Z', 5, 34),
('2020-01-01T00:00:34Z', 5, 35),
('2020-01-01T00:00:35Z', 5, 36),
('2020-01-01T00:00:36Z', 5, 37),
('2020-01-01T00:00:37Z', 5, 38),
('2020-01-01T00:00:38Z', 5, 39),
('2020-01-01T00:00:39Z', 5, 40),
('2020-01-01T00:00:40Z', 5, 41),
('2020-01-01T00:00:41Z', 5, 42),
('2020-01-01T00:00:42Z', 5, 43),
('2020-01-01T00:00:43Z', 5, 44),
('2020-01-01T00:00:44Z', 5, 45),
('2020-01-01T00:00:45Z', 5, 46),
('2020-01-01T00:00:46Z', 5, 47),
('2020-01-01T00:00:47Z', 5, 48),
('2020-01-01T00:00:48Z', 5, 49),
('2020-01-01T00:00:49Z', 5, 50),
('2020-01-01T00:00:50Z', 5, 51),
('2020-01-01T00:00:51Z', 5, 52),
('2020-01-01T00:00:52Z', 5, 53),
('2020-01-01T00:00:53Z', 5, 54),
('2020-01-01T00:00:54Z', 5, 55),
('2020-01-01T00:00:55Z', 5, 56),
('2020-01-01T00:00:56Z', 5, 57),
('2020-01-01T00:00:57Z', 5, 58),
('2020-01-01T00:00:58Z', 5, 59),
('2020-01-01T00:00:59Z', 5, 60),
('2020-01-01T00:01:00Z', 5, 61),
('2020-01-01T00:01:01Z', 5, 62),
('2020-01-01T00:01:02Z', 5, 63),
('2020-01-01T00:01:03Z', 5, 64),
('2020-01-01T00:01:04Z', 5, 65),
('2020-01-01T00:01:05Z', 5, 66),
('2020-01-01T00:01:06Z', 5, 67),
('2020-01-01T00:01:07Z', 5, 68),
('2020-01-01T00:01:08Z', 5, 69),
('2020-01-01T00:01:09Z', 5, 70),
('2020-01-01T00:01:10Z', 5, 71),
('2020-01-01T00:01:11Z', 5, 72),
('2020-01-01T00:01:12Z', 5, 73),
('2020-01-01T00:01:13Z', 5, 74),
('2020-01-01T00:01:14Z', 5, 75),
('2020-01-01T00:01:15Z', 5, 76),
('2020-01-01T00:01:16Z', 5, 77),
('2020-01-01T00:01:17Z', 5, 78),
('2020-01-01T00:01:18Z', 5, 79),
('2020-01-01T00:01:19Z', 5, 80),
('2020-01-01T00:01:20Z', 5, 81),
('2020-01-01T00:01:21Z', 5, 82),
('2020-01-01T00:01:22Z', 5, 83),
('2020-01-01T00:01:23Z', 5, 84),
('2020-01-01T00:01:24Z', 5, 85),
('2020-01-01T00:01:25Z', 5, 86),
('2020-01-01T00:01:26Z', 5, 87),
('2020-01-01T00:01:27Z', 5, 88),
('2020-01-01T00:01:28Z', 5, 89),
('2020-01-01T00:01:29Z', 5, 90),
('2020-01-01T00:01:30Z', 5, 91),
('2020-01-01T00:01:31Z', 5, 92),
('2020-01-01T00:01:32Z', 5, 93),
('2020-01-01T00:01:33Z', 5, 94),
('2020-01-01T00:01:34Z', 5, 95),
('2020-01-01T00:01:35Z', 5, 96),
('2020-01-01T00:01:36Z', 5, 97),
('2020-01-01T00:01:37Z', 5, 98),
('2020-01-01T00:01:38Z', 5, 99),
('2020-01-01T00:01:39Z', 5, 100),
('2020-01-01T00:01:40Z', 5, 101),
('2020-01-01T00:01:41Z', 5, 102),
('2020-01-01T00:01:42Z', 5, 103),
('2020-01-01T00:01:43Z', 5, 104),
('2020-01-01T00:01:44Z', 5, 105),
('2020-01-01T00:01:45Z', 5, 106),
('2020-01-01T00:01:46Z', 5, 107),
('2020-01-01T00:01:47Z', 5, 108),
('2020-01-01T00:01:48Z', 5, 109),
('2020-01-01T00:01:49Z', 5, 110),
('2020-01-01T00:01:50Z', 5, 111),
('2020-01-01T00:01:51Z', 5, 112),
('2020-01-01T00:01:52Z', 5, 113),
('2020-01-01T00:01:53Z', 5, 114),
('2020-01-01T00:01:54Z', 5, 115),
('2020-01-01T00:01:55Z', 5, 116),
('2020-01-01T00:01:56Z', 5, 117),
('2020-01-01T00:01:57Z', 5, 118),
('2020-01-01T00:01:58Z', 5, 119),
('2020-01-01T00:01:59Z', 5, 120),
('2020-01-01T00:02:00Z', 5, 121)
;
create table status(
sensor_id int,
status_id int,
start_date timestamptz,
end_date timestamptz,
status varchar(15)
);
insert into status (sensor_id, status_id, start_date, end_date, status) values
(5, 8, '2020-01-01T00:00:00Z', '2020-01-01T00:01:30Z', 'stationary'),
(5, 9, '2020-01-01T00:01:30Z', '2020-01-01T00:01:45Z', 'in_motion'),
(5, 10, '2020-01-01T00:01:45Z', '2020-01-01T00:02:00Z', 'stationary');
create table sensor(
sensor_id int,
location_id int
);
insert into sensor (sensor_id, location_id) values
(5, 16);
create table shift(
shift varchar(15),
location_id int,
occurence_id int,
start_date timestamptz,
end_date timestamptz
);
insert into shift (shift, location_id, occurence_id, start_date, end_date) values
('Shift A', 16, 123, '2020-01-01T00:00:00Z', '2020-01-01T00:01:00Z'),
('Shift B', 16, 124, '2020-01-01T00:01:00Z', '2020-01-01T00:02:00Z');
Query #1
with statuses as (
select
stat.sensor_id,
stat.status,
stat.status_id,
stat.start_date,
stat.end_date,
sens.location_id
from status stat
inner join sensor sens on (sens.sensor_id = stat.sensor_id)
where stat.sensor_id=5
),
shift_status as (
select
sa.sensor_id,
s.shift,
s.occurence_id,
s.start_date as shift_start,
s.end_date as shift_end,
sa.status,
sa.status_id,
sa.start_date as status_start,
sa.end_date as status_end
from shift s
inner join statuses sa on
(sa.location_id = s.location_id
and (sa.start_date, sa.end_date) OVERLAPS (s.start_date, s.end_date))
),
phases as (
select
sensor_id,
shift,
occurence_id,
status,
status_id,
GREATEST(shift_start, status_start) as start_date,
LEAST(shift_end, status_end) as end_date
from shift_status
order by start_date
)
select
st.sensor_id,
shift,
occurence_id,
status,
status_id,
MIN(datetime) as start_date,
MAX(datetime) as end_date,
MIN(reading) as reading_start,
MAX(reading) as reading_end
from sensor_telemetry st
inner join phases p on
(st.sensor_id = p.sensor_id
and (p.start_date, p.end_date) overlaps (st.datetime, st.datetime))
group by st.sensor_id, shift, occurence_id, status, status_id
order by start_date;
sensor_id | shift | occurence_id | status | status_id | start_date | end_date | reading_start | reading_end |
---|---|---|---|---|---|---|---|---|
5 | Shift A | 123 | stationary | 8 | 2020-01-01T00:00:00.000Z | 2020-01-01T00:00:59.000Z | 1 | 60 |
5 | Shift B | 124 | stationary | 8 | 2020-01-01T00:01:00.000Z | 2020-01-01T00:01:29.000Z | 61 | 90 |
5 | Shift B | 124 | in_motion | 9 | 2020-01-01T00:01:30.000Z | 2020-01-01T00:01:44.000Z | 91 | 105 |
5 | Shift B | 124 | stationary | 10 | 2020-01-01T00:01:45.000Z | 2020-01-01T00:01:59.000Z | 106 | 120 |
- "I don't think the MAX and MIN(reading) give you the reading start and end, only the max and min in the duration of each group. Getting the first (start) and last (end) reading is going to be a bit more tricky." – ypercubeᵀᴹ, Sep 22, 2022
- "For me it's acceptable in this case - I know for sure that the reading can only increment in time, as it's not possible for it to decrement to a lower value. But you're right, the logic is not exactly correct in the general case." – kk_p, Sep 22, 2022
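The caveat raised in the comments can be shown directly: MIN/MAX only coincide with the first/last readings when the series is monotonically increasing (a plain-Python sketch, not tied to any of the tables above):

```python
# For a monotone, counter-style series, min/max coincide with first/last;
# for a series that can decrease (a gauge), they do not.
monotone = [3, 5, 6, 10]
gauge    = [3, 9, 2, 7]

assert (min(monotone), max(monotone)) == (monotone[0], monotone[-1])
assert (min(gauge), max(gauge)) != (gauge[0], gauge[-1])
```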
I looked at this; quite thought-provoking, +1!
To answer something like this, good data is essential (edge cases &c.), so I did the following, in particular simulating data using PostgreSQL's really powerful GENERATE_SERIES() function (see the manual).
All of the code below (and more) is available in the fiddles within the various answers' sections. I've put in INDEXes and FOREIGN KEYs because I'd like to test (both on db<>fiddle and at home) the performance of the different approaches - in particular to see how the PostgreSQL GiST index (see the manual) works out.
1st answer - a general range-based approach (fiddle):
I've added a location table to the OP's ones - it makes the scenario more realistic (IMHO) and will come into play for testing on multiple locations.
CREATE TABLE location
(
location_id INTEGER NOT NULL PRIMARY KEY,
l_desc TEXT NOT NULL
);
Only 1 record for the moment:
INSERT INTO location VALUES (16, 'Farm');
Following the OP's setup for sensor - the names of the fields have been changed slightly.
CREATE TABLE sensor
(
sensor_id SMALLINT NOT NULL PRIMARY KEY,
location_id SMALLINT NOT NULL,
CONSTRAINT loc_FK FOREIGN KEY (location_id) REFERENCES location (location_id)
);
CREATE INDEX sensor_l_ix ON sensor(location_id);
populate:
INSERT INTO sensor VALUES (5, 16);
We only need one, because the solution(s) (will) take care of the grouping by sensor and location.
CREATE TABLE telemetry
(
t_ts TIMESTAMPTZ NOT NULL,
sensor_id INTEGER NOT NULL,
t_val INTEGER NOT NULL,
CONSTRAINT tel_pk PRIMARY KEY (sensor_id, t_ts), -- can only have one (set of) measurements
-- at the same time - only t_val
CONSTRAINT tel_sens_fk FOREIGN KEY (sensor_id) REFERENCES sensor (sensor_id)
);
then:
INSERT INTO telemetry
SELECT
('2022-01-01 00:00:00'::TIMESTAMPTZ + (INTERVAL '1 SECOND' * (i - 1))), 5, i
FROM
(
SELECT
GENERATE_SERIES(1, 21700) AS i
) AS t;
I used PostgreSQL's GENERATE_SERIES() function to simulate ~21k data points - time incrementing by 1 second, and the value by 1 - which seems to be what the OP requires.
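For readers who want to sanity-check the simulated series outside the database, the same data can be sketched in plain Python (same 1-second cadence and row count as the INSERT above):

```python
from datetime import datetime, timedelta

# Mirror of the GENERATE_SERIES() insert: one reading per second,
# value incrementing by 1, starting at 2022-01-01 00:00:00, sensor 5.
base = datetime(2022, 1, 1)
telemetry = [(base + timedelta(seconds=i - 1), 5, i) for i in range(1, 21701)]

print(len(telemetry))  # 21700 rows
print(telemetry[0])
```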
CREATE TABLE status
(
start_tz TIMESTAMPTZ NOT NULL,
end_tz TIMESTAMPTZ NOT NULL,
ms TEXT NOT NULL,
sensor_id SMALLINT NOT NULL
);
and:
INSERT INTO status
SELECT
GENERATE_SERIES('2022-01-01 00:00:00', '2022-01-01 06:20:00', INTERVAL '20 MINUTES'),
GENERATE_SERIES('2022-01-01 00:20:00', '2022-01-01 06:40:00', INTERVAL '20 MINUTES'),
CASE
WHEN random() > 0.5 THEN 'in_motion'
ELSE 'stationary'
END,
5; -- <<== only one sensor - could have put in others using GENERATE_SERIES()
and now the shifts:
CREATE TABLE shift
(
shift_id TEXT NOT NULL,
location_id SMALLINT NOT NULL,
occurence_id SMALLINT NOT NULL,
start_tz TIMESTAMPTZ NOT NULL,
end_tz TIMESTAMPTZ NOT NULL
);
populate - shifts every thirty minutes, A and B:
WITH cte AS
(
SELECT
16 AS loc,
GENERATE_SERIES(1, 15, 1) AS occur,
GENERATE_SERIES('2022-01-01 00:00:00', '2022-01-01 07:00:00', INTERVAL '30 MINUTE')
AS start_tz,
GENERATE_SERIES('2022-01-01 00:30:00', '2022-01-01 07:30:00', INTERVAL '30 MINUTE')
AS end_tz
)
INSERT INTO shift
SELECT
CASE
WHEN EXTRACT (MINUTE FROM start_tz) = 0 THEN 'Shift A'
ELSE 'Shift B'
END,
loc,
occur,
start_tz,
end_tz
FROM cte;
Now, we use the ROW_NUMBER() function (see the manual) and the % modulo operator! I've included fields that were not strictly necessary in order to demonstrate my train of thought!
SELECT
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) AS rn,
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) % 4 AS mod,
CASE
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) % 4
WHEN 1 THEN TSTZRANGE(st.start_tz, st.end_tz)
WHEN 2 THEN TSTZRANGE(st.start_tz, sh.end_tz) -- alternates - see TIMEs
WHEN 3 THEN TSTZRANGE(sh.start_tz, st.end_tz) -- alternates - see TIMEs
WHEN 0 THEN TSTZRANGE(st.start_tz, st.end_tz)
END AS range_date,
sh.occurence_id,
st.start_tz::TIME AS st_start, st.end_tz::TIME AS st_end, -- ::TIME for legibility
sh.start_tz::TIME AS sh_start, sh.end_tz::TIME AS sh_end
FROM sensor s
JOIN status st USING (sensor_id)
JOIN shift sh USING (location_id)
WHERE (st.start_tz, st.end_tz) OVERLAPS (sh.start_tz, sh.end_tz)
ORDER BY rn, mod, range_date;
Result (snipped for brevity):
rn mod range_date occurence_id st_start st_end sh_start sh_end
1 1 ["2022-01-01 00:00:00+00","2022-01-01 00:20:00+00") 1 00:00:00 00:20:00 00:00:00 00:30:00
2 2 ["2022-01-01 00:20:00+00","2022-01-01 00:30:00+00") 1 00:20:00 00:40:00 00:00:00 00:30:00
3 3 ["2022-01-01 00:30:00+00","2022-01-01 00:40:00+00") 2 00:20:00 00:40:00 00:30:00 01:00:00
4 0 ["2022-01-01 00:40:00+00","2022-01-01 01:00:00+00") 2 00:40:00 01:00:00 00:30:00 01:00:00
5 1 ["2022-01-01 01:00:00+00","2022-01-01 01:20:00+00") 3 01:00:00 01:20:00 01:00:00 01:30:00
Now, we can see that every time the mod field is 1 we have the start of a new hour - so, tidying up and joining to the telemetry table, the final query is:
SELECT
t.sensor_id AS s_id, sub.location_id AS loc_id,
sub.t_val AS "S_val", t.t_val AS "E_val",
sub.range_date, sub.occurence_id AS oc_id, sub.ms -- can have other fields...
FROM
telemetry t
JOIN
(
SELECT
t.sensor_id, t.t_val,
sh.occurence_id, sh.location_id, st.ms,
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) % 4 AS rn,
CASE
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id
ORDER BY st.start_tz, sh.start_tz) % 4
WHEN 1 THEN TSTZRANGE(st.start_tz, st.end_tz)
WHEN 2 THEN TSTZRANGE(st.start_tz, sh.end_tz)
WHEN 3 THEN TSTZRANGE(sh.start_tz, st.end_tz)
WHEN 0 THEN TSTZRANGE(st.start_tz, st.end_tz)
END AS range_date
FROM sensor s
JOIN status st USING (sensor_id)
JOIN shift sh USING (location_id)
JOIN telemetry t USING (sensor_id)
WHERE (st.start_tz, st.end_tz) OVERLAPS (sh.start_tz, sh.end_tz)
AND t.t_ts = CASE
WHEN sh.start_tz > st.start_tz THEN sh.start_tz
ELSE st.start_tz
END -- <<=== Note JOIN on CASE!
) AS sub
ON t.t_ts = UPPER(range_date) - INTERVAL '1 SECOND'
ORDER BY range_date;
Result (snipped for brevity):
s_id loc_id S_val E_val range_date oc_id ms
5 16 1 1200 ["2022-01-01 00:00:00+00","2022-01-01 00:20:00+00") 1 stationary
5 16 1201 1800 ["2022-01-01 00:20:00+00","2022-01-01 00:30:00+00") 1 in_motion
5 16 1801 2400 ["2022-01-01 00:30:00+00","2022-01-01 00:40:00+00") 2 in_motion
5 16 2401 3600 ["2022-01-01 00:40:00+00","2022-01-01 01:00:00+00") 2 stationary
5 16 3601 4800 ["2022-01-01 01:00:00+00","2022-01-01 01:20:00+00") 3 stationary
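To see why the ROW_NUMBER() % 4 trick works with this cadence, here's a plain-Python sketch (an illustration under the same 20-minute status / 30-minute shift assumptions, not part of the SQL):

```python
from datetime import datetime, timedelta

# With statuses every 20 minutes and shifts every 30, each hour yields
# exactly 4 overlapping (status, shift) pairs, so the row-number pattern
# repeats mod 4.
base = datetime(2022, 1, 1)
m = lambda n: base + timedelta(minutes=n)

statuses = [(m(i), m(i + 20)) for i in range(0, 120, 20)]  # two hours
shifts   = [(m(i), m(i + 30)) for i in range(0, 120, 30)]

pairs = sorted((st, sh) for st in statuses for sh in shifts
               if st[0] < sh[1] and sh[0] < st[1])         # OVERLAPS

segments = []
for rn, (st, sh) in enumerate(pairs, start=1):
    mod = rn % 4
    if mod in (1, 0):
        segments.append((st[0], st[1]))   # status fully inside one shift
    elif mod == 2:
        segments.append((st[0], sh[1]))   # straddling status: first half
    else:                                 # mod == 3: second half
        segments.append((sh[0], st[1]))

for s, e in segments[:4]:
    print(s.time(), e.time())
```

The segments tile each hour with no gaps, matching the range_date column in the result above.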
2nd answer - an exclusively range-based approach (fiddle)
The setup is similar - except that I used the TSTZRANGE type in my table declarations(*) and used GiST indexes to index those columns - check the fiddle.
(*) Tables status and shift.
The major differences between this answer and the 1st one are that the range syntax considerably enhances the legibility of the final query and (to my mind anyway) makes the problem easier to reason about. The final query is as follows (see fiddle for setup and logic):
SELECT
t.sensor_id AS s_id, sub.location_id AS loc_id,
sub.t_val AS "S_val", t.t_val AS "E_val",
sub.range_date, sub.ms, sub.occurence_id AS oc_id -- can have other fields...
FROM
telemetry t
JOIN
(
SELECT
t.sensor_id, t.t_val,
sh.occurence_id, st.ms, sh.location_id,
CASE
ROW_NUMBER() OVER (PARTITION BY s.sensor_id, s.location_id ORDER BY st.datran, sh.datran) % 4
WHEN 1 THEN st.datran
WHEN 2 THEN TSTZRANGE(LOWER(st.datran), UPPER(sh.datran))
WHEN 3 THEN TSTZRANGE(LOWER(sh.datran), UPPER(st.datran))
WHEN 0 THEN st.datran
END AS range_date
FROM sensor s
JOIN status st USING (sensor_id)
JOIN shift sh USING (location_id)
JOIN telemetry t USING (sensor_id)
WHERE (st.datran) && (sh.datran) -- Note the && range type OVERLAPS operator!
AND t.t_ts = GREATEST(LOWER(st.datran), LOWER(sh.datran)) -- alternative to CASE
) AS sub
ON t.t_ts = UPPER(range_date) - INTERVAL '1 SECOND'
ORDER BY range_date;
Result - same as for the first query. If it wasn't, then it would be incorrect!
3rd answer - using EXTRACT (fiddle)
This solution is a bit weaker in the sense that it assumes knowledge of the distribution of the data points - i.e. boundaries on the hour and at 20, 30 and 40 minutes past, then the next hour. But the OP told us that, and we can easily change this to suit any configuration.
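The boundary filter at the heart of this approach can be sketched in plain Python (assuming the same 1-second samples and the boundaries just described):

```python
from datetime import datetime, timedelta

# Segment starts sit at minutes 0/20/30/40 (second 0); segment ends sit
# one second before the next start (minutes 19/29/39/59, second 59).
base = datetime(2022, 1, 1)
samples = [base + timedelta(seconds=i) for i in range(3600)]  # one hour

starts = [t for t in samples
          if (t.minute, t.second) in {(0, 0), (20, 0), (30, 0), (40, 0)}]
ends   = [t for t in samples
          if (t.minute, t.second) in {(19, 59), (29, 59), (39, 59), (59, 59)}]

# Pairing the n-th start with the n-th end reproduces the four
# status/shift segments of the hour.
for s, e in zip(starts, ends):
    print(s.time(), e.time())
```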
WITH cte1 AS
(
SELECT
t1.t_ts, t1.t_val, t1.sensor_id,
ROW_NUMBER() OVER (PARTITION BY t1.sensor_id ORDER BY t1.t_ts) AS rn1
FROM
telemetry t1
WHERE
EXTRACT('MINUTE' FROM t1.t_ts) = 0 AND EXTRACT('SECOND' FROM t1.t_ts) = 0
OR EXTRACT('MINUTE' FROM t1.t_ts) = 20 AND EXTRACT('SECOND' FROM t1.t_ts) = 0
OR EXTRACT('MINUTE' FROM t1.t_ts) = 30 AND EXTRACT('SECOND' FROM t1.t_ts) = 0
OR EXTRACT('MINUTE' FROM t1.t_ts) = 40 AND EXTRACT('SECOND' FROM t1.t_ts) = 0
UNION ALL -- because there can't be any duplicates, so no need to sort and filter!
SELECT
t2.t_ts, t2.t_val, t2.sensor_id,
ROW_NUMBER() OVER (PARTITION BY t2.sensor_id ORDER BY t2.t_ts) AS rn2
FROM
telemetry t2
WHERE
EXTRACT('MINUTE' FROM t2.t_ts) = 19 AND EXTRACT('SECOND' FROM t2.t_ts) = 59
OR EXTRACT('MINUTE' FROM t2.t_ts) = 29 AND EXTRACT('SECOND' FROM t2.t_ts) = 59
OR EXTRACT('MINUTE' FROM t2.t_ts) = 39 AND EXTRACT('SECOND' FROM t2.t_ts) = 59
OR EXTRACT('MINUTE' FROM t2.t_ts) = 59 AND EXTRACT('SECOND' FROM t2.t_ts) = 59
ORDER BY t_ts
),
cte2 AS
(
SELECT
cx.rn1, cx.sensor_id AS sid, cx.t_ts AS t1, cy.t_ts AS t2, cx.t_val AS sv, cy.t_val AS ev
FROM
cte1 AS cx
JOIN cte1 AS cy
ON cx.rn1 = cy.rn1 AND cx.t_ts != cy.t_ts AND cx.t_val < cy.t_val
)
SELECT
st.sensor_id AS sid, sh.location_id AS lid, cte2.rn1, -- rn1 not strictly necessary - left in to demonstrate!
CASE rn1 % 2
WHEN 1 THEN sh.start_tz
ELSE st.start_tz
END AS t1,
CASE rn1 % 2
WHEN 1 THEN st.start_tz
ELSE sh.end_tz
END AS t2,
sh.occurence_id AS oc_id, st.ms,
sh.shift_id AS sh_id
FROM
shift sh
JOIN status st
ON (sh.start_tz, sh.end_tz) OVERLAPS (st.start_tz, sh.end_tz)
JOIN cte2
ON st.sensor_id = cte2.sid
WHERE sh.start_tz <= st.start_tz AND sh.end_tz <= st.end_tz
AND (cte2.t1, cte2.t2) OVERLAPS (sh.start_tz, sh.end_tz)
ORDER BY sh.start_tz, sh.end_tz;
Result - same as for the first query. If it wasn't, then it would be incorrect!
Preliminary Performance Analysis: (TBD)
The usual caveats apply! It's impossible to get reliable benchmark figures from:
- a system where I don't have a clue what's running at the same time - this is all the more true for a server that's exposed to the internet!
- a system which is configured with a very small number of records, relatively speaking - I also did this:
SET enable_seqscan = off; SET enable_nestloop = off;
It remains to be seen (coming to a screen near you soon!) what happens when I perform these tests with a large number of records on a beefy home server with nothing else happening and without the "tricks" above.