I've been looking at similar questions and solutions but haven't found anything suitable, as I am trying to avoid full table scans and the like. For readability, the example tables and data are simplified. The actual table has over 1M records and contains additional columns (hence the 'etc...' in the example).
The simplified table definition looks like this:
CREATE TABLE IF NOT EXISTS example (
sensor varchar(10) not null,
date_col date not null,
temperature decimal(4) not null,
UNIQUE (sensor,date_col)
);
And some example data:
INSERT INTO example VALUES
('A1', '2023-12-29', '-18'),
('B2', '2023-12-20', '-15'),
('A1', '2024-01-04', '-10'),
('C1', '2024-01-08', '3'),
('B2', '2023-12-23', '-11'),
('A1', '2024-01-06', '3'),
('C1', '2024-01-19', '-2'),
('C1', '2024-01-20', '1'),
('A1', '2024-01-05', '7'),
('B2', '2024-01-05', '3');
So the table contents look something like this:
sensor | date_col   | temperature | etc...    |
-------+------------+-------------+-----------+
A1     | 2023-12-29 |         -18 | A1_etc... |
B2     | 2023-12-20 |         -15 | B2_etc... |
A1     | 2024-01-04 |         -10 | A1_...    |
C1     | 2024-01-08 |           3 |           |
B2     | 2023-12-23 |         -11 |           |
A1     | 2024-01-06 |           3 |           |
C1     | 2024-01-19 |          -2 |           |
C1     | 2024-01-20 |           1 |           |
A1     | 2024-01-05 |           7 |           |
B2     | 2024-01-05 |           3 |           |
Now, what I would like to do:
- find the latest (date and) temperature record for every sensor (e.g. A1, 2024-01-06, 3)
- find the second-to-latest (date and) temperature record for every sensor (e.g. A1, 2024-01-05, 7)
- calculate the sensor's temperature change = latest_temp_value - prev_temp_value (3 - 7 = -4)
- print out every sensor with its latest date, temperature and temperature change (compared with the previous temperature/record)
So the goal is this:
sensor | date_col   | temperature | change |
-------+------------+-------------+--------+
A1     | 2024-01-06 |           3 |     -4 |
B2     | 2024-01-05 |           3 |     14 |
C1     | 2024-01-20 |           1 |      3 |
PG window functions (LAG, LEAD) have been a dead end for me, or I'm just not clever enough to use them properly. The same goes for "MAX(date_col)...", "DISTINCT ON ...", and "... GROUP BY ..." and/or "... LIMIT n", which all lead to full table scans. Let's say that custom functions, subselects and joins are OK. Views and full table scans are not.
Can this result somehow be achieved with one SQL statement?
Thank You!
2 Answers
Create test data with 1M rows:
CREATE UNLOGGED TABLE sensors( sensor_id INTEGER PRIMARY KEY );
INSERT INTO sensors SELECT generate_series(1,1000);
CREATE UNLOGGED TABLE measurements (
sensor_id INTEGER NOT NULL,
date date not null,
temp float not null,
UNIQUE (sensor_id,date)
);
INSERT INTO measurements SELECT sensor_id, '2000-01-01'::DATE + '1 DAY'::INTERVAL*d, random()
FROM sensors CROSS JOIN generate_series(1,1000) d;
VACUUM ANALYZE;
Get last 2 rows for each sensor:
SELECT * FROM
sensors s
JOIN LATERAL (
SELECT * FROM measurements m
WHERE m.sensor_id=s.sensor_id
ORDER BY date DESC LIMIT 2
) USING (sensor_id);
Nested Loop (cost=0.42..7869.69 rows=10 width=16) (actual time=0.074..8.512 rows=2000 loops=1)
-> Seq Scan on sensors s (cost=0.00..15.00 rows=1000 width=4) (actual time=0.020..0.122 rows=1000 loops=1)
-> Subquery Scan on unnamed_subquery (cost=0.42..7.84 rows=1 width=16) (actual time=0.007..0.008 rows=2 loops=1000)
Filter: (s.sensor_id = unnamed_subquery.sensor_id)
-> Limit (cost=0.42..7.82 rows=2 width=16) (actual time=0.007..0.007 rows=2 loops=1000)
-> Index Scan Backward using measurements_sensor_id_date_key on measurements m (cost=0.42..3697.77 rows=1000 width=16) (actual time=0.007..0.007 rows=2 loops=1000)
Index Cond: (sensor_id = s.sensor_id)
Execution Time: 8.664 ms
 sensor_id |    date    |        temp
-----------+------------+---------------------
         1 | 2002-09-27 | 0.21334769370357276
         1 | 2002-09-26 | 0.5202488410925379
         2 | 2002-09-27 | 0.3530518649150136
         2 | 2002-09-26 | 0.8535599911382779
         3 | 2002-09-27 | 0.869634043585521
...
The key here is that window functions like LAG() will not be optimized the way you want: they are evaluated over every row of each partition, so the planner cannot stop after the two newest rows per sensor. A lateral join, however, can retrieve the last two rows for each sensor with a fast plan.
A lateral join turns the joined table into a dependent subquery that is executed once for each value of sensor_id. This allows using "ORDER BY date DESC LIMIT 2", which can be done very quickly using the index on (sensor_id,date), as the Index Scan Backward in the plan above shows.
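To make the "dependent subquery per sensor" idea concrete, here is a rough Python model of what the backward index scan does. It is only a sketch: the sorted list INDEX and the helper last_two are hypothetical stand-ins for the (sensor_id, date) index and the LIMIT 2 subquery, not anything PostgreSQL exposes.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical stand-in for the (sensor_id, date) index: keys kept in sorted order.
INDEX = sorted([
    (1, date(2024, 1, 4)), (1, date(2024, 1, 5)), (1, date(2024, 1, 6)),
    (2, date(2023, 12, 23)), (2, date(2024, 1, 5)),
    (3, date(2024, 1, 20)),
])

def last_two(index, sensor_id):
    """Return up to two newest keys for one sensor, newest first."""
    # Seek to just past the sensor's last key, as a backward index scan would.
    end = bisect_right(index, (sensor_id, date.max))
    found = []
    pos = end - 1
    # Walk backwards; stop after two keys (the LIMIT 2) or when the sensor changes.
    while pos >= 0 and len(found) < 2 and index[pos][0] == sensor_id:
        found.append(index[pos])
        pos -= 1
    return found

# The lateral join repeats this lookup once per row of the outer table.
result = {sid: last_two(INDEX, sid) for sid in (1, 2, 3)}
```

Each lookup touches at most two index entries no matter how many measurements a sensor has, which is why the cost grows with the number of sensors rather than the number of rows.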
Query results contain all the information needed to compute the difference in temperature between the last two days. Presenting the results as shown in the question is cosmetic, so it should be done in the application code.
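As an illustration of the application-side step, here is a minimal Python sketch. It assumes the query rows arrive as (sensor, date, temp) tuples with at most two rows per sensor; the function name latest_with_change and the row shape are my own choices for the example, not part of any driver API.

```python
from datetime import date

def latest_with_change(rows):
    """rows: (sensor, date, temp) tuples, at most two per sensor, in any order.

    Returns {sensor: (latest_date, latest_temp, change)}.
    change is None when a sensor has only one measurement.
    """
    by_sensor = {}
    for sensor, day, temp in rows:
        by_sensor.setdefault(sensor, []).append((day, temp))

    out = {}
    for sensor, readings in by_sensor.items():
        readings.sort(reverse=True)  # newest first; dates are unique per sensor
        latest_day, latest_temp = readings[0]
        change = latest_temp - readings[1][1] if len(readings) > 1 else None
        out[sensor] = (latest_day, latest_temp, change)
    return out

# Last two rows per sensor, taken from the question's sample data.
rows = [
    ("A1", date(2024, 1, 6), 3), ("A1", date(2024, 1, 5), 7),
    ("B2", date(2024, 1, 5), 3), ("B2", date(2023, 12, 23), -11),
    ("C1", date(2024, 1, 20), 1), ("C1", date(2024, 1, 19), -2),
]
summary = latest_with_change(rows)
```

With the question's data this gives a change of -4 for A1, 14 for B2 and 3 for C1.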
If you want to do it in the database, you can put the above query in a CTE, then join it to itself. The join condition is simple because the CTE contains at most two rows and date is unique per sensor. Note the LEFT JOIN will still return a row even if the sensor has only one measurement.
WITH foo AS (
SELECT * FROM
sensors s
JOIN LATERAL (
SELECT * FROM measurements m
WHERE m.sensor_id=s.sensor_id
ORDER BY date DESC LIMIT 2
) USING (sensor_id)
)
SELECT *
FROM foo yesterday LEFT JOIN foo today
ON (yesterday.date<today.date AND yesterday.sensor_id=today.sensor_id);
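To check the self-join logic against the question's sample data, here is a small sketch using Python's sqlite3 module. Several things in it are assumptions: SQLite has no LATERAL, so a correlated IN subquery stands in for it; the join is written from the today side with a filter that keeps only each sensor's newest row, since the query above as written also returns a row with NULL today columns for the newest measurement; and the helper names are made up for the example. It says nothing about PostgreSQL performance.

```python
import sqlite3

SAMPLE = [
    ("A1", "2023-12-29", -18), ("B2", "2023-12-20", -15), ("A1", "2024-01-04", -10),
    ("C1", "2024-01-08", 3), ("B2", "2023-12-23", -11), ("A1", "2024-01-06", 3),
    ("C1", "2024-01-19", -2), ("C1", "2024-01-20", 1), ("A1", "2024-01-05", 7),
    ("B2", "2024-01-05", 3),
]

def build_db(rows):
    """Create an in-memory copy of the question's example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE example (sensor TEXT NOT NULL, date_col TEXT NOT NULL, "
        "temperature INTEGER NOT NULL, UNIQUE (sensor, date_col))"
    )
    conn.executemany("INSERT INTO example VALUES (?, ?, ?)", rows)
    return conn

def query_latest_with_change(conn):
    """One row per sensor: latest date, temperature, and change vs. the previous row."""
    sql = """
    WITH last2 AS (              -- stand-in for the LATERAL ... LIMIT 2 subquery
        SELECT e.sensor, e.date_col, e.temperature
        FROM example e
        WHERE e.date_col IN (
            SELECT i.date_col FROM example i
            WHERE i.sensor = e.sensor
            ORDER BY i.date_col DESC LIMIT 2)
    )
    SELECT today.sensor, today.date_col, today.temperature,
           today.temperature - yesterday.temperature AS change
    FROM last2 today
    LEFT JOIN last2 yesterday
      ON yesterday.sensor = today.sensor AND yesterday.date_col < today.date_col
    WHERE today.date_col = (SELECT MAX(l.date_col) FROM last2 l
                            WHERE l.sensor = today.sensor)
    ORDER BY today.sensor
    """
    return conn.execute(sql).fetchall()

result = query_latest_with_change(build_db(SAMPLE))
```

A sensor with a single measurement still gets a row, with change left as NULL (None in Python), which matches the LEFT JOIN remark above.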
Thank you for the long and clear explanation! – pisikesipelgas, Jan 26, 2024 at 7:06
Thanks to bobflux's answer I pushed my own ideas a bit further and decided to create a function that returns just the temperature change for a specific sensor.
CREATE function f_temp_change(ts varchar)
RETURNS TABLE (temp_change DECIMAL) AS $$
WITH x AS (
SELECT date_col,temperature,round((temperature - LEAD(temperature,1,temperature)
OVER (ORDER BY date_col DESC )),2) AS temp_change
FROM example WHERE sensor=ts ORDER BY date_col DESC LIMIT 2
)
SELECT temp_change FROM x ORDER BY date_col DESC limit 1;
$$ LANGUAGE SQL;
This gives me:
select f_temp_change('A1');
f_temp_change
---------------
-4.00
So I can execute the query for one sensor and get the result:
SELECT *,f_temp_change(example.sensor) AS change FROM example
WHERE sensor='A1'
AND date_col=(SELECT MAX(date_col) AS date_col FROM example WHERE sensor='A1');
 sensor |  date_col  | temperature | change
--------+------------+-------------+--------
 A1     | 2024-01-06 |           3 |  -4.00
And now for all sensors:
SELECT DISTINCT(s.sensor),x.date_col,x.temperature,change FROM example AS s,
LATERAL (
SELECT *,f_temp_change(sensor) AS change FROM example
WHERE sensor=s.sensor
AND date_col=(SELECT MAX(date_col) AS date_col
FROM example WHERE sensor=s.sensor)
) AS x
ORDER BY sensor;
 sensor |  date_col  | temperature | change
--------+------------+-------------+--------
 A1     | 2024-01-06 |           3 |  -4.00
 B2     | 2024-01-05 |           3 |  14.00
 C1     | 2024-01-20 |           1 |   3.00
This is a result I can live with, at least for now.
SELECT ... , temperature - LAG(temperature) OVER (PARTITION BY sensor ORDER BY date_col) AS change FROM ...