
I've been looking for similar questions and solutions but haven't found any suitable ones, as I'm trying to avoid full table scans and the like. For better readability, the example table and data are simplified. The actual table has over 1M records and contains additional columns (hence 'etc...' in the example).

So the simplified table definition goes like this:

CREATE TABLE IF NOT EXISTS example (
 sensor varchar(10) not null,
 date_col date not null,
 temperature decimal(4) not null,
 UNIQUE (sensor,date_col)
 );

And some example data:

INSERT INTO example VALUES
('A1', '2023-12-29', '-18'), 
('B2', '2023-12-20', '-15'),
('A1', '2024-01-04', '-10'),
('C1', '2024-01-08', '3'),
('B2', '2023-12-23', '-11'),
('A1', '2024-01-06', '3'),
('C1', '2024-01-19', '-2'),
('C1', '2024-01-20', '1'),
('A1', '2024-01-05', '7'),
('B2', '2024-01-05', '3');

So the table contents look something like this:

 sensor |  date_col  | temperature |  etc...
--------+------------+-------------+-----------
 A1     | 2023-12-29 |         -18 | A1_etc...
 B2     | 2023-12-20 |         -15 | B2_etc...
 A1     | 2024-01-04 |         -10 | A1_...
 C1     | 2024-01-08 |           3 |
 B2     | 2023-12-23 |         -11 |
 A1     | 2024-01-06 |           3 |
 C1     | 2024-01-19 |          -2 |
 C1     | 2024-01-20 |           1 |
 A1     | 2024-01-05 |           7 |
 B2     | 2024-01-05 |           3 |

Now, what I would like to do:

  1. find the latest (date and) temperature record for every sensor (e.g. A1, 2024-01-06, 3)
  2. find the second-latest (date and) temperature record for every sensor (e.g. A1, 2024-01-05, 7)
  3. calculate the sensor temperature change = latest_temp_value - prev_temp_value (3 - 7 = -4)
  4. print out every sensor with its latest date, temperature and temperature change (compared with the previous temperature record)

So the goal is this:

 sensor |  date_col  | temperature | change
--------+------------+-------------+--------
 A1     | 2024-01-06 |           3 |     -4
 B2     | 2024-01-05 |           3 |     14
 C1     | 2024-01-20 |           1 |      3
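The four steps can be checked against this goal table with a small sketch. It runs plain SQL through Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite and the in-memory setup are assumptions, not part of the question). Steps 1 and 2 are correlated subqueries with ORDER BY date_col DESC LIMIT 1 (latest) and LIMIT 1 OFFSET 1 (second-latest), the kind of lookup a UNIQUE (sensor, date_col) index can answer directly. The driving set here is SELECT DISTINCT sensor, which itself reads the whole table; a separate list of sensors would avoid that.

```python
import sqlite3

# Sample rows from the question: (sensor, date_col, temperature).
ROWS = [
    ("A1", "2023-12-29", -18), ("B2", "2023-12-20", -15),
    ("A1", "2024-01-04", -10), ("C1", "2024-01-08", 3),
    ("B2", "2023-12-23", -11), ("A1", "2024-01-06", 3),
    ("C1", "2024-01-19", -2), ("C1", "2024-01-20", 1),
    ("A1", "2024-01-05", 7), ("B2", "2024-01-05", 3),
]

conn = sqlite3.connect(":memory:")
# SQLite has no DATE type; ISO-8601 strings sort chronologically.
conn.execute("""
    CREATE TABLE example (
        sensor      TEXT    NOT NULL,
        date_col    TEXT    NOT NULL,
        temperature INTEGER NOT NULL,
        UNIQUE (sensor, date_col)
    )""")
conn.executemany("INSERT INTO example VALUES (?, ?, ?)", ROWS)

STEPS_SQL = """
WITH s AS (SELECT DISTINCT sensor FROM example),
picked AS (
    SELECT s.sensor,
           -- step 1: latest date and temperature
           (SELECT date_col FROM example WHERE sensor = s.sensor
            ORDER BY date_col DESC LIMIT 1) AS date_col,
           (SELECT temperature FROM example WHERE sensor = s.sensor
            ORDER BY date_col DESC LIMIT 1) AS temperature,
           -- step 2: second-latest temperature (NULL if there is none)
           (SELECT temperature FROM example WHERE sensor = s.sensor
            ORDER BY date_col DESC LIMIT 1 OFFSET 1) AS prev_temp
    FROM s
)
-- steps 3 and 4: change = latest - previous, one row per sensor
SELECT sensor, date_col, temperature, temperature - prev_temp AS change
FROM picked
ORDER BY sensor
"""

result = conn.execute(STEPS_SQL).fetchall()
for row in result:
    print(row)
```

The printed rows match the goal table: A1 is 3 - 7 = -4, B2 is 3 - (-11) = 14, C1 is 1 - (-2) = 3.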

PG window functions (LAG, LEAD) have been a dead end for me, or I'm just not clever enough to use them properly. The same goes for "MAX(date_col) ...", "DISTINCT ON ...", "... GROUP BY ..." and/or "... LIMIT n", which lead to full table scans. Let's say that custom functions, subselects and joins are OK; using views and full table scans is not.

Can this result somehow be achieved with one SQL statement?

Thank you!

asked Jan 25, 2024 at 15:44
  • "So the goal is like that": forget about "like"! The shown output must exactly match the shown source data. And add an explanation for each output value. Commented Jan 25, 2024 at 18:03
  • SELECT ... , temperature - LAG(temperature) OVER (PARTITION BY sensor ORDER BY date_col) AS change FROM ... Commented Jan 25, 2024 at 18:05
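The LAG() expression from the second comment gives the right numbers; it only needs a second window function to keep the latest row of each sensor. A sketch, again using Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later is needed for window functions). Note that this reads every row of the table, which is exactly the full scan the question wants to avoid on 1M rows; it only confirms the logic.

```python
import sqlite3

# Sample rows from the question: (sensor, date_col, temperature).
ROWS = [
    ("A1", "2023-12-29", -18), ("B2", "2023-12-20", -15),
    ("A1", "2024-01-04", -10), ("C1", "2024-01-08", 3),
    ("B2", "2023-12-23", -11), ("A1", "2024-01-06", 3),
    ("C1", "2024-01-19", -2), ("C1", "2024-01-20", 1),
    ("A1", "2024-01-05", 7), ("B2", "2024-01-05", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (sensor TEXT NOT NULL, date_col TEXT NOT NULL, "
             "temperature INTEGER NOT NULL, UNIQUE (sensor, date_col))")
conn.executemany("INSERT INTO example VALUES (?, ?, ?)", ROWS)

LAG_SQL = """
SELECT sensor, date_col, temperature, change
FROM (
    SELECT sensor, date_col, temperature,
           -- the comment's expression: current minus previous temperature
           temperature - LAG(temperature)
               OVER (PARTITION BY sensor ORDER BY date_col) AS change,
           -- rn = 1 marks the latest row of each sensor
           ROW_NUMBER()
               OVER (PARTITION BY sensor ORDER BY date_col DESC) AS rn
    FROM example
) AS t
WHERE rn = 1
ORDER BY sensor
"""

result = conn.execute(LAG_SQL).fetchall()
print(result)
```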

2 Answers


Create test data with 1M rows:

CREATE UNLOGGED TABLE sensors( sensor_id INTEGER PRIMARY KEY );
INSERT INTO sensors SELECT generate_series(1,1000);
CREATE UNLOGGED TABLE measurements (
 sensor_id INTEGER NOT NULL,
 date date not null,
 temp float not null,
 UNIQUE (sensor_id,date)
 );
INSERT INTO measurements SELECT sensor_id, '2000-01-01'::DATE + '1 DAY'::INTERVAL*d, random()
FROM sensors CROSS JOIN generate_series(1,1000) d;
VACUUM ANALYZE;

Get the last 2 rows for each sensor:

SELECT * FROM
sensors s
JOIN LATERAL (
 SELECT * FROM measurements m
 WHERE m.sensor_id=s.sensor_id
 ORDER BY date DESC LIMIT 2
 ) USING (sensor_id);
Query plan (EXPLAIN ANALYZE):

 Nested Loop  (cost=0.42..7869.69 rows=10 width=16) (actual time=0.074..8.512 rows=2000 loops=1)
   ->  Seq Scan on sensors s  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.020..0.122 rows=1000 loops=1)
   ->  Subquery Scan on unnamed_subquery  (cost=0.42..7.84 rows=1 width=16) (actual time=0.007..0.008 rows=2 loops=1000)
         Filter: (s.sensor_id = unnamed_subquery.sensor_id)
         ->  Limit  (cost=0.42..7.82 rows=2 width=16) (actual time=0.007..0.007 rows=2 loops=1000)
               ->  Index Scan Backward using measurements_sensor_id_date_key on measurements m  (cost=0.42..3697.77 rows=1000 width=16) (actual time=0.007..0.007 rows=2 loops=1000)
                     Index Cond: (sensor_id = s.sensor_id)
 Execution Time: 8.664 ms

Result:

 sensor_id |    date    |        temp
-----------+------------+---------------------
         1 | 2002-09-27 | 0.21334769370357276
         1 | 2002-09-26 | 0.5202488410925379
         2 | 2002-09-27 | 0.3530518649150136
         2 | 2002-09-26 | 0.8535599911382779
         3 | 2002-09-27 | 0.869634043585521
 ...

The key here is that a window function like LAG() over the whole table will not be turned into a per-sensor index lookup: computing it means reading every row. However, using a lateral join we can retrieve the last two rows for each sensor with a fast plan.

A lateral join turns the joined table into a dependent subquery, which is executed once for each sensor_id. This allows using "ORDER BY date DESC LIMIT 2", which is very fast thanks to the index on (sensor_id, date) created by the UNIQUE constraint (the Index Scan Backward in the plan).
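SQLite has no JOIN LATERAL, so this sketch spells out what the lateral join does as an explicit loop: one index lookup per sensor_id, each returning at most two rows. It uses Python's sqlite3 module and a tiny hypothetical data set in place of PostgreSQL and the 1M-row tables above; in PostgreSQL the same loop runs inside the Nested Loop node (loops=1000 in the plan), with no client round trips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensors (sensor_id INTEGER PRIMARY KEY);
    CREATE TABLE measurements (
        sensor_id INTEGER NOT NULL,
        date      TEXT    NOT NULL,
        temp      REAL    NOT NULL,
        UNIQUE (sensor_id, date)  -- creates the index the lookup uses
    );
""")
# Hypothetical small data set: 3 sensors x 5 days, temp = 10 * sensor + day.
conn.executemany("INSERT INTO sensors VALUES (?)", [(s,) for s in range(1, 4)])
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(s, f"2000-01-0{d}", 10 * s + d) for s in range(1, 4) for d in range(1, 6)],
)

# The dependent subquery that JOIN LATERAL evaluates once per sensor_id.
PER_SENSOR = """
    SELECT sensor_id, date, temp FROM measurements
    WHERE sensor_id = ?
    ORDER BY date DESC LIMIT 2
"""

sensor_ids = [r[0] for r in conn.execute("SELECT sensor_id FROM sensors ORDER BY sensor_id")]
last_two = []
for sensor_id in sensor_ids:
    last_two.extend(conn.execute(PER_SENSOR, (sensor_id,)).fetchall())

# The detail column (index 3) of SQLite's plan should report an index search,
# not a full table scan.
plan = conn.execute("EXPLAIN QUERY PLAN " + PER_SENSOR, (1,)).fetchall()
print(last_two)
print([row[3] for row in plan])
```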

The query results contain all the information needed to compute the temperature difference between the last two measurements. Presenting the results as shown in the question is cosmetic, so it should be done in the application code.
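Doing the diff in application code only needs each sensor's rows grouped with the newest first. A minimal Python sketch (the function name latest_with_change is hypothetical); it sorts defensively because the lateral query has no outer ORDER BY, so row order is not guaranteed. The input uses the question's sample data, plus a hypothetical sensor "D9" with a single measurement.

```python
from itertools import groupby
from operator import itemgetter


def latest_with_change(rows):
    """rows: (sensor, date, temp) tuples, at most two per sensor, in any order.

    Returns one (sensor, latest_date, latest_temp, change) tuple per sensor;
    change is None when the sensor has only one measurement.
    """
    ordered = sorted(rows, key=itemgetter(1), reverse=True)  # newest first
    ordered.sort(key=itemgetter(0))  # stable sort keeps newest first per sensor
    result = []
    for sensor, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        latest = group[0]
        change = latest[2] - group[1][2] if len(group) > 1 else None
        result.append((sensor, latest[1], latest[2], change))
    return result


# Two newest rows per sensor from the question's data, plus hypothetical "D9".
rows = [
    ("A1", "2024-01-06", 3), ("A1", "2024-01-05", 7),
    ("B2", "2024-01-05", 3), ("B2", "2023-12-23", -11),
    ("C1", "2024-01-20", 1), ("C1", "2024-01-19", -2),
    ("D9", "2024-01-02", 5),
]
result = latest_with_change(rows)
for row in result:
    print(row)
```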

If you want to do it in the database, you can put the above query in a CTE and join it to itself. The join condition is simple because the CTE contains at most two rows per sensor and date is unique per sensor. Note that the LEFT JOIN still returns a row even if a sensor has only one measurement.

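WITH foo AS (
  SELECT * FROM
  sensors s
  JOIN LATERAL (
    SELECT * FROM measurements m
    WHERE m.sensor_id=s.sensor_id
    ORDER BY date DESC LIMIT 2
  ) USING (sensor_id)
)
-- today = newest row, yesterday = the row before it (NULL if none)
SELECT today.sensor_id, today.date, today.temp,
       today.temp - yesterday.temp AS change
FROM foo today LEFT JOIN foo yesterday
  ON (yesterday.date<today.date AND yesterday.sensor_id=today.sensor_id)
-- keep only the newest row of each sensor as "today"
WHERE NOT EXISTS (
  SELECT 1 FROM foo later
  WHERE later.sensor_id=today.sensor_id AND later.date>today.date
);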
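A sketch of the self-join step, written so each sensor yields exactly one row: the newest row plays "today", the LEFT JOIN finds the older one as "yesterday", and NOT EXISTS keeps the older row off the "today" side. Python's sqlite3 module stands in for PostgreSQL here; since SQLite has no LATERAL, the CTE uses a correlated IN (... ORDER BY date DESC LIMIT 2) instead. The data are hypothetical, with sensor 2 having a single measurement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensors (sensor_id INTEGER PRIMARY KEY);
    CREATE TABLE measurements (
        sensor_id INTEGER NOT NULL,
        date      TEXT    NOT NULL,
        temp      REAL    NOT NULL,
        UNIQUE (sensor_id, date)
    );
    INSERT INTO sensors VALUES (1), (2);
    -- hypothetical values; sensor 2 has a single measurement
    INSERT INTO measurements VALUES
        (1, '2002-09-25', 0.875), (1, '2002-09-26', 0.5),
        (1, '2002-09-27', 0.25), (2, '2002-09-27', 0.75);
""")

SELF_JOIN_SQL = """
WITH foo AS (
    -- stand-in for the LATERAL subquery: the two newest rows per sensor
    SELECT m.sensor_id, m.date, m.temp
    FROM sensors s
    JOIN measurements m ON m.sensor_id = s.sensor_id
    WHERE m.date IN (SELECT date FROM measurements
                     WHERE sensor_id = s.sensor_id
                     ORDER BY date DESC LIMIT 2)
)
SELECT today.sensor_id, today.date, today.temp,
       today.temp - yesterday.temp AS change
FROM foo today
LEFT JOIN foo yesterday
       ON yesterday.sensor_id = today.sensor_id AND yesterday.date < today.date
-- only the newest row of each sensor may play "today"
WHERE NOT EXISTS (SELECT 1 FROM foo later
                  WHERE later.sensor_id = today.sensor_id
                    AND later.date > today.date)
ORDER BY today.sensor_id
"""

result = conn.execute(SELF_JOIN_SQL).fetchall()
print(result)
```

Sensor 2 still gets a row, with a NULL (None) change, because of the LEFT JOIN.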
answered Jan 25, 2024 at 22:43
  • Thank you for the long and clear explanation! Commented Jan 26, 2024 at 7:06

Thanks to bobflux's answer, I pushed my own ideas a bit further and decided to create a function that returns just the temperature change for a specific sensor.

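CREATE FUNCTION f_temp_change(ts varchar)
 RETURNS TABLE (temp_change DECIMAL) AS $$
 WITH x AS (
  -- the two newest rows of sensor ts; LEAD() over date_col DESC fetches the
  -- previous (older) temperature, defaulting to the row's own value (change 0)
  SELECT date_col, temperature,
         round((temperature - LEAD(temperature, 1, temperature)
                OVER (ORDER BY date_col DESC)), 2) AS temp_change
  FROM example WHERE sensor = ts ORDER BY date_col DESC LIMIT 2
 )
 -- the newest row holds latest - previous
 SELECT temp_change FROM x ORDER BY date_col DESC LIMIT 1;
$$ LANGUAGE SQL;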
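The function's query can also be exercised outside PostgreSQL. In this sketch Python's sqlite3 module stands in for PostgreSQL, the parameter ts is bound as ?, the wrapper name temp_change is hypothetical, and "D9" is a hypothetical sensor with a single measurement. Note the third LEAD() argument: when there is no older row, the default is the row's own temperature, so a one-measurement sensor reports a change of 0 rather than NULL. SQLite's round() returns a float, so A1 comes back as -4.0 rather than PostgreSQL's -4.00.

```python
import sqlite3

# Question's sample rows plus hypothetical single-measurement sensor "D9".
ROWS = [
    ("A1", "2023-12-29", -18), ("B2", "2023-12-20", -15),
    ("A1", "2024-01-04", -10), ("C1", "2024-01-08", 3),
    ("B2", "2023-12-23", -11), ("A1", "2024-01-06", 3),
    ("C1", "2024-01-19", -2), ("C1", "2024-01-20", 1),
    ("A1", "2024-01-05", 7), ("B2", "2024-01-05", 3),
    ("D9", "2024-01-02", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (sensor TEXT NOT NULL, date_col TEXT NOT NULL, "
             "temperature INTEGER NOT NULL, UNIQUE (sensor, date_col))")
conn.executemany("INSERT INTO example VALUES (?, ?, ?)", ROWS)

# Body of f_temp_change with the parameter ts bound as "?".
TEMP_CHANGE_SQL = """
WITH x AS (
    SELECT date_col, temperature,
           round(temperature - LEAD(temperature, 1, temperature)
                 OVER (ORDER BY date_col DESC), 2) AS temp_change
    FROM example WHERE sensor = ? ORDER BY date_col DESC LIMIT 2
)
SELECT temp_change FROM x ORDER BY date_col DESC LIMIT 1
"""


def temp_change(conn, ts):
    """Latest minus previous temperature for sensor ts, or None if the sensor
    has no rows (the SQL function returns zero rows in that case)."""
    row = conn.execute(TEMP_CHANGE_SQL, (ts,)).fetchone()
    return None if row is None else row[0]


for ts in ("A1", "B2", "C1", "D9", "ZZ"):
    print(ts, temp_change(conn, ts))
```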

This gives me:

select f_temp_change('A1');
 f_temp_change 
---------------
 -4.00

So I can run the query for one sensor and get the result:

SELECT *,f_temp_change(example.sensor) AS change FROM example 
 WHERE sensor='A1' 
 AND date_col=(SELECT MAX(date_col) AS date_col FROM example WHERE sensor='A1');
 sensor |  date_col  | temperature | change
--------+------------+-------------+--------
 A1     | 2024-01-06 |           3 |  -4.00

And now for all sensors:

SELECT DISTINCT s.sensor, x.date_col, x.temperature, change FROM example AS s,
LATERAL ( 
 SELECT *,f_temp_change(sensor) AS change FROM example 
 WHERE sensor=s.sensor 
 AND date_col=(SELECT MAX(date_col) AS date_col 
 FROM example WHERE sensor=s.sensor) 
 ) AS x 
ORDER BY sensor;
 sensor |  date_col  | temperature | change
--------+------------+-------------+--------
 A1     | 2024-01-06 |           3 |  -4.00
 B2     | 2024-01-05 |           3 |  14.00
 C1     | 2024-01-20 |           1 |   3.00

This is a result I can live with, at least for now.

answered Jan 26, 2024 at 10:56
