I'm running a SELECT that starts from a table of events (tv_event), joined with device measurements on a time-range condition (a sort of "moving window"), to get the measurements inserted between one hour before the event log's insert time and the insert time itself.
After that, since each measurement is a JSON structure, I use a CROSS JOIN to unpack the ap_info object. From it I get the ap_name, ap_ip and neigh_name of the latest measurement stored in the DB right before the event occurs.
My two attempts:
First approach (very slow, and unusable when there is no match on the CONCAT filter expression):
Select
p.neigh_name
from tv_smartdevicemeasurement_snmp
cross join jsonb_to_recordset(tv_smartdevicemeasurement_snmp.data->'ap_info') as p(ap_name text,ap_ip text, neigh_name text)
inner join tv_event on tv_event.name = p.ap_name
where smart_device_id = 3 and (tv_smartdevicemeasurement_snmp.insert_time <= tv_event.insert_time
and tv_smartdevicemeasurement_snmp.insert_time > tv_event.insert_time - '1 hour'::interval)
and is_event_ack = false
and (CONCAT('Host: ', p.ap_name, ' - IP: ' , p.ap_ip) = 'Host: AP-04 - IP: 10.50.2.130')
order by tv_smartdevicemeasurement_snmp.insert_time desc
limit 1
Explain: HERE
CTE method (runs without any evident issue):
with cte_temp as (select * from (
select tv_event.insert_time, tv_smartdevicemeasurement_snmp.data from tv_event
inner join tv_smartdevicemeasurement_snmp on
(tv_smartdevicemeasurement_snmp.insert_time <= tv_event.insert_time and tv_smartdevicemeasurement_snmp.insert_time > tv_event.insert_time - '1 hour'::interval)
where is_event_ack = false and tv_smartdevicemeasurement_snmp.smart_device_id = 3) as join_non_ack_evt_meas
) Select insert_time, ap_name, ap_ip, neigh_name from cte_temp
cross join jsonb_to_recordset(cte_temp.data->'ap_info') as p(ap_name text,ap_ip text, neigh_name text)
where (CONCAT('Host: ', ap_name, ' - IP: ' , ap_ip) = 'Host: AP-04 - IP: 10.50.2.130')
order by insert_time desc
limit 1
Explain: HERE
The question is: why is there such a big difference between these two queries? They work on the same tables! Please explain how they differ. The only thing I can see is that the CTE approach works on a larger subset of rows.
In addition, do you have any suggestions to optimize my second approach?
Postgres Version: "PostgreSQL 11.13 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20210424) 10.3.1 20210424, 64-bit"
CTEs are always materialized in Postgres 11 or older
Up to (and including) Postgres 11, CTEs are always materialized. So a CTE always acts as an optimization barrier and forces separate query plans for the CTE and the rest of the query. This can be beneficial for performance or not, depending on a number of circumstances.
Follow the link in this item of the Postgres 12 release notes:
Automatic (but overridable) inlining of common table expressions (CTEs)
And you'll understand why the same comparison will likely yield different results in a current version of Postgres (12+).
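On Postgres 12 or later, the materialization can be forced either way, which makes the comparison easy to reproduce. A minimal sketch, reusing the question's CTE query with table aliases added and the wrapping subquery dropped. Table and column names come from the question; the `AS [NOT] MATERIALIZED` syntax is rejected by Postgres 11:

```sql
-- PostgreSQL 12+ only: AS [NOT] MATERIALIZED is a syntax error in 11 and older.
-- Table and column names are taken from the question; is_event_ack is left
-- unqualified, as in the original query.
EXPLAIN (ANALYZE, BUFFERS)
WITH cte_temp AS NOT MATERIALIZED (  -- let the planner inline the CTE;
                                     -- write MATERIALIZED to mimic Postgres 11
   SELECT e.insert_time, s.data
   FROM   tv_event e
   JOIN   tv_smartdevicemeasurement_snmp s
          ON  s.insert_time <= e.insert_time
          AND s.insert_time >  e.insert_time - interval '1 hour'
   WHERE  is_event_ack = false
   AND    s.smart_device_id = 3
   )
SELECT c.insert_time, p.ap_name, p.ap_ip, p.neigh_name
FROM   cte_temp c
CROSS  JOIN jsonb_to_recordset(c.data -> 'ap_info')
       AS p(ap_name text, ap_ip text, neigh_name text)
WHERE  concat('Host: ', p.ap_name, ' - IP: ', p.ap_ip) = 'Host: AP-04 - IP: 10.50.2.130'
ORDER  BY c.insert_time DESC
LIMIT  1;
```

Running the statement once with each keyword and comparing the two plans should show how much of the gap comes from the optimization barrier on this particular data.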
Better query
The formulation of your core predicate is terrible for performance. It's not "sargable" (and generally wastes computing cycles):
where (CONCAT('Host: ', ap_name, ' - IP: ' , ap_ip) = 'Host: AP-04 - IP: 10.50.2.130')
As a first step, unravel it to:
WHERE ap_name = 'AP-04'
AND ap_ip = '10.50.2.130'
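As a quick self-contained check, the plain predicates can be tried on an inline jsonb array. The sample array below is hypothetical and only mimics the shape of ap_info suggested by the column list in the question:

```sql
-- Hypothetical sample data shaped like the ap_info array in the question.
SELECT p.ap_name, p.ap_ip, p.neigh_name
FROM   jsonb_to_recordset(
          '[{"ap_name": "AP-04", "ap_ip": "10.50.2.130", "neigh_name": "SW-01"},
            {"ap_name": "AP-05", "ap_ip": "10.50.2.131", "neigh_name": "SW-02"}]'::jsonb
       ) AS p(ap_name text, ap_ip text, neigh_name text)
WHERE  p.ap_name = 'AP-04'        -- compare each attribute directly
AND    p.ap_ip   = '10.50.2.130'; -- instead of building a string per row
-- returns one row: AP-04 | 10.50.2.130 | SW-01
```

Comparing the raw attributes avoids one string concatenation per unpacked element, and it is the form that can later be expressed as an index-supported condition on the JSON itself.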
Since those are nested jsonb attributes, reformulate like this:
SELECT p.neigh_name
FROM tv_smartdevicemeasurement_snmp s
JOIN jsonb_to_recordset(s.data -> 'ap_info')
AS p(ap_name text, ap_ip text, neigh_name text)
ON p.ap_name = 'AP-04' -- !
AND p.ap_ip = '10.50.2.130' -- !
JOIN tv_event e ON e.name = p.ap_name
WHERE smart_device_id = 3 -- s. or e. ?
AND s.insert_time <= e.insert_time
AND s.insert_time > e.insert_time - '1 hour'::interval
AND is_event_ack = FALSE -- s. or e. ?
AND (s.data -> 'ap_info') @> '[{"ap_name": "AP-04", "ap_ip": "10.50.2.130"}]' -- !
ORDER BY s.insert_time DESC
LIMIT 1
And support it with an applicable index like:
CREATE INDEX jsb_foo ON tv_smartdevicemeasurement_snmp USING GIN ((data -> 'ap_info') jsonb_path_ops);
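The `@>` test only matches when a single array element contains both key/value pairs, which is why it can serve as an index-supported prefilter for the two equality conditions. A self-contained check on a hypothetical sample array, followed by a hedged way to confirm whether the planner picks the GIN index on the real table (table name from the question):

```sql
-- Hypothetical sample array shaped like data -> 'ap_info'.
SELECT a @> '[{"ap_name": "AP-04", "ap_ip": "10.50.2.130"}]' AS same_element   -- true
     , a @> '[{"ap_name": "AP-04", "ap_ip": "10.50.2.131"}]' AS mixed_elements -- false
FROM  (SELECT '[{"ap_name": "AP-04", "ap_ip": "10.50.2.130", "neigh_name": "SW-01"},
                {"ap_name": "AP-05", "ap_ip": "10.50.2.131", "neigh_name": "SW-02"}]'::jsonb AS a) t;

-- On the real table: the filter must use the indexed expression,
-- (data -> 'ap_info'), for the expression index to be considered.
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.insert_time
FROM   tv_smartdevicemeasurement_snmp s
WHERE  (s.data -> 'ap_info') @> '[{"ap_name": "AP-04", "ap_ip": "10.50.2.130"}]';
```

If only a small fraction of rows match, the plan would typically show a bitmap index scan on the GIN index; with many matching rows a sequential scan may still be cheaper.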
See:
- Index by subkey inside subkey inside array in jsonb
- How to get particular object from jsonb array in PostgreSQL?
This assumes that only a small percentage of all rows has data for the given name and IP. (Else, the relational design is even more inefficient than what's evident so far.)
Might be optimized further, depending on missing information.
It's typically more efficient to store attributes that regularly serve as filters in normalized form, in their natural data type: inet or cidr for ap_ip, or even ip4 from the ip4r extension.
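A minimal sketch of such a normalized layout. It assumes a hypothetical child table tv_ap_info and assumes tv_smartdevicemeasurement_snmp has a primary key column id; neither appears in the question:

```sql
-- Hypothetical child table: one row per element of data -> 'ap_info'.
CREATE TABLE tv_ap_info (
   measurement_id bigint NOT NULL,  -- assumed to reference tv_smartdevicemeasurement_snmp(id)
   ap_name        text,
   ap_ip          inet,             -- natural type: validated on input, supports network operators
   neigh_name     text
);

-- Backfill from the existing JSON (the ::inet cast raises an error on malformed addresses).
INSERT INTO tv_ap_info (measurement_id, ap_name, ap_ip, neigh_name)
SELECT s.id, p.ap_name, p.ap_ip::inet, p.neigh_name
FROM   tv_smartdevicemeasurement_snmp s
CROSS  JOIN jsonb_to_recordset(s.data -> 'ap_info')
       AS p(ap_name text, ap_ip text, neigh_name text);

-- Plain B-tree index for the equality filter on name and IP.
CREATE INDEX tv_ap_info_name_ip_idx ON tv_ap_info (ap_name, ap_ip);

-- An operator the text type lacks: is the address inside a given subnet?
SELECT '10.50.2.130'::inet << '10.50.2.0/24'::inet AS in_subnet;  -- true
```

A query would then join tv_ap_info on measurement_id and filter with ap_name = 'AP-04' AND ap_ip = '10.50.2.130', a condition the B-tree index can support directly.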
- ap_name and ap_ip were taken from a jsonb_to_recordset over the cross join: they are stored in a JSON field. (VirtApp, Dec 16, 2021 at 13:00)
- @VirtApp: OK, the origin of those columns was my oversight. The advice still mostly applies. If you provide information as instructed here, most prominently relevant index and table definitions (in the question!), I might be able to suggest a much more efficient query. (Erwin Brandstetter, Dec 17, 2021 at 17:15)
SELECT version();
(I have posted like a thousand variations of this comment over the years, if you wonder about the punctuation.)