I try to create a PostGIS view in which I want the number of features having an attribute date < 3 years from current date, and the total number of features, to be able to display a pie chart in QGIS. The features needs to be grouped by town (commune in French) and sometimes by a smaller town division called "tournee" in this case.
I have a materialized view which contains the towns and tournees, with the geometry.
The features I need to count are in two different tables.
My first try is working but is not efficient when I display the view in QGIS.
The view is created with the following SQL :
CREATE OR REPLACE VIEW deci.v_pei_avancee_rop_communes_tournees
AS
-- Get feature from the two soruce table (features to count)
WITH t1 AS (
SELECT pei_raccorde.code_insee,
pei_raccorde.commune,
pei_raccorde.numero_tournee,
pei_raccorde.date_suppression,
pei_raccorde.date_reco_ops
FROM deci.pei_raccorde
UNION ALL
SELECT pei_na_non_raccorde.code_insee,
pei_na_non_raccorde.commune,
pei_na_non_raccorde.numero_tournee,
pei_na_non_raccorde.date_suppression,
pei_na_non_raccorde.date_reco_ops
FROM deci.pei_na_non_raccorde
ORDER BY code_insee, numero_tournee
)
-- Create view attributes
SELECT
-- view id
row_number() OVER () AS id,
-- town/subdivision name
CASE
-- if town does not have a subdivision
WHEN t1.numero_tournee IS NULL
-- name is the town name
THEN t1.commune::text
-- else concat town name and subdivision name
ELSE concat(t1.commune, '-Tournée ', t1.numero_tournee)
END AS commune_tournee,
-- count the features when date_reco_ops is < 3 years before current year
count(
CASE
WHEN (date_part('year'::text, now()) - date_part('year'::text, t1.date_reco_ops)) < 3::double precision THEN 1
ELSE NULL::integer
END) AS nb_controles
-- count all features
count(*) AS nb_pei,
-- geometry : get from materialized view ct
ct.geom
FROM t1
-- join with materialized view v_communes_tournees on fields code_insee and numero_tournee
JOIN deci.v_communes_tournees ct ON ct.code_insee::text = t1.code_insee::text AND NOT ct.numero_tournee::text IS DISTINCT FROM t1.numero_tournee::text
-- get only features where date_suppression is not filled (=null)
WHERE t1.date_suppression IS NULL
GROUP BY t1.code_insee, t1.commune, t1.numero_tournee, ct.geom
ORDER BY t1.code_insee, t1.numero_tournee;
I have the following indexes :
- geom (gist) on the materialized view
- code_insee and numero_tournee (btree) on the materialized view and the 2 source tables
In QGIS the materialized view is fast to display. The deci.v_pei_avancee_rop_communes_tournees view is slow to display (even withou the pie chart).
Add JSON QUERY PLAN :
[
{
"Plan": {
"Node Type": "WindowAgg",
"Parallel Aware": false,
"Actual Rows": 389,
"Actual Loops": 1,
"Plans": [
{
"Node Type": "Aggregate",
"Strategy": "Sorted",
"Partial Mode": "Simple",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Actual Rows": 389,
"Actual Loops": 1,
"Group Key": [
"pei_raccorde.code_insee",
"pei_raccorde.numero_tournee",
"pei_raccorde.commune",
"ct.geom"
],
"Plans": [
{
"Node Type": "Incremental Sort",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Actual Rows": 9610,
"Actual Loops": 1,
"Sort Key": [
"pei_raccorde.code_insee",
"pei_raccorde.numero_tournee",
"pei_raccorde.commune",
"ct.geom"
],
"Presorted Key": [
"pei_raccorde.code_insee"
],
"Full-sort Groups": {
"Group Count": 109,
"Sort Methods Used": [
"quicksort"
],
"Sort Space Memory": {
"Average Sort Space Used": 517,
"Peak Sort Space Used": 522
}
},
"Pre-sorted Groups": {
"Group Count": 111,
"Sort Methods Used": [
"quicksort"
],
"Sort Space Memory": {
"Average Sort Space Used": 1375,
"Peak Sort Space Used": 1560
}
},
"Plans": [
{
"Node Type": "Merge Join",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Join Type": "Inner",
"Actual Rows": 9610,
"Actual Loops": 1,
"Inner Unique": false,
"Merge Cond": "((ct.code_insee)::text = (pei_raccorde.code_insee)::text)",
"Join Filter": "(NOT ((ct.numero_tournee)::text IS DISTINCT FROM (pei_raccorde.numero_tournee)::text))",
"Rows Removed by Join Filter": 34024,
"Plans": [
{
"Node Type": "Sort",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Actual Rows": 426,
"Actual Loops": 1,
"Sort Key": [
"ct.code_insee"
],
"Sort Method": "quicksort",
"Sort Space Used": 2132,
"Sort Space Type": "Memory",
"Plans": [
{
"Node Type": "Seq Scan",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Relation Name": "v_communes_tournees",
"Alias": "ct",
"Actual Rows": 426,
"Actual Loops": 1
}
]
},
{
"Node Type": "Materialize",
"Parent Relationship": "Inner",
"Parallel Aware": false,
"Actual Rows": 43635,
"Actual Loops": 1,
"Plans": [
{
"Node Type": "Sort",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Actual Rows": 9611,
"Actual Loops": 1,
"Sort Key": [
"pei_raccorde.code_insee",
"pei_raccorde.numero_tournee"
],
"Sort Method": "quicksort",
"Sort Space Used": 1099,
"Sort Space Type": "Memory",
"Plans": [
{
"Node Type": "Append",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Actual Rows": 9625,
"Actual Loops": 1,
"Subplans Removed": 0,
"Plans": [
{
"Node Type": "Seq Scan",
"Parent Relationship": "Member",
"Parallel Aware": false,
"Relation Name": "pei_raccorde",
"Alias": "pei_raccorde",
"Actual Rows": 9569,
"Actual Loops": 1,
"Filter": "(date_suppression IS NULL)",
"Rows Removed by Filter": 78
},
{
"Node Type": "Seq Scan",
"Parent Relationship": "Member",
"Parallel Aware": false,
"Relation Name": "pei_na_non_raccorde",
"Alias": "pei_na_non_raccorde",
"Actual Rows": 56,
"Actual Loops": 1,
"Filter": "(date_suppression IS NULL)",
"Rows Removed by Filter": 0
}
]
}
]
}
]
}
]
}
]
}
]
}
]
},
"Triggers": []
}
]
2 Answers 2
I finally rebuild my query from scratch.
CREATE OR REPLACE VIEW deci.v_pei_avancee_rop_communes_tournees
AS
-- CTE t1 : get all features to count
with t1 as
(select code_insee, commune, numero_tournee, date_suppression, date_reco_ops from deci.pei_raccorde
union all
select code_insee, commune, numero_tournee, date_suppression, date_reco_ops from deci.pei_na_non_raccorde),
-- CTE t2 : count the features when date_reco_ops is < 3 years before current year + count the total number of features, grouped by town and sudvision when there are ones
t2 as
(select code_insee,
commune,
numero_tournee,
-- count according to date
count((CASE
WHEN (date_part('year', now()) - date_part('year', date_reco_ops)) < 3 THEN 1
ELSE NULL
END)) AS nb_controles,
-- total feature count
count(*) as nb_pei
from t1
-- filters for features to ignore
where date_suppression is null and code_insee <> 'tempo'
-- group
group by code_insee, commune, numero_tournee
order by code_insee, numero_tournee)
-- build view with id, town id/subdivision/town name
-- + get geom from other table (was a materialized view in my previous post but I changed it to a table)
select row_number() over() as id,
ct.code_insee,
ct.commune,
ct.numero_tournee,
-- default value 0 when a town has no features joined
coalesce(t2.nb_controles, 0) as nb_controles,
-- default value 0 when a town has no features joined
coalesce(t2.nb_pei, 0) as nb_pei,
nb_pei-nb_controles as nb_a_controler,
-- geometry get from table communes_tournees
ct.geom
from deci.communes_tournees ct
-- left join because I want to keep towns with no features
left join t2
-- join on code_insee et numero_tournee which have a composite index in all 3 tables : communes_tournees, pei_raccorde and pei_na_non_raccorde
on (ct.code_insee = t2.code_insee
and NOT ct.numero_tournee IS DISTINCT FROM t2.numero_tournee);
And I created composite indexes, for example as following :
CREATE INDEX idx_pei_raccorde_insee_tournee
ON deci.pei_raccorde USING btree
(code_insee, numero_tournee)
;
The display in QGIS is fast and I've been able to create my symbology now.
You just need to refactor; run something like this:
SELECT
ROW_NUMBER() OVER(
ORDER BY code_insee, commune, numero_tournee
) AS id,
COALESCE(commune || '-Tournée ' || numero_tournee, commune) AS commune_tournee
FROM
(
SELECT
code_insee,
commune,
numero_tournee,
COUNT(*) FILTER(
WHERE date_reco_ops >= NOW() - INTERVAL '3 YEARS'
) AS nb_controles,
COUNT(*) AS nb_pei
FROM
(
SELECT * FROM
deci.pei_raccorde
WHERE
date_suppression IS NULL
UNION ALL
SELECT * FROM
deci.pei_na_non_raccorde
WHERE
date_suppression IS NULL
) q
GROUP BY
1, 2, 3
) AS mv
JOIN
deci.v_communes_tournees AS gv
ON
mv.code_insee = gv.code_insee
AND
mv.numero_tournee = gv.numero_tournee
;
Notes:
- it feels like those two feature tables should undergo normalization
- merge into one with an attribute denoting the actual difference
- have a composite key for
code_insee, numero_tournee
- to boost the
JOIN
to yourMATERIALIZED VIEW
, you want to either- have a composite
INDEX
on(code_insee, numero_tournee)
- use a related composite key (as above) between both tables
- have a composite
AND NOT ... IS DISTINCT FROM ...
is not distinct from=
and does not trigger an index lookup- if there is an index on
date_reco_ops
, you need to trigger it correctly; extracting parts of it will deny the index - having
NULL
s asJOIN
condition (as with yournumero_tournee
column) is an anti-pattern: consider to add an actual value instead
-
thanks for your precise answer. - Regarding the two tables, I have no possibility to change them, it is like this on purpose :( - I did create composite indexes (see my own answer to this post) - Regarding the date, I don't need the whole date to be less than 3 years but the plain year. So the interval you propose is not ok for my need - I used is distinct from to be able to join wether the numero_tournee is null or filled. It seems to work, whereas = is not working. So I'm surprised with your remark - regarding nulls in numero_tournee, which actual value do you mean ?Jérémie– Jérémie2022年04月08日 11:49:58 +00:00Commented Apr 8, 2022 at 11:49
-
That is exactly the issue, and actually I should have noted that when stating to use
=
:NULL
s are no values, they describe the absence of values, and thus won't get indexed and are not comparable and have all sorts of no good consequences when present in dimension columns. You are depending on that column as dimension column in a multitude of ways, and thus you really should come up with an actual null value (e.g.0
,<null>
,-1.0
, something you won't use otherwise) rather than usingNULL
!geozelot– geozelot2022年04月08日 13:05:43 +00:00Commented Apr 8, 2022 at 13:05 -
I'm not sure to understand everything you wrote. English is not my native language and I'm surely missing some DB concepts you have. I will see what is possible in the future but I can't modify the tables design right now. I understand null represent the absence of value but I'm still surprised you wrote that
NOT ... IS DISTINCT FROM
is the equivalent of=
because I don't get the same result when using one or the other.Jérémie– Jérémie2022年04月09日 09:15:23 +00:00Commented Apr 9, 2022 at 9:15
explain analyze
on your queryrow_number() OVER () AS id
is wrong. You must include anorder by
within theover
clause, else there is no guarantee that the same row will have the same ID in subsequent calls (like between fetching the shape and the attributes)