I have a set of polygons that are stored flat in a table (these are OSM admin boundaries extracted through Osmium and osm2pgsql). Each polygon has one record for it. However, these polygons form hierarchies.
How do I identify these hierarchies and store only one polygon (the smallest) with the attributes for its parents using PostGIS?
This is how the data looks like:
Currently, the data is stored as:
A
AA
AB
AC
AB1
AB2
AC1
AC2
AC11
AC12
AC21
AC22
I want to go from that to:
AA, A, geom of AA
AB1, AB, A, geom of AB1
AB2, AB, A, geom of AB2
AC11, AC1, AC, A, geom of AC11
AC12, AC1, AC, A, geom of AC12
AC21, AC2, AC, A, geom of AC21
AC22, AC2, AC, A, geom of AC21
In a real world example,
AC11 could be Boston, AC1 Suffolk county, AC Massachusetts and A United States.
The only relationship these polygons share is a spatial one. The solution needs to be able to be applied at scale. For example, all of the administrative boundaries of the US.
-
2are you able to upload a small sample set that maybe represents your explanation above that we can work with. I think there is an admin_level column that could be leveraged in this instance, but some representative data would help.MickyT– MickyT2019年11月27日 00:44:11 +00:00Commented Nov 27, 2019 at 0:44
3 Answers 3
If the set of polygons forms a true hierarchy, then the following approach can be used:
- Compute the containment hierarchy for the polygons, via the spatial relationship of a polygon interior point being contained in a polygon of larger and minimum area.
- Use a recursive CTE to compute the "paths" to the leaf polygons in the hierarchy
- The leaf geometries can be included by joining back to the original table
The following SQL provides some mock data and the queries to compute the leaf paths:
WITH RECURSIVE data(id, geom) AS (VALUES
('AC11', 'POLYGON ((100 200, 150 200, 150 150, 100 150, 100 200))'),
('AC12', 'POLYGON ((200 200, 200 150, 150 150, 150 200, 200 200))'),
('AC21', 'POLYGON ((200 100, 150 100, 150 150, 200 150, 200 100))'),
('AC22', 'POLYGON ((100 100, 100 150, 150 150, 150 100, 100 100))'),
('AC1', 'POLYGON ((200 200, 200 150, 100 150, 100 200, 200 200))'),
('AC2', 'POLYGON ((200 100, 100 100, 100 150, 200 150, 200 100))'),
('AC', 'POLYGON ((100 200, 200 200, 200 100, 100 100, 100 200))'),
('AB1', 'POLYGON ((100 300, 150 300, 150 200, 100 200, 100 300))'),
('AB2', 'POLYGON ((200 300, 200 200, 150 200, 150 300, 200 300))'),
('AB', 'POLYGON ((100 300, 200 300, 200 200, 100 200, 100 300))'),
('AA', 'POLYGON ((0 300, 100 300, 100 100, 0 100, 0 300))'),
('A', 'POLYGON ((200 100, 0 100, 0 300, 200 300, 200 100))')
),
-- compute all containment links
contains AS ( SELECT p.id idpar, c.id idch, ST_Area(p.geom) par_area
FROM data p
JOIN data c ON ST_Contains(p.geom, ST_PointOnSurface(c.geom))
WHERE ST_Area(p.geom) > ST_Area(c.geom)
),
-- extract direct containment links, by choosing parent with min area
pcrel AS ( SELECT DISTINCT ON (idch) idpar, idch
FROM contains ORDER BY idch, par_area ASC
),
-- compute paths as strings
pcpath(id, path) AS (
SELECT 'A' AS id, 'A' AS path
UNION ALL
SELECT idch AS id, path || ',' || idch
FROM pcpath JOIN pcrel ON pcpath.id = pcrel.idpar
)
SELECT * FROM pcpath;
Notes
- This does not depend on the names of the polygons
- By using an interior point to determine inclusion this is fairly tolerant of polygons which aren't perfectly matched to their parent
- There should be a spatial index on the geometry in the data table
- For a large dataset it might be good to precompute the areas.
- Window functions could be used instead of
DISTINCT ON
to be more standard - It is probably more performant to save the
pcrel
records in a table and create an index onidpar
- It might be possible to include the child polygon geometry in the queries to avoid having to join back to the original table
- The paths could be computed as arrays to make them more structured
You could do it by parsing the codes like so. You'll need to include the appropriate geom field - I don't know what your table structure looks like:
select area_code,
case when area_code similar to '[A-Z][A-Z][0-9]'
then null
when area_code similar to '[A-Z][A-Z][0-9]+'
then substring(area_code,1,3)
else null end as subsubparent,
case when area_code similar to '[A-Z][A-Z][0-9]+'
then substring(area_code,1,2)
else null end as subparent,
substring(area_code,1,1) as parent;
example:
select 'AC2' as area,
case when 'AC2' similar to '[A-Z][A-Z][0-9]' then null
when 'AC2' similar to '[A-Z][A-Z][0-9]+' then substring((select 'AC2'),1,3) else null end as subsubparent,
case when 'AC2' similar to '[A-Z][A-Z][0-9]+' then substring((select 'AC2'),1,2) else null end as subparent,
substring((select 'AC2'),1,1) as parent;
output:
enter image description here
example 2:
select 'AC11' as area,
case when 'AC11' similar to '[A-Z][A-Z][0-9]' then null
when 'AC11' similar to '[A-Z][A-Z][0-9]+' then substring((select 'AC11'),1,3) else null end as subsubparent,
case when 'AC11' similar to '[A-Z][A-Z][0-9]+' then substring((select 'AC11'),1,2) else null end as subparent,
substring((select 'AC11'),1,1) as parent;
output 2:
-
Actually, what I am looking for is a spatial intersection. These codes are not standard to derive the parents from them. For example, AC11 could be Boston, AC1 Suffolck county, AC Massachusetts and A United States. The only relationship they share is a spatial one.c00der– c00der2019年11月27日 15:12:28 +00:00Commented Nov 27, 2019 at 15:12
You can use a recursive query.
For example, if we have this hierarchie:
CREATE TABLE pointlevel AS
(
select 1 as arealevel, 1 as gid, st_buffer(ST_Point(0,0),3) as geom
union
select 1 as arealevel, 2 as gid, st_buffer(ST_Point(5,5),3) as geom
union
select 2 as arealevel, 3 as gid, st_buffer(ST_Point(0,0),2.1) as geom
union
select 3 as arealevel, 4 as gid, st_buffer(ST_Point(0,0),1) as geom
union
select 3 as arealevel, 5 as gid, st_buffer(ST_Point(5,5),1) as geom
union
select 4 as arealevel, 6 as gid, st_buffer(ST_Point(0,0),0.5) as geom
);
With arealevel = 1 (orange circle) being the top hierarchie.
If we want to extract all the information of area containing some area of level 3, we can do:
WITH RECURSIVE hierarchie AS (
SELECT
arealevel,
gid,
geom
FROM
pointlevel
WHERE
arealevel = 3
UNION
SELECT
b.arealevel,
a.gid,
b.geom
FROM
pointlevel b
JOIN hierarchie a ON st_contains(b.geom,a.geom)
)
SELECT * FROM hierarchie
WHERE arealevel < 3