3

I have a set of polygons that are stored flat in a table (these are OSM admin boundaries extracted through Osmium and osm2pgsql). Each polygon has one record for it. However, these polygons form hierarchies.

How do I identify these hierarchies and store only one polygon (the smallest) with the attributes for its parents using PostGIS?

This is how the data looks like:

enter image description here

Currently, the data is stored as:

A
AA
AB
AC
AB1
AB2
AC1
AC2
AC11
AC12
AC21
AC22

I want to go from that to:

AA, A, geom of AA
AB1, AB, A, geom of AB1
AB2, AB, A, geom of AB2
AC11, AC1, AC, A, geom of AC11
AC12, AC1, AC, A, geom of AC12
AC21, AC2, AC, A, geom of AC21
AC22, AC2, AC, A, geom of AC21

In a real world example,

AC11 could be Boston, AC1 Suffolk county, AC Massachusetts and A United States.

The only relationship these polygons share is a spatial one. The solution needs to be able to be applied at scale. For example, all of the administrative boundaries of the US.

asked Nov 26, 2019 at 22:18
1
  • 2
    are you able to upload a small sample set that maybe represents your explanation above that we can work with. I think there is an admin_level column that could be leveraged in this instance, but some representative data would help. Commented Nov 27, 2019 at 0:44

3 Answers 3

3

If the set of polygons forms a true hierarchy, then the following approach can be used:

  1. Compute the containment hierarchy for the polygons, via the spatial relationship of a polygon interior point being contained in a polygon of larger and minimum area.
  2. Use a recursive CTE to compute the "paths" to the leaf polygons in the hierarchy
  3. The leaf geometries can be included by joining back to the original table

The following SQL provides some mock data and the queries to compute the leaf paths:

WITH RECURSIVE data(id, geom) AS (VALUES
('AC11', 'POLYGON ((100 200, 150 200, 150 150, 100 150, 100 200))'),
('AC12', 'POLYGON ((200 200, 200 150, 150 150, 150 200, 200 200))'),
('AC21', 'POLYGON ((200 100, 150 100, 150 150, 200 150, 200 100))'),
('AC22', 'POLYGON ((100 100, 100 150, 150 150, 150 100, 100 100))'),
('AC1', 'POLYGON ((200 200, 200 150, 100 150, 100 200, 200 200))'),
('AC2', 'POLYGON ((200 100, 100 100, 100 150, 200 150, 200 100))'),
('AC', 'POLYGON ((100 200, 200 200, 200 100, 100 100, 100 200))'),
('AB1', 'POLYGON ((100 300, 150 300, 150 200, 100 200, 100 300))'),
('AB2', 'POLYGON ((200 300, 200 200, 150 200, 150 300, 200 300))'),
('AB', 'POLYGON ((100 300, 200 300, 200 200, 100 200, 100 300))'),
('AA', 'POLYGON ((0 300, 100 300, 100 100, 0 100, 0 300))'),
('A', 'POLYGON ((200 100, 0 100, 0 300, 200 300, 200 100))')
),
-- compute all containment links
contains AS ( SELECT p.id idpar, c.id idch, ST_Area(p.geom) par_area
 FROM data p 
 JOIN data c ON ST_Contains(p.geom, ST_PointOnSurface(c.geom))
 WHERE ST_Area(p.geom) > ST_Area(c.geom)
),
-- extract direct containment links, by choosing parent with min area
pcrel AS ( SELECT DISTINCT ON (idch) idpar, idch 
 FROM contains ORDER BY idch, par_area ASC
),
-- compute paths as strings
pcpath(id, path) AS (
 SELECT 'A' AS id, 'A' AS path
 UNION ALL
 SELECT idch AS id, path || ',' || idch
 FROM pcpath JOIN pcrel ON pcpath.id = pcrel.idpar
)
SELECT * FROM pcpath;

Notes

  • This does not depend on the names of the polygons
  • By using an interior point to determine inclusion this is fairly tolerant of polygons which aren't perfectly matched to their parent
  • There should be a spatial index on the geometry in the data table
  • For a large dataset it might be good to precompute the areas.
  • Window functions could be used instead of DISTINCT ON to be more standard
  • It is probably more performant to save the pcrel records in a table and create an index on idpar
  • It might be possible to include the child polygon geometry in the queries to avoid having to join back to the original table
  • The paths could be computed as arrays to make them more structured
answered Jan 13, 2020 at 20:14
2
  • Thanks, your answer sounds promising. I'll try it and post my findings. Commented Jan 14, 2020 at 21:41
  • Thx, fixed the SQL problem. Commented Jan 17, 2020 at 17:44
1

You could do it by parsing the codes like so. You'll need to include the appropriate geom field - I don't know what your table structure looks like:

select area_code, 
case when area_code similar to '[A-Z][A-Z][0-9]' 
 then null 
 when area_code similar to '[A-Z][A-Z][0-9]+' 
 then substring(area_code,1,3) 
 else null end as subsubparent, 
case when area_code similar to '[A-Z][A-Z][0-9]+' 
 then substring(area_code,1,2) 
 else null end as subparent,
substring(area_code,1,1) as parent;

example:

select 'AC2' as area, 
case when 'AC2' similar to '[A-Z][A-Z][0-9]' then null 
when 'AC2' similar to '[A-Z][A-Z][0-9]+' then substring((select 'AC2'),1,3) else null end as subsubparent, 
case when 'AC2' similar to '[A-Z][A-Z][0-9]+' then substring((select 'AC2'),1,2) else null end as subparent,
substring((select 'AC2'),1,1) as parent;

output:
enter image description here

example 2:

select 'AC11' as area, 
case when 'AC11' similar to '[A-Z][A-Z][0-9]' then null 
when 'AC11' similar to '[A-Z][A-Z][0-9]+' then substring((select 'AC11'),1,3) else null end as subsubparent, 
case when 'AC11' similar to '[A-Z][A-Z][0-9]+' then substring((select 'AC11'),1,2) else null end as subparent,
substring((select 'AC11'),1,1) as parent;

output 2:

enter image description here

answered Nov 27, 2019 at 1:20
1
  • Actually, what I am looking for is a spatial intersection. These codes are not standard to derive the parents from them. For example, AC11 could be Boston, AC1 Suffolck county, AC Massachusetts and A United States. The only relationship they share is a spatial one. Commented Nov 27, 2019 at 15:12
1

You can use a recursive query.

For example, if we have this hierarchie:

enter image description here

CREATE TABLE pointlevel AS
(
select 1 as arealevel, 1 as gid, st_buffer(ST_Point(0,0),3) as geom
union
select 1 as arealevel, 2 as gid, st_buffer(ST_Point(5,5),3) as geom
union
select 2 as arealevel, 3 as gid, st_buffer(ST_Point(0,0),2.1) as geom
union
select 3 as arealevel, 4 as gid, st_buffer(ST_Point(0,0),1) as geom
union
select 3 as arealevel, 5 as gid, st_buffer(ST_Point(5,5),1) as geom
union
select 4 as arealevel, 6 as gid, st_buffer(ST_Point(0,0),0.5) as geom
);

With arealevel = 1 (orange circle) being the top hierarchie.

If we want to extract all the information of area containing some area of level 3, we can do:

WITH RECURSIVE hierarchie AS (
 SELECT
 arealevel,
 gid, 
 geom
 FROM
 pointlevel
 WHERE
 arealevel = 3
 UNION
 SELECT
 b.arealevel,
 a.gid,
 b.geom
 FROM
 pointlevel b
 JOIN hierarchie a ON st_contains(b.geom,a.geom)
) 
SELECT * FROM hierarchie
WHERE arealevel < 3
answered Nov 27, 2019 at 18:43

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.