I'm attempting to do a spatial join much like the example here: Is there a python option to "join attributes by location"?. However, that approach seems really inefficient / slow. Even running this with a modest 250 points takes almost 2 minutes and it fails entirely on shapefiles with> 1,000 points. Is there a better approach? I'd like to do this entirely in Python without using ArcGIS, QGIS, etc.
I'd also be interested to know if it's possible to SUM attributes (i.e. population) of all the points that fall within a polygon and join that quantity to the polygon shapefile.
Here is the code I'm trying to convert. I get an error on line 9:
poly['properties']['score'] += point['properties']['score']
which says:
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'float'.
If I replace the "+=" with "=" it runs fine but that doesn't sum the fields. I've also tried making these as integers but that fails as well.
with fiona.open(poly_shp, 'r') as n:
with fiona.open(point_shp,'r') as s:
outSchema = {'geometry': 'Polygon','properties':{'region':'str','score':'float'}}
with fiona.open (out_shp, 'w', 'ESRI Shapefile', outSchema, crs) as output:
for point in s:
for poly in n:
if shape(point['geometry']).within(shape(poly['geometry'])):
poly['properties']['score']) += point['properties']['score'])
output.write({
'properties':{
'region':poly['properties']['NAME'],
'score':poly['properties']['score']},
'geometry':poly['geometry']})
-
I think you should edit your second question out of here so that this one remains focussed on what I assume is the more important question to you. The other can be researched/asked separately.PolyGeo– PolyGeo ♦2014年06月24日 19:40:00 +00:00Commented Jun 24, 2014 at 19:40
4 Answers 4
Fiona returns Python dictionaries and you can not use poly['properties']['score']) += point['properties']['score'])
with a dictionary.
Example of summing attributes using the references given by Mike T:
enter image description here
# read the shapefiles
import fiona
from shapely.geometry import shape
polygons = [pol for pol in fiona.open('poly.shp')]
points = [pt for pt in fiona.open('point.shp')]
# attributes of the polygons
for poly in polygons:
print poly['properties']
OrderedDict([(u'score', 0)])
OrderedDict([(u'score', 0)])
OrderedDict([(u'score', 0)])
# attributes of the points
for pt in points:
print i['properties']
OrderedDict([(u'score', 1)])
.... # (same for the 8 points)
Now, we can use two methods, with or without a spatial index:
1: without
# iterate through points
for i, pt in enumerate(points):
point = shape(pt['geometry'])
#iterate through polygons
for j, poly in enumerate(polygons):
if point.within(shape(poly['geometry'])):
# sum of attributes values
polygons[j]['properties']['score'] = polygons[j]['properties']['score'] + points[i]['properties']['score']
2: with a R-tree index (you can use pyrtree or rtree)
# Create the R-tree index and store the features in it (bounding box)
from rtree import index
idx = index.Index()
for pos, poly in enumerate(polygons):
idx.insert(pos, shape(poly['geometry']).bounds)
#iterate through points
for i,pt in enumerate(points):
point = shape(pt['geometry'])
# iterate through spatial index
for j in idx.intersection(point.coords[0]):
if point.within(shape(polygons[j]['geometry'])):
polygons[j]['properties']['score'] = polygons[j]['properties']['score'] + points[i]['properties']['score']
Result with the two solutions:
for poly in polygons:
print poly['properties']
OrderedDict([(u'score', 2)]) # 2 points in the polygon
OrderedDict([(u'score', 1)]) # 1 point in the polygon
OrderedDict([(u'score', 1)]) # 1 point in the polygon
What is the difference ?
- Without the index, you must iterate through all the geometries (polygons and points).
- With a bounding spatial index (Spatial Index RTree), you iterate only through the geometries which have a chance to intersect with your current geometry ('filter' which can save a considerable amount of calculations and time...)
- but a Spatial Index is not a magic wand. When a very large part of the dataset has to be retrieved, a Spatial Index cannot give any speed benefit.
After:
schema = fiona.open('poly.shp').schema
with fiona.open ('output.shp', 'w', 'ESRI Shapefile', schema) as output:
for poly in polygons:
output.write(poly)
To go further, look at Using Rtree Spatial Indexing With OGR, Shapely, Fiona
Additionally - geopandas now optionally includes rtree
as a dependency, see the github repo
So, instead of following all the (very nice) code above, you could simply do something like:
import geopandas
from geopandas.tools import sjoin
point = geopandas.GeoDataFrame.from_file('point.shp') # or geojson etc
poly = geopandas.GeoDataFrame.from_file('poly.shp')
pointInPolys = sjoin(point, poly, how='left')
pointSumByPoly = pointInPolys.groupby('PolyGroupByField')['fields', 'in', 'grouped', 'output'].agg(['sum'])
To get this snazzy functionality be sure to install the C-library libspatialindex first
EDIT: corrected package imports
-
I was under the impression
rtree
was optional. Doesn't that mean you need to installrtree
as well as thelibspatialindex
C-library?kuanb– kuanb2017年02月10日 01:54:28 +00:00Commented Feb 10, 2017 at 1:54 -
it's been a while but I think when I did this installing geopandas from github automatically added
rtree
when I had first installedlibspatialindex
... they've done a fairly major release since so I'm sure things have changed a bitclaytonrsh– claytonrsh2017年02月27日 15:16:08 +00:00Commented Feb 27, 2017 at 15:16 -
1I just installed
geopandas
from pypa and needed to installrtree
to getsjoin
to work. See this answer: stackoverflow.com/a/60014884/868724aboutaaron– aboutaaron2020年03月10日 18:01:45 +00:00Commented Mar 10, 2020 at 18:01
Use Rtree as an index to perform the much faster joins, then Shapely to do the spatial predicates to determine if a point is actually within a polygon. If done properly, this can be faster than most other GISes.
The second part of your question concerning 'SUM', use a dict
object to accumulate populations using a polygon id as the key. Although, this type of thing is done much more nicely with PostGIS.
-
Thank you @Mike T ... using the dict object or PostGIS are great suggestions. I'm still a little confused where I can incorporate Rtree in my code, however (included code above).jburrfischer– jburrfischer2014年06月24日 02:45:00 +00:00Commented Jun 24, 2014 at 2:45
This web page shows how to use a Bounding Box point-in-polygon search before the more expensive Within spatial query of Shapely.
http://rexdouglass.com/fast-spatial-joins-in-python-with-a-spatial-index/
-
Thanks @klewis ... any chance you can help with the second part? To sum the point attributes (e.g. population) that fall within the polygons I tried something similar to code below but it threw an error. if shape(school['geometry']).within(shape(neighborhood['geometry'])): neighborhood['properties']['population'] += schools['properties']['population']jburrfischer– jburrfischer2014年06月23日 20:10:04 +00:00Commented Jun 23, 2014 at 20:10
-
If you open neighborhood in 'r' mode, it might be read-only. Do both shapefiles have field population? What line # is throwing the error? Good luck.klewis– klewis2014年06月23日 23:22:45 +00:00Commented Jun 23, 2014 at 23:22
-
Thank you again @klewis ... I've added my code above and explained the error. Also, I've been playing around with rtree and I'm still little confused where I would add this into the code above. Sorry to be a such a bother.jburrfischer– jburrfischer2014年06月24日 02:35:31 +00:00Commented Jun 24, 2014 at 2:35
-
Try this, seems adding None to an int is causing the error. poly_score = poly['properties']['score']) point_score = point['properties']['score']) if point_score: if poly_score poly['properties']['score']) += point_score else: poly['properties']['score']) = point_scoreklewis– klewis2014年06月24日 13:25:22 +00:00Commented Jun 24, 2014 at 13:25