I have an sjoin function from geopandas that is behaving erratically: it works on some versions of the "points" geodataframe but not on others.
merged = sjoin(points, polygons, how='left', op='within')
The error I get is always:
rtree.core.RTreeError: Coordinates must be in the form (minx, miny, maxx, maxy) or (x, y) for 2D indexes
The "polygons" geodataframe never changes. The size of the "points" geodataframe depends on how much data I choose to include (controlled by a parameter). Generally the join fails when I include more data (e.g. 100,000 rows) and succeeds on smaller datasets (e.g. 2,000 rows). I assume this is because some rows contain invalid data. However, on visual inspection I cannot find anything wrong with any row.
Is there a way to quickly find out which rows are blocking the join, or to automatically ignore them?
I can't easily share the full code and data.
- Use the classic version without GeoPandas (More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc), compared with the GeoPandas version (gis.stackexchange.com/a/165413/2581). – gene, Feb 11, 2016 at 16:41
- Thanks, but do you have any idea why this wouldn't work, what this error means or how to find problematic data? I quite like geopandas so I don't want to discard its future use for a problem I don't even understand. – Alexis Eggermont, Feb 12, 2016 at 1:19
- Try to find the problematic data with the solution without Pandas. – gene, Feb 14, 2016 at 10:04
1 Answer
There are various reasons why this error can occur. Here are the ones I have experienced, along with their solutions:
- Your input data sets do not have clean sequential indices (i.e. there are gaps in the sequence due to prior exclusion of rows).
I'm not sure exactly why this causes the error, but it can be resolved by calling
gdf = gdf.reset_index(drop=True)
on each input GeoDataFrame (gdf here) before applying sjoin. reset_index returns a new frame, so assign the result back.
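As a minimal sketch of the index reset, assuming a small made-up stand-in for the "points" frame (the column name `value` and the coordinates are hypothetical, not from the question):

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical stand-in for the "points" GeoDataFrame from the question.
points = gpd.GeoDataFrame(
    {"value": [10, 20, 30, 40]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2), Point(3, 3)],
)

# Excluding a row leaves a gap in the index labels.
filtered = points[points["value"] != 20]

# reset_index returns a new frame; drop=True discards the old labels
# instead of keeping them as an extra column.
clean = filtered.reset_index(drop=True)

print(list(filtered.index))  # [0, 2, 3]
print(list(clean.index))     # [0, 1, 2]
```

The same call would be applied to the polygons frame, and the reset frames are the ones to pass to sjoin.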
- There are invalid geometry objects in your polygons data frame.
If your polygons were drawn by hand (i.e. manually in a GIS), they may have overlaps or self-intersections that don't translate well further in the process. Alternatively, your polygons could be empty, which can happen in PostGIS with complex function sequences.
The solution is to ensure that all your polygons are of the correct type and are valid. In PostGIS you can use the functions ST_IsValid and ST_IsEmpty to check for this and remove or amend any problems. You should also check that you have Polygons or MultiPolygons, not GeometryCollections.
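If the data is already loaded in Python, roughly the same checks can be made on the GeoDataFrame itself. This is a sketch and not the PostGIS approach described above: it swaps in the GeoSeries properties `is_valid`, `is_empty` and `geom_type`, and the sample frame and its `name` column are invented for illustration.

```python
import geopandas as gpd
from shapely.geometry import GeometryCollection, Point, Polygon

# Hypothetical polygons frame: one good row and three problem rows.
polygons = gpd.GeoDataFrame(
    {"name": ["square", "bowtie", "empty", "collection"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),  # valid polygon
        Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]),  # self-intersecting
        Polygon(),                                  # empty geometry
        GeometryCollection([Point(0, 0)]),          # wrong geometry type
    ],
)

geom = polygons.geometry

# A row is usable only if it is non-empty, valid, and of an expected type.
ok = (
    ~geom.is_empty
    & geom.is_valid
    & geom.geom_type.isin(["Polygon", "MultiPolygon"])
)

bad_rows = polygons[~ok]                       # rows to inspect or repair
usable = polygons[ok].reset_index(drop=True)   # rows safe to keep

print(list(bad_rows["name"]))  # ['bowtie', 'empty', 'collection']
print(list(usable["name"]))    # ['square']
```

Inspecting `bad_rows` shows which rows fail the checks. The emptiness and validity parts of the mask could also be tried on the points frame, though whether that isolates the rows behind the rtree error in the question is an assumption.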