CSV to Geodataframe : How to have valid geometry objects?

Question 1

I'm writing a script with GeoPandas. I want to use a CSV of blocks in a spatial join, so I convert it to a GeoDataFrame. But when I try to set the geometry column, it raises the error: Input geometry column must contain valid geometry objects.

Here is my code to import the CSV file:

import pandas
import geopandas as gpd

csv_df = pandas.read_csv(csv_file)
csv_gdf = gpd.GeoDataFrame(csv_df)
csv_gdf = csv_gdf.set_geometry('geometry')

Here is csv_gdf.head() before I try to set the geometry column:

        id      name  shortName  accountId  isMonitored  varietalId  ranchId  \
0  14633.0    HC4bas       HC4b      346.0        False         4.0    855.0
1  14634.0   HC3haut       HC3h      346.0        False         4.0    855.0
2  14637.0      HC12       HC12      346.0        False         2.0    855.0
3  14638.0  HC11haut       HC11      346.0        False        72.0    855.0
4  14641.0    HC9bas       HC9b      346.0        False         4.0    855.0

   inRowDistance  betweenRowDistance  \
0            1.2                 1.5
1            1.2                 1.5
2            1.2                 1.5
3            0.9                 1.5
4            1.2                 1.5

                                            geometry  ...  \
0  POLYGON ((-0.1642995066034836 44.9397295596186...  ...
1  POLYGON ((-0.1634854066129132 44.9405302549332...  ...
2  POLYGON ((-0.1624824342183362 44.9398350833047...  ...
3  POLYGON ((-0.1592356652491378 44.9399712591478...  ...
4  POLYGON ((-0.1610166332996532 44.9391145465108...  ...

   slopeInclination  slopeOrientation  soilType  rowOrientation  dashboardId  \
0               NaN               NaN       NaN             NaN          NaN
1               NaN               NaN       NaN             NaN          NaN
2               NaN               NaN       NaN             NaN          NaN
3               NaN               NaN       NaN             NaN          NaN
4               NaN               NaN       NaN             NaN          NaN

   canopySystemId  canopyWidth  topWireHeight  clusterWireHeight  \
0             NaN         0.45            NaN                NaN
1             NaN         0.45            NaN                NaN
2             NaN         0.45            NaN                NaN
3             NaN         0.45            NaN                NaN
4             NaN         0.45            NaN                NaN

   pruningSystemId
0              NaN
1              NaN
2              NaN
3              NaN
4              NaN

[5 rows x 27 columns]

Question 2

When I try csv_gdf.geom_type, it returns None.

score 2 · Accepted Answer · 2018-01-12 15:17:01Z

You probably have invalid geometries in your dataset. To find them, you can either load your CSV into QGIS and run Vector -> Geometry Tools -> Check validity,

or loop through your dataframe to find the invalid geometries:

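for index, row in csv_gdf.iterrows():
    geom = row['geometry']
    # assumes each cell already holds a shapely geometry, not a WKT string
    # note: shapely Polygons expose their ring via geom.exterior.coords, not geom.coords
    if len(geom.coords) <= 2:
        print("This row has an invalid polygon geometry")
    # this is just one example of invalid geometries, there are also overlapping vertices, ...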

I would recommend the first check, even though QGIS is not tagged in your question.
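As a scripted alternative to the QGIS check, shapely geometries expose an is_valid property, and shapely.validation.explain_validity reports why a geometry fails. The following is only a minimal sketch, assuming the cells already hold shapely objects rather than WKT strings; the two sample polygons and the invalid_rows helper are hypothetical names introduced for illustration.

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity

# Hypothetical sample geometries: a plain square and a self-intersecting "bow-tie".
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])


def invalid_rows(geometries):
    """Return (position, reason) for every geometry that fails is_valid."""
    return [
        (position, explain_validity(geom))
        for position, geom in enumerate(geometries)
        if not geom.is_valid
    ]


problems = invalid_rows([square, bowtie])
for position, reason in problems:
    print(position, reason)
```

On a GeoDataFrame the same check should be available column-wide through csv_gdf.is_valid, which returns one boolean per row.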

EDIT: generating the geometry as a shapely.geometry object

from shapely.wkt import loads
# either all at once:
csv_gdf['geometry'] = csv_gdf['geometry'].apply(loads)
# or one by one to detect possible geometry errors
for index, row in csv_gdf.iterrows():
    # it will throw an error where the geometry WKT isn't valid
    # csv_gdf.set_value(index, 'geometry', loads(row['geometry'])) --> deprecated
    csv_gdf.loc[index, 'geometry'] = loads(row['geometry'])
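Putting the pieces together, the whole path from CSV to GeoDataFrame might look like the sketch below. The in-memory CSV, its column names, and the EPSG:4326 CRS are assumptions made for illustration (the coordinates in the question look like longitude/latitude); substitute your own file and CRS.

```python
import io

import geopandas as gpd
import pandas as pd
from shapely import wkt

# Hypothetical two-row CSV standing in for the real file;
# the geometry column holds WKT text, as in the question.
csv_text = io.StringIO(
    'id,name,geometry\n'
    '1,A,"POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"\n'
    '2,B,"POLYGON ((2 2, 3 2, 3 3, 2 3, 2 2))"\n'
)

csv_df = pd.read_csv(csv_text)

# Parse the WKT strings into shapely geometry objects.
csv_df["geometry"] = csv_df["geometry"].apply(wkt.loads)

# Build the GeoDataFrame from the parsed column.
# EPSG:4326 is an assumption, not something stated in the question.
csv_gdf = gpd.GeoDataFrame(csv_df, geometry="geometry", crs="EPSG:4326")

print(csv_gdf.geom_type)
print(csv_gdf.is_valid)
```

Once the column holds geometry objects, geom_type should report a type per row instead of None, and the frame can be used in a spatial join.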

It returns an error too: if len(geom.coords) <= 2: AttributeError: 'str' object has no attribute 'coords'

It means that the geometry column is a string while it should be a shapely.geometry object. I'll edit my answer to generate the geometry as it should be.

When you use apply with a function and do not modify the cell value first, you can write .apply(loads) directly; there is no need for a lambda function.

@ImanolUr I don't get it, the cell here is modified since the result goes in the same series as the input.

Sorry, I did not make myself clear. What I mean is that if you do e.g. `lambda x: function(x + 15)`, so you perform an extra action on x, then you should use lambda. But when you just take the value x and apply a function to it, you don't need lambda.
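To illustrate the point about lambda, here is a small sketch comparing the two forms. The sample series and the strip() step are hypothetical, chosen only to show a case where the value is transformed before being passed to loads.

```python
import pandas as pd
from shapely.wkt import loads

# Hypothetical sample values standing in for the geometry column.
wkt_series = pd.Series(["POINT (1 2)", "POINT (3 4)"])

# Passing the function itself: each cell is handed to loads unchanged.
direct = wkt_series.apply(loads)

# A lambda only earns its place when the value is transformed first,
# here by stripping surrounding whitespace before parsing.
cleaned = wkt_series.apply(lambda text: loads(text.strip()))

print([geom.wkt for geom in direct])
```

Both calls return a new series of geometry objects; the original series of strings is left untouched until the result is assigned back.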
