I can't figure out how to convert a pandas DataFrame to a GeoDataFrame.

Here is an example of what my data looks like using df.head():

   crash_date               crash_time  latitude    longitude    location
0  2019-06-15T00:00:00.000  14:57       40.8146250  -73.9203600  {'type': 'Point', 'coordinates': [-73.92036, 40.8146250]
1  2019-07-03T00:00:00.000  0:50        40.8295970  -73.9022450  {'type': 'Point', 'coordinates': [-73.902245, 40.8295970]
2  2019-06-24T00:00:00.000  16:45       40.7054600  -73.7949000  {'type': 'Point', 'coordinates': [-73.7949, 40.7054600]
3  2019-06-16T00:00:00.000  3:25        40.7128030  -73.9541700  {'type': 'Point', 'coordinates': [-73.95417, 40.7128030]

I tried converting it:

geometry = geometry=geopandas.points_from_xy(df.longitude, df.latitude)
df = df.drop(['longitude', 'latitude'], axis=1)
crs = {'init': 'epsg:4326'}
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)

But I'm getting the following error:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-83-6090c239d222> in <module>
----> 1 geometry = geometry=gpd.points_from_xy(df.longitude, df.latitude)
 2 df = df.drop(['longitude', 'latitude'], axis=1)
 3 crs = {'init': 'epsg:4326'}
 4 gdf = GeoDataFrame(df, crs=crs, geometry=geometry)
 5 
~/opt/anaconda3/lib/python3.7/site-packages/geopandas/array.py in _points_from_xy(x, y, z)
 190 geom = [shapely.geometry.Point(i, j, k) for i, j, k in zip(x, y, z)]
 191 else:
--> 192 geom = [shapely.geometry.Point(i, j) for i, j in zip(x, y)]
 193 return geom
 194 
~/opt/anaconda3/lib/python3.7/site-packages/geopandas/array.py in <listcomp>(.0)
 190 geom = [shapely.geometry.Point(i, j, k) for i, j, k in zip(x, y, z)]
 191 else:
--> 192 geom = [shapely.geometry.Point(i, j) for i, j in zip(x, y)]
 193 return geom
 194 
~/opt/anaconda3/lib/python3.7/site-packages/shapely/geometry/point.py in __init__(self, *args)
 47 BaseGeometry.__init__(self)
 48 if len(args) > 0:
---> 49 self._set_coords(*args)
 50 
 51 # Coordinate getters and setters
~/opt/anaconda3/lib/python3.7/site-packages/shapely/geometry/point.py in _set_coords(self, *args)
 130 self._geom, self._ndim = geos_point_from_py(args[0])
 131 else:
--> 132 self._geom, self._ndim = geos_point_from_py(tuple(args))
 133 
 134 coords = property(BaseGeometry._get_coords, _set_coords)
~/opt/anaconda3/lib/python3.7/site-packages/shapely/geometry/point.py in geos_point_from_py(ob, update_geom, update_ndim)
 207 coords = ob
 208 n = len(coords)
--> 209 dx = c_double(coords[0])
 210 dy = c_double(coords[1])
 211 dz = None
TypeError: must be real number, not str

df.info() shows:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 29 columns):
crash_date 1000 non-null object
crash_time 1000 non-null object
borough 620 non-null object
zip_code 620 non-null object
latitude 900 non-null object
longitude 900 non-null object
location 900 non-null object
on_street_name 782 non-null object
off_street_name 491 non-null object
number_of_persons_injured 1000 non-null object
number_of_persons_killed 1000 non-null object
number_of_pedestrians_injured 1000 non-null object
number_of_pedestrians_killed 1000 non-null object
number_of_cyclist_injured 1000 non-null object
number_of_cyclist_killed 1000 non-null object
number_of_motorist_injured 1000 non-null object
number_of_motorist_killed 1000 non-null object
contributing_factor_vehicle_1 994 non-null object
contributing_factor_vehicle_2 865 non-null object
collision_id 1000 non-null object
vehicle_type_code1 993 non-null object
vehicle_type_code2 812 non-null object
contributing_factor_vehicle_3 61 non-null object
contributing_factor_vehicle_4 23 non-null object
contributing_factor_vehicle_5 7 non-null object
vehicle_type_code_3 59 non-null object
vehicle_type_code_4 23 non-null object
vehicle_type_code_5 7 non-null object
cross_street_name 218 non-null object
dtypes: object(29)
memory usage: 226.7+ KB

Update: I changed the first line as recommended in the answer below:

geometry = gpd.points_from_xy(df.longitude.values.astype('float32'), df.latitude.values.astype('float32'))
df = df.drop(['longitude', 'latitude'], axis=1)
crs = {'init': 'epsg:4326'}
gdf = gpd.GeoDataFrame(df, crs=crs, geometry=geometry)

Error message now shows:

AttributeError: 'DataFrame' object has no attribute 'longitude'
asked Mar 11, 2020 at 20:01

1 Answer

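Your longitude and latitude columns have dtype object (df.info() shows they hold strings), which is why shapely raises "must be real number, not str". Cast them to floats before building the points: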

geometry = geopandas.points_from_xy(df.longitude.astype('float32'), df.latitude.astype('float32'))
# or, equivalently, with bracket indexing:
geometry = geopandas.points_from_xy(df['longitude'].astype('float32'), df['latitude'].astype('float32'))
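
Two details in the df.info() output are worth handling while you are at it: only 900 of the 1000 rows have coordinates, and float32 keeps roughly seven significant digits, which can shift points slightly. Here is a minimal sketch of one way to deal with both, using pd.to_numeric (which gives float64) and dropping rows without coordinates. The helper names clean_coords and to_geodataframe and the sample frame are mine, not part of geopandas, and passing crs="EPSG:4326" as a string assumes a reasonably recent geopandas release.

```python
import pandas as pd


def clean_coords(df, lon_col="longitude", lat_col="latitude"):
    """Return a copy with float64 coordinates; rows without coordinates are dropped."""
    out = df.copy()
    # errors="coerce" turns missing or unparseable values into NaN instead of raising
    out[lon_col] = pd.to_numeric(out[lon_col], errors="coerce")
    out[lat_col] = pd.to_numeric(out[lat_col], errors="coerce")
    return out.dropna(subset=[lon_col, lat_col]).reset_index(drop=True)


def to_geodataframe(df, lon_col="longitude", lat_col="latitude"):
    """Build a GeoDataFrame in WGS84 from string coordinate columns (hypothetical helper)."""
    import geopandas as gpd  # imported lazily so clean_coords works without geopandas

    cleaned = clean_coords(df, lon_col, lat_col)
    # Build the points first, then drop the coordinate columns
    geometry = gpd.points_from_xy(cleaned[lon_col], cleaned[lat_col])
    attrs = cleaned.drop(columns=[lon_col, lat_col])
    return gpd.GeoDataFrame(attrs, geometry=geometry, crs="EPSG:4326")


# Made-up sample mimicking the question's data: strings plus one row with no coordinates
sample = pd.DataFrame({
    "crash_time": ["14:57", "0:50", "16:45"],
    "latitude": ["40.8146250", "40.8295970", None],
    "longitude": ["-73.9203600", "-73.9022450", None],
})
cleaned = clean_coords(sample)
print(cleaned.dtypes)
```

With geopandas installed, gdf = to_geodataframe(df) should then replace the four lines in the question.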
answered Mar 11, 2020 at 20:58
  • It's showing a different error now: AttributeError: 'DataFrame' object has no attribute 'longitude' Commented Mar 11, 2020 at 21:26
  • @jon Welcome to GIS SE! We're a little different from other sites; this isn't a discussion forum but a Q&A site. Please check out our short tour to learn about our focussed Q&A format. If it's a different error then that may be best asked as a new question. Commented Mar 11, 2020 at 21:30
  • Make sure you're running that before you run df = df.drop(['longitude', 'latitude'], axis=1), and try df['longitude'].astype('float32'), df['latitude'].astype('float32'). Commented Mar 11, 2020 at 22:34
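
A likely cause of the AttributeError, assuming the code is being rerun in a notebook: an earlier run already executed df = df.drop(...), so the df still in memory no longer has the coordinate columns. The sketch below reproduces this with a made-up one-row frame and shows one way to avoid it, namely keeping the trimmed frame under a new name (attrs here is just an illustrative name) instead of overwriting df.

```python
import pandas as pd

# Made-up one-row frame standing in for the crash data
df = pd.DataFrame({
    "longitude": ["-73.92036"],
    "latitude": ["40.814625"],
    "crash_time": ["14:57"],
})

# Read the coordinates while the columns still exist
lon = df["longitude"].astype("float64")
lat = df["latitude"].astype("float64")

# Store the trimmed frame under a new name so df keeps its columns on a rerun
attrs = df.drop(columns=["longitude", "latitude"])

# The trimmed frame is what df looked like after the original drop
try:
    attrs.longitude
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
    print(exc)
```

If df has already been overwritten, reloading the data before rerunning the cell restores the columns.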
