Converting Pandas DataFrame to GeoDataFrame

Question 1

This seems like a simple enough question, but I can't figure out how to convert a Pandas DataFrame to a GeoDataFrame for a spatial join.

Here is an example of what my data looks like using df.head():

          Date/Time      Lat      Lon   ID
0  4/1/2014 0:11:00  40.7690 -73.9549  140
1  4/1/2014 0:17:00  40.7267 -74.0345  NaN

In fact, this DataFrame was created from a CSV so if it's easier to read the CSV directly as a GeoDataFrame that's fine too.

Question 2

use GeoPandas

Question 3

Convert the DataFrame's content (e.g. Lat and Lon columns) into appropriate Shapely geometries first and then use them together with the original DataFrame to create a GeoDataFrame.

from geopandas import GeoDataFrame
from shapely.geometry import Point
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
df = df.drop(['Lon', 'Lat'], axis=1)
gdf = GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)

Result:

          Date/Time   ID  geometry
0  4/1/2014 0:11:00  140  POINT (-73.95489999999999 40.769)
1  4/1/2014 0:17:00  NaN  POINT (-74.03449999999999 40.7267)
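Since the question mentions a spatial join, here is a minimal sketch of how the resulting GeoDataFrame might be used for one. The `zones` frame and its bounding box are hypothetical, invented purely for illustration, and the `predicate` keyword assumes geopandas 0.10 or newer (older releases called it `op`).

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box

# Sample rows mirroring the data shown in the question.
df = pd.DataFrame({
    "Date/Time": ["4/1/2014 0:11:00", "4/1/2014 0:17:00"],
    "Lat": [40.7690, 40.7267],
    "Lon": [-73.9549, -74.0345],
    "ID": [140, None],
})

# Same conversion as above: shapely expects (x, y), i.e. (Lon, Lat).
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
points = gpd.GeoDataFrame(
    df.drop(["Lon", "Lat"], axis=1), crs="EPSG:4326", geometry=geometry
)

# Hypothetical polygon layer: one lon/lat bounding box, for illustration only.
zones = gpd.GeoDataFrame(
    {"zone": ["east_box"]},
    crs="EPSG:4326",
    geometry=[box(-74.0, 40.7, -73.9, 40.8)],
)

# Keep only the points that fall within a zone polygon.
joined = gpd.sjoin(points, zones, how="inner", predicate="within")
print(joined[["Date/Time", "zone"]])
```

Both layers should share the same CRS before joining; here both are declared as EPSG:4326.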

Since the geometries often come in the WKT format, I thought I'd include an example for that case as well:

import geopandas as gpd
import shapely.wkt
geometry = df['wktcolumn'].map(shapely.wkt.loads)
df = df.drop('wktcolumn', axis=1)
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
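As a possible alternative to mapping `shapely.wkt.loads` over the column, newer geopandas releases (0.9 and later, as far as I know) provide `GeoSeries.from_wkt`, which parses the whole column in one call. The sample data below is hypothetical; only the column name `wktcolumn` comes from the snippet above.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical sample data; 'wktcolumn' matches the column name used above.
df = pd.DataFrame({
    "name": ["a", "b"],
    "wktcolumn": ["POINT (-73.9549 40.769)", "POINT (-74.0345 40.7267)"],
})

# Parse every WKT string in the column into a shapely geometry.
geometry = gpd.GeoSeries.from_wkt(df["wktcolumn"])

# Drop the text column and attach the parsed geometries instead.
gdf = gpd.GeoDataFrame(
    df.drop(columns="wktcolumn"), crs="EPSG:4326", geometry=geometry
)
print(gdf)
```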

Question 4

Thanks again! That's much simpler and runs very fast - much better than iterating through every row of the df at my n=500,000 :)

Question 5

Gosh, thanks! I check this answer like every 2 days :)

Question 6

you'd think this would be the first entry in the documentation!

Question 7

+1 for the shapely.wkt. It took me a while to figure this out!

Question 8

To avoid deleting the lat/lon columns from the pandas df (in case you need them later), I would instead recommend dropping lat/lon when creating the gdf, like so:

gdf = GeoDataFrame(df.drop(['Lon', 'Lat'], axis=1), crs=crs, geometry=geometry)

Question 9

Update 2019-12: The official documentation does it succinctly using geopandas.points_from_xy like so:

gdf = geopandas.GeoDataFrame(
 df,
 geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude)
)

You can also set a crs or z (e.g. elevation) value if you want.
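For example, a sketch with both options set might look like the following. The `Elevation` column and the sample values are assumptions made up for illustration; the CRS is passed to the `GeoDataFrame` constructor, and `z` goes to `points_from_xy`.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical sample data; the Elevation column is invented for illustration.
df = pd.DataFrame({
    "Latitude": [40.7690, 40.7267],
    "Longitude": [-73.9549, -74.0345],
    "Elevation": [10.0, 25.0],
})

gdf = gpd.GeoDataFrame(
    df,
    # z is optional; when given, the points become 3D (POINT Z).
    geometry=gpd.points_from_xy(x=df.Longitude, y=df.Latitude, z=df.Elevation),
    crs="EPSG:4326",  # WGS 84 longitude/latitude
)
print(gdf.geometry)
```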

Old Method: Using shapely

One-liners! Plus some performance pointers for big-data people.

Given a pandas.DataFrame that has x Longitude and y Latitude like so:

df.head()
x y
0 229.617902 -73.133816
1 229.611157 -73.141299
2 229.609825 -73.142795
3 229.607159 -73.145782
4 229.605825 -73.147274

Let's convert the pandas.DataFrame into a geopandas.GeoDataFrame as follows:

Library imports and shapely speedups:

import geopandas as gpd
import shapely.geometry  # makes shapely.geometry.Point available
shapely.speedups.enable() # Shapely 1.x only: enabled by default from version 1.6.0, deprecated in Shapely 2.0

Code + benchmark times on a test dataset I have lying around:

# Martin's original version:
# %timeit 1.87 s ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",  # replaces the deprecated {'init': 'epsg:4326'} form
                       geometry=[shapely.geometry.Point(xy) for xy in zip(df.x, df.y)])

# Pandas apply method:
# %timeit 8.59 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",  # replaces the deprecated {'init': 'epsg:4326'} form
                       geometry=df.apply(lambda row: shapely.geometry.Point((row.x, row.y)), axis=1))

Using pandas apply is considerably slower (roughly 4.6 times in the benchmark above), but it may be a better fit for some other workflows (e.g. on bigger datasets using the dask library).
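To reproduce this kind of comparison on your own machine, a rough sketch along these lines might help. The random test data, the helper function names, and the row count are all assumptions for illustration, and the timings will differ from the figures quoted above; `points_from_xy` is included only for reference.

```python
import timeit

import geopandas as gpd
import numpy as np
import pandas as pd
from shapely.geometry import Point

# Hypothetical test data: 10,000 random lon/lat pairs.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.uniform(-180, 180, 10_000),
    "y": rng.uniform(-90, 90, 10_000),
})

def with_zip():
    # List comprehension over zipped columns.
    return [Point(xy) for xy in zip(df.x, df.y)]

def with_apply():
    # Row-wise DataFrame.apply, building one Point per row.
    return df.apply(lambda row: Point((row.x, row.y)), axis=1)

def with_points_from_xy():
    # Vectorised helper shipped with geopandas.
    return gpd.points_from_xy(df.x, df.y)

for fn in (with_zip, with_apply, with_points_from_xy):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds:.3f} s per call")
```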

Credits to:

Making shapefile from Pandas dataframe? (for the pandas apply method)
Speed up row-wise point in polygon with Geopandas (for the speedup hint)

Some Work-In-Progress references (as of 2017) for handling big dask datasets:

Question 10

Thanks for the comparison, indeed the zip version is way faster

Question 11

Here's a function taken from the internals of geopandas and slightly modified to handle a dataframe with a geometry/polygon column already in wkt format.

from geopandas import GeoDataFrame
import shapely
import shapely.wkb
import shapely.wkt


def df_to_geodf(df, geom_col="geom", crs=None, wkt=True):
    """
    Transforms a pandas DataFrame into a GeoDataFrame.

    The column 'geom_col' must be a geometry column in WKT representation
    (wkt=True) or in WKB representation (wkt=False).
    To be used to convert df based on pd.read_sql to gdf.

    Parameters
    ----------
    df : DataFrame
        pandas DataFrame with geometry column in WKT or WKB representation.
    geom_col : string, default 'geom'
        column name to convert to shapely geometries
    crs : pyproj.CRS, optional
        CRS to use for the returned GeoDataFrame. The value can be anything accepted
        by :meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`,
        such as an authority string (eg "EPSG:4326") or a WKT string.
        If not set, tries to determine CRS from the SRID associated with the
        first geometry in the database, and assigns that to all geometries.
    wkt : bool, default True
        Parse the column as WKT text; if False, parse it as WKB.

    Returns
    -------
    GeoDataFrame
    """
    if geom_col not in df:
        raise ValueError("Query missing geometry column '{}'".format(geom_col))

    geoms = df[geom_col].dropna()

    if not geoms.empty:
        if wkt:
            load_geom = shapely.wkt.loads
        elif isinstance(geoms.iat[0], bytes):
            # Load from Python 3 binary.
            load_geom = shapely.wkb.loads
        else:
            # Load from binary encoded as (hex) text.
            def load_geom(x):
                return shapely.wkb.loads(str(x), hex=True)

        # Note: this overwrites the geometry column of the input df in place.
        df[geom_col] = geoms = geoms.apply(load_geom)
        if crs is None:
            # Requires Shapely >= 2.0; Shapely 1.x used
            # shapely.geos.lgeos.GEOSGetSRID(geoms.iat[0]._geom) instead.
            srid = shapely.get_srid(geoms.iat[0])
            # if no defined SRID in geodatabase, returns SRID of 0
            if srid != 0:
                crs = "epsg:{}".format(srid)

    return GeoDataFrame(df, crs=crs, geometry=geom_col)

Accepted answer (the Shapely Point approach above) by Martin Valgur · 2015-12-16 21:39:35Z

