80

This seems like a simple enough question, but I can't figure it out: how do I convert a pandas DataFrame to a GeoDataFrame for a spatial join?

Here is an example of what my data looks like using df.head():

          Date/Time      Lat      Lon   ID
0  4/1/2014 0:11:00  40.7690 -73.9549  140
1  4/1/2014 0:17:00  40.7267 -74.0345  NaN

In fact, this DataFrame was created from a CSV so if it's easier to read the CSV directly as a GeoDataFrame that's fine too.

Taras
asked Dec 16, 2015 at 21:14
1
  • 2
    use GeoPandas Commented Dec 16, 2015 at 21:17

3 Answers

141

Convert the DataFrame's content (e.g. Lat and Lon columns) into appropriate Shapely geometries first and then use them together with the original DataFrame to create a GeoDataFrame.

from geopandas import GeoDataFrame
from shapely.geometry import Point
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
df = df.drop(['Lon', 'Lat'], axis=1)
gdf = GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)

Result:

          Date/Time   ID                            geometry
0  4/1/2014 0:11:00  140   POINT (-73.95489999999999 40.769)
1  4/1/2014 0:17:00  NaN  POINT (-74.03449999999999 40.7267)
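
For completeness, here is a minimal end-to-end sketch of the same approach starting from CSV text, since the question mentions the data came from a CSV. The in-memory CSV below just repeats the two rows from the question and is an assumption for illustration; in practice you would pass your own file path to pd.read_csv. It also drops the Lat/Lon columns on a copy, so the original DataFrame keeps them.

```python
import io

import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point

# Two rows copied from the question, held in memory so the sketch is self-contained.
csv_text = """Date/Time,Lat,Lon,ID
4/1/2014 0:11:00,40.7690,-73.9549,140
4/1/2014 0:17:00,40.7267,-74.0345,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Shapely points take (x, y), i.e. (longitude, latitude), so Lon comes first.
geometry = [Point(xy) for xy in zip(df["Lon"], df["Lat"])]

# drop() returns a new DataFrame, so df itself still has its Lat/Lon columns.
gdf = GeoDataFrame(df.drop(columns=["Lon", "Lat"]), crs="EPSG:4326", geometry=geometry)
print(gdf)
```

Note the argument order in zip: swapping Lat and Lon is a common mistake and puts the points in the wrong place without raising any error.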

Since the geometries often come in the WKT format, I thought I'd include an example for that case as well:

import geopandas as gpd
import shapely.wkt
geometry = df['wktcolumn'].map(shapely.wkt.loads)
df = df.drop('wktcolumn', axis=1)
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
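
A self-contained sketch of the WKT case, in case it helps to see it run. The column name wktcolumn follows the snippet above; the sample rows and the name column are made up for illustration. The last two lines show an alternative that, as far as I know, is available from GeoPandas 0.9 onward (GeoSeries.from_wkt); treat that version requirement as an assumption and check your installed version.

```python
import geopandas as gpd
import pandas as pd
import shapely.wkt

# Hypothetical sample data: geometries stored as WKT strings.
df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "wktcolumn": ["POINT (-73.9549 40.769)", "LINESTRING (0 0, 1 1)"],
    }
)

# Parse each WKT string into a Shapely geometry.
geometry = df["wktcolumn"].map(shapely.wkt.loads)
gdf = gpd.GeoDataFrame(df.drop(columns="wktcolumn"), crs="EPSG:4326", geometry=geometry)

# Alternative (assumes GeoPandas >= 0.9): let GeoPandas parse the WKT itself.
alt_geometry = gpd.GeoSeries.from_wkt(df["wktcolumn"], crs="EPSG:4326")
print(gdf)
```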
answered Dec 16, 2015 at 21:39
5
  • 1
    Thanks again! That's much simpler and runs very fast - much better than iterating through every row of the df at my n=500,000 :) Commented Dec 16, 2015 at 22:42
  • 7
    Gosh, thanks! I check this answer like every 2 days :) Commented Dec 21, 2016 at 16:25
  • 1
    you'd think this would be the first entry in the documentation! Commented May 14, 2017 at 16:53
  • +1 for the shapely.wkt. It took me a while to figure this out! Commented Dec 12, 2017 at 15:14
  • 1
    In order to avoid deleting lat/lon columns from the pandas df (in case you need to use it later), I would instead recommend dropping lat/lon in the creation of gdf like so gdf = GeoDataFrame(df.drop(['Lon', 'Lat'], axis=1), crs=crs, geometry=geometry) Commented May 27, 2020 at 19:43
58

Update 2019-12: The official documentation does it succinctly using geopandas.points_from_xy like so:

import geopandas

gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude)
)

You can also set a crs or z (e.g. elevation) value if you want.
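
A minimal sketch of setting both, assuming a DataFrame with Longitude, Latitude and Elevation columns (the Elevation column and its values are hypothetical, added only to show the z argument):

```python
import geopandas
import pandas as pd

# Coordinates from the question plus a made-up elevation column.
df = pd.DataFrame(
    {
        "Longitude": [-73.9549, -74.0345],
        "Latitude": [40.7690, 40.7267],
        "Elevation": [10.0, 20.0],
    }
)

gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude, z=df.Elevation),
    crs="EPSG:4326",  # WGS 84 longitude/latitude
)
print(gdf.geometry)
```

Here the original columns are kept alongside the new geometry column; drop them afterwards if you don't need them.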


Old Method: Using shapely

One-liners! Plus some performance pointers for big-data people.

Given a pandas.DataFrame that has an x (longitude) column and a y (latitude) column, like so:

df.head()
            x          y
0  229.617902 -73.133816
1  229.611157 -73.141299
2  229.609825 -73.142795
3  229.607159 -73.145782
4  229.605825 -73.147274

Let's convert the pandas.DataFrame into a geopandas.GeoDataFrame as follows:

Library imports and shapely speedups:

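import geopandas as gpd
import shapely.geometry  # makes shapely.geometry.Point available below

# Shapely 1.x only (speedups are enabled by default from version 1.6.0);
# the module is deprecated in Shapely 2.x, so the call is left commented out:
# import shapely.speedups; shapely.speedups.enable()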

Code + benchmark times on a test dataset I have lying around:

# Martin's original version (list comprehension over zip):
# %timeit 1.87 s ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(
    df.drop(['x', 'y'], axis=1),
    crs="EPSG:4326",  # replaces the deprecated {'init': 'epsg:4326'} form
    geometry=[shapely.geometry.Point(xy) for xy in zip(df.x, df.y)],
)

# pandas apply method:
# %timeit 8.59 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(
    df.drop(['x', 'y'], axis=1),
    crs="EPSG:4326",
    geometry=df.apply(lambda row: shapely.geometry.Point((row.x, row.y)), axis=1),
)

Using DataFrame.apply is considerably slower here (about 8.6 s versus 1.9 s on this dataset), but it may be a better fit for some other workflows (e.g. on bigger datasets using the dask library).
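
If you want to repeat the comparison on your own machine, here is a rough sketch. The dataset is synthetic (random coordinates, not the test dataset timed above), the helper function names are made up for this example, and it uses time.perf_counter rather than %timeit so it runs outside IPython. Absolute timings will differ from the numbers quoted above.

```python
import time

import geopandas as gpd
import numpy as np
import pandas as pd
from shapely.geometry import Point

# Synthetic stand-in data: random longitude/latitude pairs plus an id column.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame(
    {
        "id": np.arange(n),
        "x": rng.uniform(-180, 180, n),
        "y": rng.uniform(-90, 90, n),
    }
)


def with_zip(frame):
    # List comprehension over zipped columns.
    return [Point(xy) for xy in zip(frame.x, frame.y)]


def with_apply(frame):
    # Row-wise apply; builds one Point per row through pandas.
    return frame.apply(lambda row: Point(row.x, row.y), axis=1)


def with_points_from_xy(frame):
    # Vectorised helper shipped with GeoPandas.
    return gpd.points_from_xy(frame.x, frame.y)


results = {}
for build in (with_zip, with_apply, with_points_from_xy):
    start = time.perf_counter()
    gdf = gpd.GeoDataFrame(
        df.drop(columns=["x", "y"]), geometry=build(df), crs="EPSG:4326"
    )
    elapsed = time.perf_counter() - start
    results[build.__name__] = gdf
    print(f"{build.__name__}: {elapsed:.3f} s")
```

All three builders should produce the same points; only the time taken differs.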

answered Oct 13, 2017 at 2:29
1
  • Thanks for the comparison, indeed the zip version is way faster Commented Mar 27, 2019 at 10:58
0

Here's a function taken from the internals of geopandas and slightly modified to handle a dataframe with a geometry/polygon column already in wkt format.

from geopandas import GeoDataFrame
import shapely
import shapely.wkb
import shapely.wkt


def df_to_geodf(df, geom_col="geom", crs=None, wkt=True):
    """
    Transforms a pandas DataFrame into a GeoDataFrame.

    The column 'geom_col' must be a geometry column in WKT representation
    (wkt=True, the default) or in WKB representation (wkt=False).
    To be used to convert df based on pd.read_sql to gdf.

    Parameters
    ----------
    df : DataFrame
        pandas DataFrame with geometry column in WKT or WKB representation.
    geom_col : string, default 'geom'
        column name to convert to shapely geometries
    crs : pyproj.CRS, optional
        CRS to use for the returned GeoDataFrame. The value can be anything accepted
        by :meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`,
        such as an authority string (eg "EPSG:4326") or a WKT string.
        If not set, tries to determine CRS from the SRID associated with the
        first geometry in the database, and assigns that to all geometries.
    wkt : bool, default True
        If True, parse 'geom_col' as WKT; otherwise parse it as WKB.

    Returns
    -------
    GeoDataFrame
    """
    if geom_col not in df:
        raise ValueError("Query missing geometry column '{}'".format(geom_col))

    geoms = df[geom_col].dropna()

    if not geoms.empty:
        if wkt:
            load_geom = shapely.wkt.loads
        elif isinstance(geoms.iat[0], bytes):
            # WKB stored as raw bytes
            load_geom = shapely.wkb.loads
        else:
            # WKB stored as hex-encoded text
            def load_geom(x):
                return shapely.wkb.loads(str(x), hex=True)

        # Note: this overwrites geom_col in the DataFrame that was passed in.
        df[geom_col] = geoms = geoms.apply(load_geom)
        if crs is None:
            # shapely.get_srid needs Shapely >= 2.0; on Shapely 1.x the original
            # call was shapely.geos.lgeos.GEOSGetSRID(geoms.iat[0]._geom).
            srid = shapely.get_srid(geoms.iat[0])
            # if no defined SRID in geodatabase, returns SRID of 0
            if srid != 0:
                crs = "epsg:{}".format(srid)

    return GeoDataFrame(df, crs=crs, geometry=geom_col)
answered Feb 26, 2022 at 17:20
