This seems like a simple enough question, but I can't figure out how to convert a Pandas DataFrame to a GeoDataFrame for a spatial join.
Here is an example of what my data looks like, using df.head():
          Date/Time      Lat      Lon   ID
0  4/1/2014 0:11:00  40.7690 -73.9549  140
1  4/1/2014 0:17:00  40.7267 -74.0345  NaN
In fact, this DataFrame was created from a CSV so if it's easier to read the CSV directly as a GeoDataFrame that's fine too.
- use GeoPandas (gene, Dec 16, 2015 at 21:17)
3 Answers
Convert the DataFrame's content (e.g. the Lat and Lon columns) into appropriate Shapely geometries first, and then use them together with the original DataFrame to create a GeoDataFrame.
from geopandas import GeoDataFrame
from shapely.geometry import Point
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
df = df.drop(['Lon', 'Lat'], axis=1)
gdf = GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
Result:
          Date/Time   ID                            geometry
0  4/1/2014 0:11:00  140   POINT (-73.95489999999999 40.769)
1  4/1/2014 0:17:00  NaN  POINT (-74.03449999999999 40.7267)
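With a GeoDataFrame like the one above, the spatial join from the question could then look roughly like this. This is only a sketch: the `zones` layer, its `zone` column and the rectangle coordinates are made up for illustration, and `gpd.sjoin` is called with its default "intersects" behaviour.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box

# The two sample rows from the question.
df = pd.DataFrame({
    "Date/Time": ["4/1/2014 0:11:00", "4/1/2014 0:17:00"],
    "Lat": [40.7690, 40.7267],
    "Lon": [-73.9549, -74.0345],
    "ID": [140, None],
})

# Same conversion as in the answer: Point takes (x, y), i.e. (Lon, Lat).
geometry = [Point(xy) for xy in zip(df.Lon, df.Lat)]
points = gpd.GeoDataFrame(df.drop(["Lon", "Lat"], axis=1),
                          crs="EPSG:4326", geometry=geometry)

# Hypothetical polygon layer: one rectangle that contains only the first point.
zones = gpd.GeoDataFrame({"zone": ["A"]},
                         geometry=[box(-74.0, 40.7, -73.9, 40.8)],
                         crs="EPSG:4326")

# A left join keeps every point; points outside all zones get NaN in "zone".
joined = gpd.sjoin(points, zones, how="left")
print(joined[["ID", "zone"]])
```

Both layers need to be in the same CRS before joining; here both are declared as EPSG:4326.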
Since the geometries often come in the WKT format, I thought I'd include an example for that case as well:
import geopandas as gpd
import shapely.wkt
geometry = df['wktcolumn'].map(shapely.wkt.loads)
df = df.drop('wktcolumn', axis=1)
gdf = gpd.GeoDataFrame(df, crs="EPSG:4326", geometry=geometry)
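If your GeoPandas is recent enough (0.9 or later, as far as I know), `GeoSeries.from_wkt` should do the same parsing in one call. A small sketch, where the `name` column and the two WKT strings are just sample data:

```python
import geopandas as gpd
import pandas as pd

# Sample frame with a WKT text column (values are illustrative).
df = pd.DataFrame({
    "name": ["a", "b"],
    "wktcolumn": ["POINT (-73.9549 40.769)", "POINT (-74.0345 40.7267)"],
})

# Parse the whole column at once; the CRS is attached to the GeoSeries.
geometry = gpd.GeoSeries.from_wkt(df["wktcolumn"], crs="EPSG:4326")

# Drop the text column only in the copy passed to GeoDataFrame, so df stays intact.
gdf = gpd.GeoDataFrame(df.drop(columns="wktcolumn"), geometry=geometry)
print(gdf)
```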
- Thanks again! That's much simpler and runs very fast - much better than iterating through every row of the df at my n=500,000 :) (atkat12, Dec 16, 2015 at 22:42)
- Gosh, thanks! I check this answer like every 2 days :) (Owen, Dec 21, 2016 at 16:25)
- You'd think this would be the first entry in the documentation! (Dominik, May 14, 2017 at 16:53)
- +1 for the shapely.wkt. It took me a while to figure this out! (StefanK, Dec 12, 2017 at 15:14)
- In order to avoid deleting the lat/lon columns from the pandas df (in case you need to use it later), I would instead recommend dropping lat/lon in the creation of gdf, like so: gdf = GeoDataFrame(df.drop(['Lon', 'Lat'], axis=1), crs=crs, geometry=geometry) (Gene Burinsky, May 27, 2020 at 19:43)
Update 2019-12: The official documentation does it succinctly using geopandas.points_from_xy, like so:
gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(x=df.Longitude, y=df.Latitude)
)
You can also set a crs or z (e.g. elevation) value if you want.
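For the CSV case from the question, the whole thing might look like the sketch below. The CSV text is inlined via `io.StringIO` only so the snippet runs on its own; with a real file you would pass its path to `pd.read_csv`. The column names `Lon` and `Lat` follow the question, and EPSG:4326 is an assumption about the data.

```python
import io

import geopandas as gpd
import pandas as pd

# Stand-in for the real CSV file, using the two rows shown in the question.
csv_text = """Date/Time,Lat,Lon,ID
4/1/2014 0:11:00,40.7690,-73.9549,140
4/1/2014 0:17:00,40.7267,-74.0345,
"""
df = pd.read_csv(io.StringIO(csv_text))

# x is the longitude and y is the latitude; crs is assumed to be WGS 84.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(x=df["Lon"], y=df["Lat"]),
    crs="EPSG:4326",
)
print(gdf.head())
```

Because `df` is passed as is, the Lat and Lon columns stay in the result next to the new geometry column.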
Old Method: Using shapely
One-liners! Plus some performance pointers for big-data people.
Given a pandas.DataFrame that has x (Longitude) and y (Latitude) columns like so:
df.head()
x y
0 229.617902 -73.133816
1 229.611157 -73.141299
2 229.609825 -73.142795
3 229.607159 -73.145782
4 229.605825 -73.147274
Let's convert the pandas.DataFrame into a geopandas.GeoDataFrame as follows:
Library imports and shapely speedups:
import geopandas as gpd
import shapely
shapely.speedups.enable()  # enabled by default from Shapely 1.6.0; deprecated and has no effect in Shapely 2.0
Code + benchmark times on a test dataset I have lying around:
# Martin's original version:
# %timeit 1.87 s ± 7.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",  # replaces the deprecated {'init': 'epsg:4326'} form
                       geometry=[shapely.geometry.Point(xy) for xy in zip(df.x, df.y)])

# Pandas apply method:
# %timeit 8.59 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
gdf = gpd.GeoDataFrame(df.drop(['x', 'y'], axis=1),
                       crs="EPSG:4326",
                       geometry=df.apply(lambda row: shapely.geometry.Point((row.x, row.y)), axis=1))
Using DataFrame.apply is surprisingly slower (about 4.6 times in the benchmark above), but it may be a better fit for some other workflows (e.g. on bigger datasets using the dask library).
Credits to:
- Making shapefile from Pandas dataframe? (for the pandas apply method)
- Speed up row-wise point in polygon with Geopandas (for the speedup hint)
Some work-in-progress references (as of 2017) for handling big dask datasets:
- Thanks for the comparison, indeed the zip version is way faster (MCMZL, Mar 27, 2019 at 10:58)
Here's a function taken from the internals of geopandas and slightly modified to also handle a DataFrame whose geometry/polygon column is already in WKT format.
from geopandas import GeoDataFrame
import shapely
import shapely.wkb
import shapely.wkt


def df_to_geodf(df, geom_col="geom", crs=None, wkt=True):
    """
    Transforms a pandas DataFrame into a GeoDataFrame.

    The column 'geom_col' must be a geometry column in WKT (wkt=True)
    or WKB (wkt=False) representation.
    To be used to convert df based on pd.read_sql to gdf.

    Parameters
    ----------
    df : DataFrame
        pandas DataFrame with geometry column in WKT or WKB representation.
        The geometry column is converted in place.
    geom_col : string, default 'geom'
        column name to convert to shapely geometries
    crs : pyproj.CRS, optional
        CRS to use for the returned GeoDataFrame. The value can be anything accepted
        by :meth:`pyproj.CRS.from_user_input() <pyproj.crs.CRS.from_user_input>`,
        such as an authority string (eg "EPSG:4326") or a WKT string.
        If not set, tries to determine CRS from the SRID associated with the
        first geometry in the database, and assigns that to all geometries.
    wkt : bool, default True
        If True, parse the column as WKT text; otherwise parse it as WKB
        (raw bytes, or binary encoded as hexadecimal text).

    Returns
    -------
    GeoDataFrame
    """
    if geom_col not in df:
        raise ValueError("Query missing geometry column '{}'".format(geom_col))

    geoms = df[geom_col].dropna()

    if not geoms.empty:
        if wkt:
            load_geom = shapely.wkt.loads
        elif isinstance(geoms.iat[0], bytes):
            # WKB stored as raw bytes
            load_geom = shapely.wkb.loads
        else:
            # WKB encoded as hexadecimal text
            def load_geom(x):
                return shapely.wkb.loads(str(x), hex=True)

        df[geom_col] = geoms = geoms.apply(load_geom)
        if crs is None:
            # shapely.get_srid needs Shapely >= 2.0; on Shapely 1.x the original
            # used shapely.geos.lgeos.GEOSGetSRID(geoms.iat[0]._geom) instead
            srid = shapely.get_srid(geoms.iat[0])
            # if no defined SRID in geodatabase, returns SRID of 0
            if srid != 0:
                crs = "epsg:{}".format(srid)

    return GeoDataFrame(df, crs=crs, geometry=geom_col)
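For plain WKB bytes, newer GeoPandas versions (0.9 or later, as far as I know) also offer `GeoSeries.from_wkb`, which may be enough if you don't need the SRID detection above. A minimal sketch: the `geom` column is built from two sample points here, and the CRS is passed explicitly as an assumption about the data.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Sample frame whose "geom" column holds WKB bytes, as a database query might return.
df = pd.DataFrame({
    "ID": [140, 141],
    "geom": [Point(-73.9549, 40.7690).wkb, Point(-74.0345, 40.7267).wkb],
})

# Decode the whole column at once; the CRS is stated explicitly, not read from an SRID.
geometry = gpd.GeoSeries.from_wkb(df["geom"], crs="EPSG:4326")
gdf = gpd.GeoDataFrame(df.drop(columns="geom"), geometry=geometry)
print(gdf)
```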