I have a series of Easting and Northing coordinates in pandas dataframe along with other columns of information. I know the coordinates fall within and administrative area and want to match the coordinates and rest of columns to that administrative area, I have the shapefile of the administrative boundaries. I'm familiar with R and visually this looks easy, I can plot the coordinate and see it fall into the area. However i have n=8,335 records to match and need to use Python as all other records are loaded and the analysis i have is better suited to Python.
So far i have work flow as;
Convert Easting and Northing to lat and long
Somehow use Shapely to tell me if point is in polygon / I understand that matplotlib.path.Path.contains_points executes faster.
Assign point to polygon and add new column to dataframe with the polgon reference in it.
Use pandas merge to join dataframe to others. Project to map again and check visually.
Does that sound like a sensible approach or is there a more efficient route?
1 Answer 1
If you're working with spatial data and Pandas you should take a look at GeoPandas.
The example below demonstrates how to perform a spatial join in GeoPandas (which uses Shapely). A GeoDataFrame
object is created from a list of cities and their coordinates and is joined to an ESRI Shapefile containing countries.
import pandas
import geopandas
import geopandas.tools
from shapely.geometry import Point
# Create a DataFrame with some cities, including their location
raw_data = [
("London", 51.5, -0.1),
("Paris", 48.9, 2.4),
("San Francisco", 37.8, -122.4),
]
places = pandas.DataFrame(raw_data, columns=["name", "latitude", "longitude"])
# Create the geometry column from the coordinates
# Remember that longitude is east-west (i.e. X) and latitude is north-south (i.e. Y)
places["geometry"] = places.apply(lambda row: Point(row["longitude"], row["latitude"]), axis=1)
del(places["latitude"], places["longitude"])
# Convert to a GeoDataFrame
places = geopandas.GeoDataFrame(places, geometry="geometry")
# Declare the coordinate system for the places GeoDataFrame
# GeoPandas doesn't do any transformations automatically when performing
# the spatial join. The layers are already in the same CRS (WGS84) so no
# transformation is needed.
places.crs = {"init": "epsg:4326"}
# Load the countries polygons
countries = geopandas.GeoDataFrame.from_file("ne_110m_admin_0_countries.shp")
# Drop all columns except the name and polygon geometry
countries = countries[["name", "geometry"]]
# Perform the spatial join
result = geopandas.tools.sjoin(places, countries, how="left")
# Print the results...
print(result.head())
Note that the spatial join feature is still fairly new and hasn't made it into the stable branch yet - you'll need to download and install the development version.
https://github.com/geopandas/geopandas
The result looks like this:
name_left geometry index_right name_right
0 London POINT (-0.1 51.5) 57 United Kingdom
1 Paris POINT (2.4 48.9) 55 France
2 San Francisco POINT (-122.4 37.8) 168 United States
You can also use GeoPandas to plot the data with matplotlib:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1)
countries.plot(ax=ax, color="#cccccc")
places.plot(ax=ax, markersize=5, color="#cc0000")
plt.show()
-
How would i accomplish this with entire states? Meaning creating borders with longitude and latitude. Maybe with a multi-line polygon of some sort.hanleyhansen– hanleyhansen2017年03月02日 19:36:59 +00:00Commented Mar 2, 2017 at 19:36