I have two DataFrames with Lat and Long columns plus some additional columns. For example:
import pandas as pd
import geopandas as gpd
df1 = pd.DataFrame({
'id': [0, 1, 2],
'dt': ['01-01-2022', '02-01-2022', '03-01-2022'],  # quoted: a bare 01-01-2022 is not valid Python
'Lat': [33.155480, 33.155480, 33.155480],
'Long': [-96.731630, -96.731630, -96.731630]
})
df2 = pd.DataFrame({
'val': ['a', 'b', 'c'],
'dt': ['01-01-2022', '02-01-2022', '03-01-2022'],  # quoted: a bare 01-01-2022 is not valid Python
'Lat': [33.155480, 33.155480, 33.155480],
'Long': [-96.731630, -96.731630, -96.731630]
})
I'd like to do a spatial join not only on Lat/Long but also on the dt (date) column. Expected output:
id | dt | Lat | Long | val |
---|---|---|---|---|
0 | 01-01-2022 | 33.155480 | -96.731630 | a |
1 | 02-01-2022 | 33.155480 | -96.731630 | b |
2 | 03-01-2022 | 33.155480 | -96.731630 | c |
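The sample dates such as 02-01-2022 can be read either as 1 February or as 2 January. Without a format string, pandas reads this pattern month-first, which is why the answer's printed output shows 2022-02-01. A short sketch comparing both readings with explicit format strings follows; which one matches the real data is an assumption you have to check. Because both frames are parsed the same way, the date comparison in the join succeeds under either reading, but the stored dates differ.

```python
import pandas as pd

raw = pd.Series(["01-01-2022", "02-01-2022", "03-01-2022"])

# Month-first reading: 1 Jan, 1 Feb, 1 Mar.
month_first = pd.to_datetime(raw, format="%m-%d-%Y")

# Day-first reading: 1 Jan, 2 Jan, 3 Jan.
day_first = pd.to_datetime(raw, format="%d-%m-%Y")

print(month_first.dt.strftime("%Y-%m-%d").tolist())
print(day_first.dt.strftime("%Y-%m-%d").tolist())
```

Passing an explicit format also avoids pandas guessing the order row by row.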
Rohit Gupta
1 Answer
You can do a spatial join, then keep only the rows where the dates match:
import pandas as pd
import geopandas as gpd
df1 = pd.DataFrame({'id': [0, 1, 2], 'dt': ["01-01-2022", "02-01-2022", "03-01-2022"], 'Lat': [33.155480, 33.155480, 33.155480], 'Long': [-96.731630, -96.731630, -96.731630]})
df1["dt"] = pd.to_datetime(df1["dt"]) #String to datetime
df1 = gpd.GeoDataFrame(data=df1, geometry=gpd.points_from_xy(x=df1["Long"], y=df1["Lat"]), crs="epsg:4326") #Create a geodataframe
#    id         dt       Lat      Long                    geometry
# 0   0 2022-01-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
# 1   1 2022-02-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
# 2   2 2022-03-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
df2 = pd.DataFrame({'val': ['a', 'b', 'c'], 'dt': ["01-01-2022", "02-01-2022", "03-01-2022"], 'Lat': [33.155480, 33.155480, 33.155480], 'Long': [-96.731630, -96.731630, -96.731630]})
df2["dt"] = pd.to_datetime(df2["dt"])
df2 = gpd.GeoDataFrame(data=df2, geometry=gpd.points_from_xy(x=df2["Long"], y=df2["Lat"]), crs="epsg:4326")
#   val         dt       Lat      Long                    geometry
# 0   a 2022-01-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
# 1   b 2022-02-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
# 2   c 2022-03-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
df3 = gpd.sjoin(df1, df2) #Spatial join
df3 = df3.loc[df3["dt_left"]==df3["dt_right"]] #Select the rows with matching dates
#    id              dt_left  Lat_left  Long_left                    geometry  index_right val             dt_right  Lat_right  Long_right
# 0   0  2022-01-01 00:00:00  33.15548  -96.73163  POINT (-96.73163 33.15548)            0   a  2022-01-01 00:00:00   33.15548   -96.73163
# 1   1  2022-02-01 00:00:00  33.15548  -96.73163  POINT (-96.73163 33.15548)            1   b  2022-02-01 00:00:00   33.15548   -96.73163
# 2   2  2022-03-01 00:00:00  33.15548  -96.73163  POINT (-96.73163 33.15548)            2   c  2022-03-01 00:00:00   33.15548   -96.73163
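The same sjoin-then-filter approach can be wrapped in a helper that also trims the suffixed columns back to the layout of the expected output. This is a minimal sketch: the names to_gdf and spatial_date_join are hypothetical, the month-first date format is an assumption, and the predicate keyword needs geopandas 0.10 or later (older releases called it op).

```python
import pandas as pd
import geopandas as gpd


def to_gdf(df: pd.DataFrame) -> gpd.GeoDataFrame:
    """Hypothetical helper: parse 'dt' (assumed month-first) and build WGS84 points."""
    df = df.copy()
    df["dt"] = pd.to_datetime(df["dt"], format="%m-%d-%Y")
    return gpd.GeoDataFrame(
        df,
        geometry=gpd.points_from_xy(df["Long"], df["Lat"]),
        crs="EPSG:4326",
    )


def spatial_date_join(left: gpd.GeoDataFrame, right: gpd.GeoDataFrame) -> pd.DataFrame:
    """Hypothetical helper: spatial join, then keep only pairs whose dates agree."""
    # Every intersecting pair is kept here, so identical points on different dates all match.
    joined = gpd.sjoin(left, right, how="inner", predicate="intersects")
    # Overlapping column names receive the default _left/_right suffixes.
    joined = joined[joined["dt_left"] == joined["dt_right"]]
    # Reduce to the columns of the expected output and drop the suffixes.
    out = joined[["id", "dt_left", "Lat_left", "Long_left", "val"]].rename(
        columns={"dt_left": "dt", "Lat_left": "Lat", "Long_left": "Long"}
    )
    return out.reset_index(drop=True)


df1 = pd.DataFrame({
    "id": [0, 1, 2],
    "dt": ["01-01-2022", "02-01-2022", "03-01-2022"],
    "Lat": [33.155480] * 3,
    "Long": [-96.731630] * 3,
})
df2 = pd.DataFrame({
    "val": ["a", "b", "c"],
    "dt": ["01-01-2022", "02-01-2022", "03-01-2022"],
    "Lat": [33.155480] * 3,
    "Long": [-96.731630] * 3,
})

result = spatial_date_join(to_gdf(df1), to_gdf(df2))
print(result)
```

With this sample all three points are identical, so the sjoin step alone yields 3 × 3 = 9 rows before the date filter reduces them to 3. On large inputs where many points share a location, that intermediate result can grow quickly.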
answered Feb 25, 2023 at 18:02
sjoin does not work.
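If sjoin is unavailable or fails in your environment (the comment does not say why), note that the expected output only pairs rows with the same date and the same coordinates. That case can be handled with a plain pandas attribute merge instead of a spatial predicate. This is a swapped-in technique, not the answer's method, and it is a sketch under two assumptions: dates are month-first, and "same location" means coordinates equal after rounding to a hypothetical PRECISION of 6 decimal places (roughly 0.1 m). Rounding can still split two nearby points that straddle a rounding boundary, so it is not a true distance test.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [0, 1, 2],
    "dt": ["01-01-2022", "02-01-2022", "03-01-2022"],
    "Lat": [33.155480] * 3,
    "Long": [-96.731630] * 3,
})
df2 = pd.DataFrame({
    "val": ["a", "b", "c"],
    "dt": ["01-01-2022", "02-01-2022", "03-01-2022"],
    "Lat": [33.155480] * 3,
    "Long": [-96.731630] * 3,
})

# Hypothetical tolerance: 6 decimal places of a degree is roughly 0.1 m.
PRECISION = 6

for df in (df1, df2):
    # Assumes month-first dates, matching the answer's parsed output.
    df["dt"] = pd.to_datetime(df["dt"], format="%m-%d-%Y")
    # Rounded copies of the coordinates serve as join keys.
    df["Lat_key"] = df["Lat"].round(PRECISION)
    df["Long_key"] = df["Long"].round(PRECISION)

keys = ["dt", "Lat_key", "Long_key"]
# Inner merge on date + rounded coordinates; only 'val' is taken from df2.
result = (
    df1.merge(df2[keys + ["val"]], on=keys, how="inner")
    .drop(columns=["Lat_key", "Long_key"])
)
print(result)
```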