
I have two DataFrames, each with Lat and Long columns plus some additional columns. For example:

 import pandas as pd
 import geopandas as gpd
 
 # Dates are quoted: a bare 01-01-2022 is not a valid Python literal
 df1 = pd.DataFrame({
     'id': [0, 1, 2],
     'dt': ['01-01-2022', '02-01-2022', '03-01-2022'],
     'Lat': [33.155480, 33.155480, 33.155480],
     'Long': [-96.731630, -96.731630, -96.731630]
 })
 
 df2 = pd.DataFrame({
     'val': ['a', 'b', 'c'],
     'dt': ['01-01-2022', '02-01-2022', '03-01-2022'],
     'Lat': [33.155480, 33.155480, 33.155480],
     'Long': [-96.731630, -96.731630, -96.731630]
 })

I'd like to do a spatial join that matches not just on lat and long but also on the date column. Expected output:

id  dt          lat        long        val
0   01-01-2022  33.155480  -96.731630  a
1   02-01-2022  33.155480  -96.731630  b
2   03-01-2022  33.155480  -96.731630  c
Rohit Gupta
asked Feb 17, 2023 at 21:54
  • This post does not meet our quality standards. You need to add "any background research you've tried but wasn't enough to solve your problem"; check this guideline. There is no reason that the spatial join sjoin would not work. Commented Feb 17, 2023 at 22:09

1 Answer


You can spatial join, then select rows where dates match:

import pandas as pd
import geopandas as gpd
 
df1 = pd.DataFrame({'id': [0, 1, 2], 'dt': ["01-01-2022", "02-01-2022", "03-01-2022"], 'Lat': [33.155480, 33.155480, 33.155480], 'Long': [-96.731630, -96.731630, -96.731630]})
df1["dt"] = pd.to_datetime(df1["dt"]) #String to datetime
df1 = gpd.GeoDataFrame(data=df1, geometry=gpd.points_from_xy(x=df1["Long"], y=df1["Lat"]), crs="epsg:4326") #Create a geodataframe
#    id         dt       Lat      Long                    geometry
# 0   0 2022-01-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
# 1   1 2022-02-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
# 2   2 2022-03-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
df2 = pd.DataFrame({'val': ['a', 'b', 'c'], 'dt': ["01-01-2022", "02-01-2022", "03-01-2022"], 'Lat': [33.155480, 33.155480, 33.155480], 'Long': [-96.731630, -96.731630, -96.731630]})
df2["dt"] = pd.to_datetime(df2["dt"])
df2 = gpd.GeoDataFrame(data=df2, geometry=gpd.points_from_xy(x=df2["Long"], y=df2["Lat"]), crs="epsg:4326")
#   val         dt       Lat      Long                    geometry
# 0   a 2022-01-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
# 1   b 2022-02-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
# 2   c 2022-03-01  33.15548 -96.73163  POINT (-96.73163 33.15548)
df3 = gpd.sjoin(df1, df2) #Spatial join
df3 = df3.loc[df3["dt_left"]==df3["dt_right"]] #Select the rows with matching dates
#id dt_left Lat_left Long_left geometry index_right val dt_right Lat_right Long_right
#0 2022-01-01 00:00:00 33.15548 -96.73163 POINT (-96.73163 33.15548) 0 a 2022-01-01 00:00:00 33.15548 -96.73163
#1 2022-02-01 00:00:00 33.15548 -96.73163 POINT (-96.73163 33.15548) 1 b 2022-02-01 00:00:00 33.15548 -96.73163
#2 2022-03-01 00:00:00 33.15548 -96.73163 POINT (-96.73163 33.15548) 2 c 2022-03-01 00:00:00 33.15548 -96.73163
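In the sample data the coordinates in both frames are exactly equal, so a plain attribute merge may be enough and no geometry is needed. The sketch below is an alternative to the spatial join, not a replacement for it: it only matches rows whose Lat, Long and dt values are identical, and it assumes the dates are month-first (the `format` string is an assumption, since the question does not say).

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [0, 1, 2],
    "dt": ["01-01-2022", "02-01-2022", "03-01-2022"],
    "Lat": [33.155480, 33.155480, 33.155480],
    "Long": [-96.731630, -96.731630, -96.731630],
})
df2 = pd.DataFrame({
    "val": ["a", "b", "c"],
    "dt": ["01-01-2022", "02-01-2022", "03-01-2022"],
    "Lat": [33.155480, 33.155480, 33.155480],
    "Long": [-96.731630, -96.731630, -96.731630],
})

# Assumption: dates are month-day-year; adjust the format if they are day-first.
for df in (df1, df2):
    df["dt"] = pd.to_datetime(df["dt"], format="%m-%d-%Y")

# Inner merge on date and exact coordinates; the shared key columns are not duplicated.
merged = df1.merge(df2, on=["dt", "Lat", "Long"], how="inner")
print(merged)
```

If the points only need to be near each other, or one side is a polygon, the exact-equality merge will miss matches and the sjoin-then-filter approach above is the one to use.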
answered Feb 25, 2023 at 18:02
