I have a dataframe with one column of date objects and one column of time objects.
I need to merge the two columns into a single datetime column.
Personally, I think the following solution is ugly.
Why do I have to cast to str?
I based my solution on this answer.
#importing all the necessary libraries
import pandas as pd
import datetime
#I have only to create a Minimal Reproducible Example
time1 = datetime.time(3,45,12)
time2 = datetime.time(3,49,12)
date1 = datetime.datetime(2020, 5, 17)
date2 = datetime.datetime(2021, 5, 17)
date_dict= {"time1":[time1,time2],"date1":[date1,date2]}
df=pd.DataFrame(date_dict)
df["TimeMerge"] = pd.to_datetime(df.date1.astype(str)+' '+df.time1.astype(str))
1 Answer
We can let pandas handle this for us and use DataFrame.apply and datetime.datetime.combine like this:
df["TimeMerge"] = df.apply(lambda row: datetime.datetime.combine(row.date1, row.time1), axis=1)
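As a self-contained sketch of the line above, the snippet below rebuilds the example frame from the question and applies the row-wise combine. Nothing here is new API; the layout is only one way to write it.

```python
import datetime

import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "time1": [datetime.time(3, 45, 12), datetime.time(3, 49, 12)],
    "date1": [datetime.datetime(2020, 5, 17), datetime.datetime(2021, 5, 17)],
})

# Row-wise: combine each row's date with its time.
# datetime.datetime.combine ignores the time part of its first argument.
df["TimeMerge"] = df.apply(
    lambda row: datetime.datetime.combine(row.date1, row.time1), axis=1
)

print(df["TimeMerge"])
```

No cast to str is needed here, because combine works on the date and time objects directly.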
Although the following approach is more explicit and might therefore be more readable if you're not familiar with DataFrame.apply, I would strongly recommend the first approach.
You could also manually map datetime.datetime.combine over a zip object of date1 and time1:
def combine_date_time(d_t: tuple) -> datetime.datetime:
    return datetime.datetime.combine(*d_t)
df["TimeMerge"] = pd.Series(map(combine_date_time, zip(df.date1, df.time1)))
You can also inline it as an anonymous lambda function:
df["TimeMerge"] = pd.Series(map(lambda d_t: datetime.datetime.combine(*d_t), zip(df.date1, df.time1)))
This is handy for simple operations, but I would advise against this one-liner in this case.
By the way, the answer you were looking for can also be found under the question you linked.
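Because apply(axis=1) calls a Python function once per row, it may well be slower than the string-based pd.to_datetime version on larger frames. A rough way to check this on your own machine is sketched below. The helper names via_apply and via_str, the frame size, and the repeat count are arbitrary choices made for this illustration, and the timings will vary between machines and pandas versions.

```python
import datetime
import timeit

import pandas as pd

n_rows = 2_000  # arbitrary size chosen for this sketch

df = pd.DataFrame({
    "time1": [datetime.time(3, 45, 12)] * n_rows,
    "date1": [datetime.datetime(2020, 5, 17)] * n_rows,
})


def via_apply(frame: pd.DataFrame) -> pd.Series:
    """Row-wise combine, as in the first approach of this answer."""
    return frame.apply(
        lambda row: datetime.datetime.combine(row.date1, row.time1), axis=1
    )


def via_str(frame: pd.DataFrame) -> pd.Series:
    """String concatenation plus pd.to_datetime, as in the question."""
    return pd.to_datetime(frame.date1.astype(str) + " " + frame.time1.astype(str))


# Time a few repetitions of each variant.
t_apply = timeit.timeit(lambda: via_apply(df), number=5)
t_str = timeit.timeit(lambda: via_str(df), number=5)
print(f"apply: {t_apply:.4f}s   str/to_datetime: {t_str:.4f}s")
```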
"can and should" seems too conclusive though. apply(axis=1) is usually the slowest option. In this case, it's 2x slower than pd.to_datetime() at 200 rows, 3x slower at 2K rows, 3.5x slower at 2M rows. – tdy, May 12, 2021 at 3:28
Thank you for the input, I was expecting df.apply to be faster than that. I reduced it to "can". – riskypenguin, May 12, 2021 at 10:21
Thank you @riskypenguin, I missed the specific answer :) – Andrea Ciufo, May 18, 2021 at 6:40
+ for date and time, but your current str/to_datetime() code is the fastest way to do it (even if it looks uglier)
pd.read_csv(..., parse_dates=[['date1','time1']]) would probably be the "prettiest" and fastest option