\$\begingroup\$

I have a dataframe with one column of date objects and one column of time objects, and I need to merge the two columns.

Personally, I find the following solution rather ugly. Why do I have to cast to str? I crafted my solution based on this answer.

# Importing all the necessary libraries
import pandas as pd
import datetime

# Minimal reproducible example
time1 = datetime.time(3, 45, 12)
time2 = datetime.time(3, 49, 12)
date1 = datetime.datetime(2020, 5, 17)
date2 = datetime.datetime(2021, 5, 17)

date_dict = {"time1": [time1, time2], "date1": [date1, date2]}
df = pd.DataFrame(date_dict)
df["TimeMerge"] = pd.to_datetime(df.date1.astype(str) + ' ' + df.time1.astype(str))
asked May 8, 2021 at 17:19
\$\endgroup\$
  • \$\begingroup\$ I agree it would be nice if they overloaded + for date and time, but your current str/to_datetime() code is the fastest way to do it (even if it looks uglier). \$\endgroup\$ Commented May 12, 2021 at 3:32
  • \$\begingroup\$ If your real data is coming from a CSV, pd.read_csv(..., parse_dates=[['date1','time1']]) would probably be the "prettiest" and fastest option. \$\endgroup\$ Commented May 12, 2021 at 3:38
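Building on the second comment: when the data is read from a CSV, the columns arrive as plain strings, so they can be concatenated and parsed without any astype cast. A minimal sketch (the CSV contents and column names date1/time1 are assumed for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV input with separate date and time columns
csv_data = io.StringIO("date1,time1\n2020-05-17,03:45:12\n2021-05-17,03:49:12\n")
df = pd.read_csv(csv_data)

# The columns are already strings here, so no astype(str) cast is needed
df["TimeMerge"] = pd.to_datetime(df["date1"] + " " + df["time1"])
```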

1 Answer

\$\begingroup\$

We can let pandas handle this for us and use DataFrame.apply and datetime.datetime.combine like this:

df["TimeMerge"] = df.apply(lambda row: datetime.datetime.combine(row.date1, row.time1), axis=1)

Although the following approach is more explicit, and might therefore be more readable if you're not familiar with DataFrame.apply, I would still strongly recommend the first approach.

You could also manually map datetime.datetime.combine over a zip object of date1 and time1:

def combine_date_time(d_t: tuple) -> datetime.datetime:
    return datetime.datetime.combine(*d_t)

df["TimeMerge"] = pd.Series(map(combine_date_time, zip(df.date1, df.time1)))

You can also inline it as a lambda:

df["TimeMerge"] = pd.Series(map(lambda d_t: datetime.datetime.combine(*d_t), zip(df.date1, df.time1)))

This is handy for simple operations, but I would advise against this one-liner in this case.


By the way, the answer you were looking for can also be found under the question you linked.

answered May 8, 2021 at 17:48
\$\endgroup\$
  • \$\begingroup\$ "can and should" seems too conclusive though. apply(axis=1) is usually the slowest option. In this case, it's 2x slower than pd.to_datetime() at 200 rows, 3x slower at 2K rows, 3.5x slower at 2M rows. \$\endgroup\$ Commented May 12, 2021 at 3:28
  • \$\begingroup\$ Thank you for the input, I was expecting df.apply to be faster than that. I reduced it to "can". \$\endgroup\$ Commented May 12, 2021 at 10:21
  • \$\begingroup\$ Thank you @riskypenguin I missed the specific answer :) \$\endgroup\$ Commented May 18, 2021 at 6:40
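As a sanity check on the discussion above, a small sketch (reusing the question's sample data) confirming that the question's str-cast approach and the answer's combine approach produce identical results:

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "time1": [datetime.time(3, 45, 12), datetime.time(3, 49, 12)],
    "date1": [datetime.datetime(2020, 5, 17), datetime.datetime(2021, 5, 17)],
})

# Approach from the question: cast both columns to str and let to_datetime parse
via_str = pd.to_datetime(df.date1.astype(str) + " " + df.time1.astype(str))

# Approach from the answer: combine date and time row by row
via_combine = df.apply(
    lambda row: datetime.datetime.combine(row.date1, row.time1), axis=1
)

# Both yield the same datetime64 series (they differ only in speed)
assert via_str.equals(via_combine)
```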
