\$\begingroup\$

I have a dataframe with one column of date objects and one column of time objects, and I need to merge the two columns.

Personally, I find the following solution rather ugly. Why do I have to cast to str? I crafted my solution based on this answer.

# Importing all the necessary libraries
import pandas as pd
import datetime

# Minimal reproducible example
time1 = datetime.time(3, 45, 12)
time2 = datetime.time(3, 49, 12)
date1 = datetime.datetime(2020, 5, 17)
date2 = datetime.datetime(2021, 5, 17)

date_dict = {"time1": [time1, time2], "date1": [date1, date2]}
df = pd.DataFrame(date_dict)
df["TimeMerge"] = pd.to_datetime(df.date1.astype(str) + ' ' + df.time1.astype(str))
asked May 8, 2021 at 17:19
\$\endgroup\$
  • \$\begingroup\$ I agree it would be nice if they overloaded + for date and time, but your current str/to_datetime() code is the fastest way to do it (even if it looks uglier). \$\endgroup\$ Commented May 12, 2021 at 3:32
  • \$\begingroup\$ If your real data is coming from a CSV, pd.read_csv(..., parse_dates=[['date1','time1']]) would probably be the "prettiest" and fastest option. \$\endgroup\$ Commented May 12, 2021 at 3:38
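Building on the second comment: when the data is read from a CSV, the columns arrive as plain strings, so they can be concatenated and parsed without any astype cast. A minimal sketch (the CSV contents and column names date1/time1 are assumed for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV input with separate date and time columns
csv_data = io.StringIO("date1,time1\n2020-05-17,03:45:12\n2021-05-17,03:49:12\n")
df = pd.read_csv(csv_data)

# The columns are already strings here, so no astype(str) cast is needed
df["TimeMerge"] = pd.to_datetime(df["date1"] + " " + df["time1"])
```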

1 Answer

\$\begingroup\$

We can let pandas handle this for us and use DataFrame.apply and datetime.datetime.combine like this:

df["TimeMerge"] = df.apply(lambda row: datetime.datetime.combine(row.date1, row.time1), axis=1)

Although the following approach is more explicit, and might therefore be more readable if you're not familiar with DataFrame.apply, I would still strongly recommend the first approach.

You could also manually map datetime.datetime.combine over a zip object of date1 and time1:

def combine_date_time(d_t: tuple) -> datetime.datetime:
    return datetime.datetime.combine(*d_t)

df["TimeMerge"] = pd.Series(map(combine_date_time, zip(df.date1, df.time1)))

You can also inline it as a lambda:

df["TimeMerge"] = pd.Series(map(lambda d_t: datetime.datetime.combine(*d_t), zip(df.date1, df.time1)))

This is handy for simple operations, but I would advise against this one-liner in this case.


By the way, the answer you were looking for can also be found under the question you linked.

answered May 8, 2021 at 17:48
\$\endgroup\$
  • \$\begingroup\$ "can and should" seems too conclusive though. apply(axis=1) is usually the slowest option. In this case, it's 2x slower than pd.to_datetime() at 200 rows, 3x slower at 2K rows, 3.5x slower at 2M rows. \$\endgroup\$ Commented May 12, 2021 at 3:28
  • \$\begingroup\$ Thank you for the input, I was expecting df.apply to be faster than that. I reduced it to "can". \$\endgroup\$ Commented May 12, 2021 at 10:21
  • \$\begingroup\$ Thank you @riskypenguin I missed the specific answer :) \$\endgroup\$ Commented May 18, 2021 at 6:40
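As a sanity check on the discussion above, a small sketch (reusing the question's sample data) confirming that the question's str-cast approach and the answer's combine approach produce identical results:

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "time1": [datetime.time(3, 45, 12), datetime.time(3, 49, 12)],
    "date1": [datetime.datetime(2020, 5, 17), datetime.datetime(2021, 5, 17)],
})

# Approach from the question: cast both columns to str and let to_datetime parse
via_str = pd.to_datetime(df.date1.astype(str) + " " + df.time1.astype(str))

# Approach from the answer: combine date and time row by row
via_combine = df.apply(
    lambda row: datetime.datetime.combine(row.date1, row.time1), axis=1
)

# Both yield the same datetime64 series (they differ only in speed)
assert via_str.equals(via_combine)
```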
