490

How can I convert a DataFrame column of strings (in dd/mm/yyyy format) to datetime dtype?

cottontail
25.9k26 gold badges187 silver badges178 bronze badges
asked Jun 16, 2013 at 15:14
0

7 Answers 7

741

The easiest way is to use to_datetime:

df['col'] = pd.to_datetime(df['col'])

It also offers a dayfirst argument for European times (but beware this isn't strict).

Here it is in action:

In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
Out[11]:
0 2005年05月23日 00:00:00
dtype: datetime64[ns]

You can pass a specific format:

In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
Out[12]:
0 2005年05月23日
dtype: datetime64[ns]
Trenton McKinney
63.2k41 gold badges170 silver badges214 bronze badges
answered Jun 16, 2013 at 15:18
Sign up to request clarification or add additional context in comments.

Comments

79

If your date column is a string of the format '2017-01-01' you can use pandas astype to convert it to datetime.

df['date'] = df['date'].astype('datetime64[ns]')

or use datetime64[D] if you want Day precision and not nanoseconds

print(type(df['date'].iloc[0])) 

yields

<class 'pandas._libs.tslib.Timestamp'>

the same as when you use pandas.to_datetime

You can try it with other formats then '%Y-%m-%d' but at least this works.

answered Jun 26, 2017 at 14:35

1 Comment

fyi when timezone is specified in the string it ignores it
62

You can use the following if you want to specify tricky formats:

df['date_col'] = pd.to_datetime(df['date_col'], format='%d/%m/%Y')

More details on format here:

campeterson
3,7492 gold badges28 silver badges28 bronze badges
answered May 2, 2018 at 8:14

Comments

26

If you have a mixture of formats in your date, don't forget to set infer_datetime_format=True to make life easier.

df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)

Source: pd.to_datetime

or if you want a customized approach:

def autoconvert_datetime(value):
 formats = ['%m/%d/%Y', '%m-%d-%y'] # formats to try
 result_format = '%d-%m-%Y' # output format
 for dt_format in formats:
 try:
 dt_obj = datetime.strptime(value, dt_format)
 return dt_obj.strftime(result_format)
 except Exception as e: # throws exception when format doesn't match
 pass
 return value # let it be if it doesn't match
df['date'] = df['date'].apply(autoconvert_datetime)
wjandrea
34.1k10 gold badges69 silver badges107 bronze badges
answered Jul 28, 2019 at 1:04

2 Comments

A customized approach can be used without resorting to .apply which has no fast cache, and will struggle when converting a billion values. An alternative, but not a great one, is col = pd.concat([pd.to_datetime(col, errors='coerce', format=f) for f in formats], axis='columns').bfill(axis='columns').iloc[:, 0]
If you have a mixture of formats, you should not use infer_datetime_format=True as this assumes a single format. Just skip this argument. To understand why, try pd.to_datetime(pd.Series(['1/5/2015 8:08:00 AM', '1/4/2015 11:24:00 PM']), infer_datetime_format=True) with and without errors='coerce'. See this issue.
10
Multiple datetime columns

If you want to convert multiple string columns to datetime, then using apply() would be useful.

df[['date1', 'date2']] = df[['date1', 'date2']].apply(pd.to_datetime)

You can pass parameters to to_datetime as kwargs.

df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime, format="%m/%d/%Y")

Passing to apply, without specifying axis, still converts values vectorially for each column. apply is needed here because pd.to_datetime can only be called on a single column. If it has to be called on multiple columns, the options are either use an explicit for-loop, or pass it to apply. On the other hand, if you call pd.to_datetime using apply on a column (e.g. df['date'].apply(pd.to_datetime)), that would not be vectorized, and should be avoided.


Use format= to speed up

If the column contains a time component and you know the format of the datetime/time, then passing the format explicitly would significantly speed up the conversion. There's barely any difference if the column is only date, though. In my project, for a column with 5 millions rows, the difference was huge: ~2.5 min vs 6s.

It turns out explicitly specifying the format is about 25x faster. The following runtime plot shows that there's a huge gap in performance depending on whether you passed format or not.

timings


The code used to produce the plot:

import perfplot
import random
mdYHM = range(1, 13), range(1, 29), range(2000, 2024), range(24), range(60)
perfplot.show(
 kernels=[lambda x: pd.to_datetime(x), lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M')],
 labels=['pd.to_datetime(x)', "pd.to_datetime(x, format='%m/%d/%Y %H:%M')"],
 n_range=[2**k for k in range(19)],
 setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}" 
 for m,d,Y,H,M in zip(*[random.choices(e, k=n) for e in mdYHM])]),
 equality_check=pd.Series.equals,
 xlabel='len(df)'
)
Trenton McKinney
63.2k41 gold badges170 silver badges214 bronze badges
answered Jan 27, 2023 at 2:05

Comments

2

Try this solution:

  • Change '2022–12–31 00:00:00' to '2022–12–31 00:00:01'
  • Then run this code: pandas.to_datetime(pandas.Series(['2022–12–31 00:00:01']))
  • Output: 2022–12–31 00:00:01
answered Nov 1, 2022 at 19:43

1 Comment

"Change '2022–12–31 00:00:00' to '2022–12–31 00:00:01'" - What does that have to do with the question?
-2
print(df1.shape)
(638765, 95)
%timeit df1['Datetime'] = pd.to_datetime(df1['Date']+" "+df1['HOUR'])
473 ms ± 8.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df1['Datetime'] = pd.to_datetime(df1['Date']+" "+df1['HOUR'], format='mixed')
688 ms ± 3.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df1['Datetime'] = pd.to_datetime(df1['Date']+" "+df1['HOUR'], format='%Y-%m-%d %H:%M:%S')
470 ms ± 7.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
answered Oct 24, 2023 at 0:10

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.