
I am reading a large data file where the time is given as the number of days since some epoch. I am currently converting this to Python's datetime format with the following function:

import datetime as dt
import numpy as np

def days2dt(days_since_epoch):
    epoch = dt.datetime(1980, 1, 6)
    return [epoch + dt.timedelta(days=x) for x in days_since_epoch]

# run with sample data (may be larger in real life; in the worst case,
# multiply the list by 40 instead of 6)
sample = list(np.arange(0, 3/24., 1/24./3600./50.)) * 6
dates = days2dt(sample)

Running this function takes 5x longer than reading the entire file with pandas.read_csv() (perhaps because the list comprehension performs an addition for each element). The returned list is used immediately as the index of a pandas DataFrame. Interestingly, using a generator expression instead of the list comprehension improves performance by ~35% (why?).

Aside from using a generator expression, can the performance of this function be improved in any way, e.g. by not performing this date conversion per-element or by using some NumPy feature I'm not aware of?

asked Jan 15, 2015 at 9:36
  • Do you have example input (in size and distribution)? – Commented Jan 15, 2015 at 20:37
  • @Veedrac See the updated question. The data come from GPS satellites: say they have 50 Hz resolution and the satellite is visible for 3 hours twice a day. The example code I added would then be 3 days of data from one satellite (the values would of course not repeat like that in real life, but the example gives the correct amount of data and reflects the fact that it is not entirely contiguous). – Commented Jan 15, 2015 at 21:16

1 Answer


You should try NumPy's datetime64 and timedelta64 support:

import numpy as np

def days2dt(days_since_epoch):
    # scale fractional days to integer microseconds, then add as a timedelta
    microseconds = np.around(np.asarray(days_since_epoch) * (24*60*60*10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')

I suggest you read the units section of the docs to make sure this is safe (both resolution and min/max dates), but it should be fine.
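As a quick sanity check, here is a minimal sketch (using the days2dt name from the question) that compares the vectorised version against the original per-element list comprehension on a few sample offsets:

```python
import datetime as dt
import numpy as np

# vectorised conversion of fractional days since 1980-01-06 to datetime64
def days2dt(days_since_epoch):
    microseconds = np.around(np.asarray(days_since_epoch) * (24*60*60*10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')

# original per-element version, for comparison
epoch = dt.datetime(1980, 1, 6)
sample = [0.0, 0.5, 1.25]
expected = [epoch + dt.timedelta(days=x) for x in sample]

result = days2dt(sample)  # datetime64[us] array
# .tolist() converts datetime64[us] scalars back to datetime.datetime
assert result.tolist() == expected
```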

Note that 90% of the time taken by days2dt goes into converting the input list to a numpy.array; if you pass in a numpy.array instead, it goes much faster. Even so, this version is already significantly faster than the list comprehension.
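To illustrate, a sketch of how the caller could build the input as an ndarray up front (np.asarray is then a no-op), using sample data shaped like the question's:

```python
import numpy as np

# vectorised conversion, as suggested in the answer
def days2dt(days_since_epoch):
    microseconds = np.around(np.asarray(days_since_epoch) * (24*60*60*10**6))
    return np.datetime64('1980-01-06') + microseconds.astype('timedelta64[us]')

# build the input as an ndarray directly, skipping the list round-trip:
# ~3 hours of 50 Hz samples, as in the question's example
sample = np.arange(0, 3/24., 1/24./3600./50.)
dates = days2dt(sample)
# dates can then be used directly as a DataFrame index,
# e.g. pd.DatetimeIndex(dates)
```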

answered Jan 15, 2015 at 23:39
  • That's amazing, thanks! I notice a ~30x increase in speed for a relatively large array using code based on your suggestion. – Commented Jan 16, 2015 at 10:15
