Time Coding#

This page gives an overview of how xarray encodes and decodes times, and of the conventions and functions it uses.

Pandas functionality#

to_datetime#

The function pandas.to_datetime() is used within xarray for inferring units and for testing purposes.

In normal operation pandas.to_datetime() returns a pandas.Timestamp (for scalar input) or pandas.DatetimeIndex (for array-like input). Both are backed by np.datetime64 values whose resolution is inherited from the input (one of 's', 'ms', 'us', 'ns'). If no resolution can be inherited, 'ns' is assumed. In those cases the maximum usable time range is approximately +/- 292 years centered on the Unix epoch (1970-01-01). To accommodate that, we carefully check the units/resolution in the encoding and decoding steps.

When the arguments are numeric (not strings or np.datetime64 values), "unit" can be any of 'Y', 'W', 'D', 'h', 'm', 's', 'ms', 'us' or 'ns', though the returned resolution will be "ns".
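The snippets on this page reference int64_min, int64_max and uint64_max without defining them. A minimal setup sketch follows, assuming the names are derived from NumPy's integer limits and that int64_min skips the one value reserved for NaT (this matches the 'ns' minimum printed on this page). The exact definitions used to build the page are an assumption.

```python
import numpy as np
import pandas as pd  # imported here because the page's snippets use the "pd" alias

# Largest value a signed 64-bit integer can hold.
int64_max = np.iinfo("int64").max

# Assumption: the smallest int64 value is reserved for NaT, so the smallest
# usable value is one above it.
int64_min = np.iinfo("int64").min + 1

# Largest unsigned 64-bit integer; it does not fit into int64 at all.
uint64_max = np.iinfo("uint64").max

print(int64_min, int64_max, uint64_max)
```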

print(f"Minimum datetime: {pd.to_datetime(int64_min, unit='ns')}")
print(f"Maximum datetime: {pd.to_datetime(int64_max, unit='ns')}")

Minimum datetime: 1677-09-21 00:12:43.145224193
Maximum datetime: 2262-04-11 23:47:16.854775807
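The bounds above follow directly from 64-bit integer arithmetic. The sketch below reproduces the upper bound with the standard library only. It is an illustration rather than what pandas does internally, and the mean Gregorian year length used for the conversion to years is an assumption made for the estimate.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1  # largest signed 64-bit integer
EPOCH = datetime(1970, 1, 1)

# datetime only resolves microseconds, so the last three (nanosecond)
# digits are dropped by the integer division.
max_offset = timedelta(microseconds=INT64_MAX // 1000)
latest = EPOCH + max_offset

# Assumption: mean Gregorian year of 365.2425 days for the estimate.
SECONDS_PER_YEAR = 365.2425 * 86400
half_range_years = INT64_MAX / 1e9 / SECONDS_PER_YEAR

print("Latest representable datetime (microsecond precision):", latest)
print("Half range in years:", round(half_range_years, 1))
```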

For input values which can’t be represented in nanosecond resolution a pandas.OutOfBoundsDatetime exception is raised:

try:
    dtime = pd.to_datetime(int64_max, unit="us")
except Exception as err:
    print(err)

try:
    dtime = pd.to_datetime(uint64_max, unit="ns")
    print("Wrong:", dtime)
    dtime = pd.to_datetime([uint64_max], unit="ns")
except Exception as err:
    print(err)

Wrong: 1969-12-31 23:59:59.999999999
cannot convert input 18446744073709551615 with the unit 'ns'

np.datetime64 values can be extracted with pandas.Timestamp.to_numpy() and pandas.DatetimeIndex.to_numpy(). The returned resolution depends on the internal representation. This representation can be changed using pandas.Timestamp.as_unit() and pandas.DatetimeIndex.as_unit() respectively.

as_unit takes one of 's', 'ms', 'us', 'ns' as an argument. That means we are able to represent datetimes with second, millisecond, microsecond or nanosecond resolution.

time = pd.to_datetime(np.datetime64(0, "D"))
print("Datetime:", time, np.asarray([time.to_numpy()]).dtype)
print("Datetime as_unit('ms'):", time.as_unit("ms"))
print("Datetime to_numpy():", time.as_unit("ms").to_numpy())

Datetime: 1970-01-01 00:00:00 datetime64[s]
Datetime as_unit('ms'): 1970-01-01 00:00:00
Datetime to_numpy(): 1970-01-01T00:00:00.000

time = pd.to_datetime(np.array([-1000, 1, 2], dtype="datetime64[Y]"))
print("DatetimeIndex:", time)
print("DatetimeIndex as_unit('us'):", time.as_unit("us"))
print("DatetimeIndex to_numpy():", time.as_unit("us").to_numpy())

DatetimeIndex: DatetimeIndex(['970-01-01', '1971-01-01', '1972-01-01'], dtype='datetime64[s]', freq=None)
DatetimeIndex as_unit('us'): DatetimeIndex(['970-01-01', '1971-01-01', '1972-01-01'], dtype='datetime64[us]', freq=None)
DatetimeIndex to_numpy(): ['0970-01-01T00:00:00.000000' '1971-01-01T00:00:00.000000'
 '1972-01-01T00:00:00.000000']

Warning

Input data with a resolution higher than 'ns' (e.g. 'ps', 'fs', 'as') is truncated (not rounded) at the 'ns' level. This is currently broken for 'ps' input, which is interpreted as 'ns'.

print("Good:", pd.to_datetime([np.datetime64(1901901901901, "as")]))
print("Good:", pd.to_datetime([np.datetime64(1901901901901, "fs")]))
print(" Bad:", pd.to_datetime([np.datetime64(1901901901901, "ps")]))
print("Good:", pd.to_datetime([np.datetime64(1901901901901, "ns")]))
print("Good:", pd.to_datetime([np.datetime64(1901901901901, "us")]))
print("Good:", pd.to_datetime([np.datetime64(1901901901901, "ms")]))

Good: DatetimeIndex(['1970-01-01 00:00:00.000001901'], dtype='datetime64[ns]', freq=None)
Good: DatetimeIndex(['1970-01-01 00:00:00.001901901'], dtype='datetime64[ns]', freq=None)
 Bad: DatetimeIndex(['1970-01-01 00:00:01.901901901'], dtype='datetime64[ns]', freq=None)
Good: DatetimeIndex(['1970-01-01 00:31:41.901901901'], dtype='datetime64[ns]', freq=None)
Good: DatetimeIndex(['1970-01-23 00:18:21.901901'], dtype='datetime64[us]', freq=None)
Good: DatetimeIndex(['2030-04-08 18:05:01.901000'], dtype='datetime64[ms]', freq=None)

Warning

Care has to be taken, as some configurations of input data will raise. The following shows that it is safe to use pandas.to_datetime() when providing numpy.datetime64 input as a scalar or as a numpy array.

print(
    "Works:",
    np.datetime64(1901901901901, "s"),
    pd.to_datetime(np.datetime64(1901901901901, "s")),
)
print(
    "Works:",
    np.array([np.datetime64(1901901901901, "s")]),
    pd.to_datetime(np.array([np.datetime64(1901901901901, "s")])),
)
try:
    pd.to_datetime([np.datetime64(1901901901901, "s")])
except Exception as err:
    print("Raises:", err)
try:
    pd.to_datetime(1901901901901, unit="s")
except Exception as err:
    print("Raises:", err)
try:
    pd.to_datetime([1901901901901], unit="s")
except Exception as err:
    print("Raises:", err)
try:
    pd.to_datetime(np.array([1901901901901]), unit="s")
except Exception as err:
    print("Raises:", err)

Works: 62238-11-15T11:51:41 62238-11-15 11:51:41
Works: ['62238-11-15T11:51:41'] DatetimeIndex(['62238-11-15 11:51:41'], dtype='datetime64[s]', freq=None)
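One way to stay on the working path is to give integer data an explicit datetime64 resolution in NumPy before handing it to pandas. The sketch below assumes a pandas version that supports non-nanosecond resolutions (2.0 or later); the variable names are illustrative, and this is not necessarily how xarray itself prepares its input.

```python
import numpy as np
import pandas as pd

# Seconds since the Unix epoch, far outside the nanosecond range.
raw = np.array([1901901901901], dtype="int64")

# Attach the resolution in NumPy first, so pandas receives a datetime64
# array instead of plain integers plus a "unit" argument.
as_datetime64 = raw.astype("datetime64[s]")
index = pd.to_datetime(as_datetime64)

print(index.dtype)
print(index.to_numpy())
```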

to_timedelta#

The function pandas.to_timedelta() is used within xarray for inferring units and for testing purposes.

In normal operation pandas.to_timedelta() returns a pandas.Timedelta (for scalar input) or pandas.TimedeltaIndex (for array-like input), which hold np.timedelta64 values with ns resolution internally. This implies that the usable timedelta range covers only roughly 585 years. We work around that limitation in the encoding and decoding steps.

f"Maximum timedelta range: ({pd.to_timedelta(int64_min, unit='ns')}, {pd.to_timedelta(int64_max, unit='ns')})"

'Maximum timedelta range: (-106752 days +00:12:43.145224193, 106751 days 23:47:16.854775807)'
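The figure of roughly 585 years can be checked with plain integer arithmetic. The sketch below uses only the standard library and assumes a mean Gregorian year of 365.2425 days, as well as a minimum that skips the value reserved for NaT; both are assumptions made for the estimate.

```python
from datetime import timedelta

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63) + 1  # assumption: the true minimum is reserved for NaT

# Largest timedelta at microsecond precision (timedelta cannot hold ns).
largest = timedelta(microseconds=INT64_MAX // 1000)

# Total span between the smallest and largest nanosecond timedelta, in years.
span_ns = INT64_MAX - INT64_MIN
span_years = span_ns / 1e9 / (365.2425 * 86400)

print("Largest timedelta:", largest)
print("Total span in years:", round(span_years, 1))
```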

For input values which can’t be represented in nanosecond resolution a pandas.OutOfBoundsTimedelta exception is raised:

try:
    delta = pd.to_timedelta(int64_max, unit="us")
except Exception as err:
    print("First:", err)

try:
    delta = pd.to_timedelta(uint64_max, unit="ns")
except Exception as err:
    print("Second:", err)

Second: Cannot cast 18446744073709551615 from ns to 'ns' without overflow.

When the arguments are numeric (not strings or np.timedelta64 values), "unit" can be any of 'W', 'D', 'h', 'm', 's', 'ms', 'us' or 'ns', though the returned resolution will be "ns".

np.timedelta64 values can be extracted with pandas.Timedelta.to_numpy() and pandas.TimedeltaIndex.to_numpy(). The returned resolution depends on the internal representation. This representation can be changed using pandas.Timedelta.as_unit() and pandas.TimedeltaIndex.as_unit() respectively.

as_unit takes one of 's', 'ms', 'us', 'ns' as an argument. That means we are able to represent timedeltas with second, millisecond, microsecond or nanosecond resolution.

delta = pd.to_timedelta(np.timedelta64(1, "D"))
print("Timedelta:", delta, np.asarray([delta.to_numpy()]).dtype)
print("Timedelta as_unit('ms'):", delta.as_unit("ms"))
print("Timedelta to_numpy():", delta.as_unit("ms").to_numpy())

Timedelta: 1 days 00:00:00 timedelta64[s]
Timedelta as_unit('ms'): 1 days 00:00:00
Timedelta to_numpy(): 86400000 milliseconds

delta = pd.to_timedelta([0, 1, 2], unit="D")
print("TimedeltaIndex:", delta)
print("TimedeltaIndex as_unit('ms'):", delta.as_unit("ms"))
print("TimedeltaIndex to_numpy():", delta.as_unit("ms").to_numpy())

TimedeltaIndex: TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[s]', freq=None)
TimedeltaIndex as_unit('ms'): TimedeltaIndex(['0 days', '1 days', '2 days'], dtype='timedelta64[ms]', freq=None)
TimedeltaIndex to_numpy(): [ 0 86400000 172800000]

Warning

Care has to be taken, as some configurations of input data will raise. The following shows that it is safe to use pandas.to_timedelta() when providing numpy.timedelta64 input as a scalar or as a numpy array.

print(
    "Works:",
    np.timedelta64(1901901901901, "s"),
    pd.to_timedelta(np.timedelta64(1901901901901, "s")),
)
print(
    "Works:",
    np.array([np.timedelta64(1901901901901, "s")]),
    pd.to_timedelta(np.array([np.timedelta64(1901901901901, "s")])),
)
try:
    pd.to_timedelta([np.timedelta64(1901901901901, "s")])
except Exception as err:
    print("Raises:", err)
try:
    pd.to_timedelta(1901901901901, unit="s")
except Exception as err:
    print("Raises:", err)
try:
    pd.to_timedelta([1901901901901], unit="s")
except Exception as err:
    print("Raises:", err)
try:
    pd.to_timedelta(np.array([1901901901901]), unit="s")
except Exception as err:
    print("Raises:", err)

Works: 1901901901901 seconds 22012753 days 11:51:41
Works: [1901901901901] TimedeltaIndex(['22012753 days 11:51:41'], dtype='timedelta64[s]', freq=None)

Timestamp#

pandas.Timestamp is used within xarray to wrap CF reference time strings and datetime.datetime objects.

When the arguments are numeric (not strings), "unit" can be any of 'Y', 'W', 'D', 'h', 'm', 's', 'ms', 'us' or 'ns', though the returned resolution will be "ns".

In normal operation pandas.Timestamp holds the timestamp in the provided resolution, but only in one of 's', 'ms', 'us', 'ns'. Lower resolution input is automatically converted to 's'; higher resolution input is truncated to 'ns'.

The same conversion rules apply here as for pandas.to_timedelta() (see to_timedelta). Depending on the internal resolution Timestamps can be represented in the range:

for unit in ["s", "ms", "us", "ns"]:
    print(
        f"unit: {unit!r} time range ({pd.Timestamp(int64_min, unit=unit)}, {pd.Timestamp(int64_max, unit=unit)})"
    )

unit: 's' time range (-292277022657-01-27 08:29:53, 292277026596-12-04 15:30:07)
unit: 'ms' time range (-292275055-05-16 16:47:04.193000, 292278994-08-17 07:12:55.807000)
unit: 'us' time range (-290308-12-21 19:59:05.224193, 294247-01-10 04:00:54.775807)
unit: 'ns' time range (1677-09-21 00:12:43.145224193, 2262-04-11 23:47:16.854775807)

Relaxing the resolution extends the range to several hundred thousand years with microsecond representation. NaT is located at np.iinfo("int64").min for all of the different representations.
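The order of magnitude for each resolution can be estimated without pandas. The sketch below divides the int64 maximum by the number of ticks per year for each unit; the mean Gregorian year length and the names used are assumptions made for this illustration.

```python
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 86400  # assumption: mean Gregorian year

# Number of ticks per second for each supported resolution.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}

# Years representable on either side of the Unix epoch, per resolution.
half_range_years = {
    unit: INT64_MAX / per_second / SECONDS_PER_YEAR
    for unit, per_second in UNITS_PER_SECOND.items()
}

for unit, years in half_range_years.items():
    print(f"{unit!r}: about +/- {years:,.0f} years")
```

Each step from 'ns' to 'us', 'ms' and 's' widens the range by a factor of 1000, which matches the year numbers in the ranges printed above.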

Warning

When initialized with a datetime string this is only defined from -9999-01-01 to 9999-12-31.

try:
    print("Works:", pd.Timestamp("-9999-01-01 00:00:00"))
    print("Works, too:", pd.Timestamp("9999-12-31 23:59:59"))
    print(pd.Timestamp("10000-01-01 00:00:00"))
except Exception as err:
    print("Errors:", err)

Works: -9999-01-01 00:00:00
Works, too: 9999-12-31 23:59:59
Errors: year must be in 1..9999, not 10000: 10000-01-01 00:00:00

Note

pandas.Timestamp is currently the only way to correctly import time reference strings. It handles non-ISO formatted strings, keeps the resolution of the strings ('s', 'ms' etc.) and imports time zones. When initialized with numpy.datetime64 instead of a string, it even overcomes the above limitation of the possible time range.

try:
    print("Handles non-ISO:", pd.Timestamp("92-1-8 151542"))
    print(
        "Keeps resolution 1:",
        pd.Timestamp("1992-10-08 15:15:42"),
        pd.Timestamp("1992-10-08 15:15:42").unit,
    )
    print(
        "Keeps resolution 2:",
        pd.Timestamp("1992-10-08 15:15:42.5"),
        pd.Timestamp("1992-10-08 15:15:42.5").unit,
    )
    print(
        "Keeps timezone:",
        pd.Timestamp("1992-10-08 15:15:42.5 -6:00"),
        pd.Timestamp("1992-10-08 15:15:42.5 -6:00").unit,
    )
    print(
        "Extends timerange :",
        pd.Timestamp(np.datetime64("-10000-10-08 15:15:42.5001")),
        pd.Timestamp(np.datetime64("-10000-10-08 15:15:42.5001")).unit,
    )
except Exception as err:
    print("Errors:", err)

Handles non-ISO: 1992-01-08 15:15:42
Keeps resolution 1: 1992-10-08 15:15:42 us
Keeps resolution 2: 1992-10-08 15:15:42.500000 us
Keeps timezone: 1992-10-08 15:15:42.500000-06:00 us
Extends timerange : -10000-10-08 15:15:42.500100 us

DatetimeIndex#

pandas.DatetimeIndex is used to wrap np.datetime64 values or other datetime-likes when encoding. The resolution of the DatetimeIndex depends on the input, but can only be one of 's', 'ms', 'us', 'ns'. Lower resolution input is automatically converted to 's'; higher resolution input is truncated to 'ns'. pandas.DatetimeIndex will raise pandas.OutOfBoundsDatetime if the input can’t be represented in the given resolution.

try:
    print(
        "Works:",
        pd.DatetimeIndex(
            np.array(["1992-01-08", "1992-01-09"], dtype="datetime64[D]")
        ),
    )
    print(
        "Works:",
        pd.DatetimeIndex(
            np.array(
                ["1992-01-08 15:15:42", "1992-01-09 15:15:42"],
                dtype="datetime64[s]",
            )
        ),
    )
    print(
        "Works:",
        pd.DatetimeIndex(
            np.array(
                ["1992-01-08 15:15:42.5", "1992-01-09 15:15:42.0"],
                dtype="datetime64[ms]",
            )
        ),
    )
    print(
        "Works:",
        pd.DatetimeIndex(
            np.array(
                ["1970-01-01 00:00:00.401501601701801901", "1970-01-01 00:00:00"],
                dtype="datetime64[as]",
            )
        ),
    )
    print(
        "Works:",
        pd.DatetimeIndex(
            np.array(
                ["-10000-01-01 00:00:00.401501", "1970-01-01 00:00:00"],
                dtype="datetime64[us]",
            )
        ),
    )
except Exception as err:
    print("Errors:", err)

Works: DatetimeIndex(['1992-01-08', '1992-01-09'], dtype='datetime64[s]', freq=None)
Works: DatetimeIndex(['1992-01-08 15:15:42', '1992-01-09 15:15:42'], dtype='datetime64[s]', freq=None)
Works: DatetimeIndex(['1992-01-08 15:15:42.500000', '1992-01-09 15:15:42'], dtype='datetime64[ms]', freq=None)
Works: DatetimeIndex(['1970-01-01 00:00:00.401501601', '1970-01-01 00:00:00'], dtype='datetime64[ns]', freq=None)
Works: DatetimeIndex(['-10000-01-01 00:00:00.401501', '1970-01-01 00:00:00'], dtype='datetime64[us]', freq=None)
