I am trying to write a python function to convert an array of strings into an array of datetime
objects where string can be in one of two formats, and would like to know how the following code can be optimized:
import datetime
def con(data) -> list:
op = []
for i in data:
if len(i) <= 8:
try:
op.append(datetime.datetime.strptime(i, "%d/%m/%y"))
except:
pass
else:
try:
op.append(datetime.datetime.strptime(i, "%d/%m/%Y"))
except:
pass
return op
inputs = ['20/05/25', '20/05/25', '29/02/2000', '19/02/2034']
inputs = con(inputs)
print(inputs)
2 Answers 2
Divide and conquer
Your function does too much at once. Your core problem is parsing a date from a set of possible formats. Let's start there:
from contextlib import suppress
from datetime import datetime
from typing import Iterable
def try_parse_date(text: str, formats: Iterable[str]) -> datetime:
"""Attempt to parse a date from given formats."""
for frmt in formats:
with suppress(ValueError):
return datetime.strptime(text, frmt)
raise ValueError("Could not parse date from formats.")
This functions takes the to-be parsed string and a list of formats to try one after another. It will return the parsed datetime, once a datetime has been successfully parsed. If a datetime could not be parsed with any of the given formats, it will raise an appropriate exception.
Use standard library tools
Then we can apply this to our input. Since we want to parse all strings with the same formats, we can use functools.partial
to bind the formats of the function to a new function parser
and then call that function on all elements of the input
list within a list comprehension to create our output
list:
from contextlib import suppress
from datetime import datetime
from functools import partial
from typing import Iterable
def try_parse_date(text: str, formats: Iterable[str]) -> datetime:
"""Attempt to parse a date from given formats."""
for frmt in formats:
with suppress(ValueError):
return datetime.strptime(text, frmt)
raise ValueError("Could not parse date from formats.")
inputs = ['20/05/25', '20/05/25', '29/02/2000', '19/02/2034']
parser = partial(try_parse_date, formats=["%d/%m/%y", "%d/%m/%Y"])
outputs = [parser(s) for s in inputs]
print(outputs)
Note that I also renamed the output variable to have a descriptive name (it's not the input).
-
2\$\begingroup\$ I missed some nuance in my answer that I'll add later tonight, but in the meantime - I think the only thing missing from this answer is that your final
raise
should include afrom
with a reference to the last inner exception. \$\endgroup\$Reinderien– Reinderien2024年01月02日 15:07:48 +00:00Commented Jan 2, 2024 at 15:07 -
3\$\begingroup\$ @Reinderien: Should it? I would argue that all formats are equal here, we have no idea which would produce the better error message (ie, which is closest to the actual data). I would, however, be in favor of enriching the exception with both the failing value and the formats it was tried against. The current one lacks a lot of information. \$\endgroup\$Matthieu M.– Matthieu M.2024年01月03日 15:50:46 +00:00Commented Jan 3, 2024 at 15:50
-
\$\begingroup\$ @MatthieuM. That advice is coupled with "only call
strptime
once", since OP's actual use case is significantly less generic than this solution would lead us to believe. \$\endgroup\$Reinderien– Reinderien2024年01月03日 16:08:26 +00:00Commented Jan 3, 2024 at 16:08 -
1\$\begingroup\$ @Reinderien: Well, that's a very different solution then. If
strptime
is only called once, there's no point in suppressing the exception it throws and raising a custom one. For the solution at hand -- which doesn't "poke" and instead uses a list of formats -- then I would argue that usingfrom
with a single exception would be possibly misleading, and thus worse than not usingfrom
at all. \$\endgroup\$Matthieu M.– Matthieu M.2024年01月03日 16:12:31 +00:00Commented Jan 3, 2024 at 16:12 -
\$\begingroup\$ I feel like Exception Groups would be a good fit for this kind of generic approach...
excs = []; for ...: try: return datetime.... except ValueError as e: excs.append(e)
then, if nothing returnsraise ExceptionGroup('Could not parse date from formats', excs)
\$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2024年01月04日 14:17:12 +00:00Commented Jan 4, 2024 at 14:17
Don't rely on datetime parsing to throw a ValueError
to determine which format you follow. Among other reasons, this is problematic because it will also throw-and-suppress for complete nonsense that you should instead be noisy about. Even if you throw a generic exception after having exhausted all other formats, this is not ideal because the exception comes from your code and not the datetime
code. In the ideal case, whenever possible, it should be datetime
itself at the top of the exception trace. Trace quality matters quite a bit, and encouraging sane exception traces is a gift from your current self to your future debugging self.
Since there is no built-in optional-century year field, detect it explicitly during parse. In the ideal implementation, expend the minimum amount of effort to discriminate between year styles while leaving as much of the full parsing as possible to datetime
. (Your top-level len
is not reliable enough, I think, because it's influenced by varying month and day digit counts.)
Add (more) type hints to your function signature, and produce a date
, not a datetime
.
Produce an iterator instead of appending to a list.
Add unit tests.
import datetime
from typing import Iterable, Iterator
def convert_dates(data: Iterable[str]) -> Iterator[datetime.date]:
for string in data:
year = string.rsplit(sep='/', maxsplit=1)[-1]
if len(year) <= 2:
year_field = '%y'
else:
year_field = '%Y'
yield datetime.datetime.strptime(
string, '%d/%m/' + year_field,
).date()
def test() -> None:
inputs = ('20/05/25', '29/02/2000', '19/02/2034')
actual = tuple(convert_dates(inputs))
expected = (
datetime.date(2025, 5, 20),
datetime.date(2000, 2, 29),
datetime.date(2034, 2, 19),
)
assert actual == expected
try:
tuple(convert_dates(['nonsense']))
raise AssertionError('should have thrown')
except ValueError as e:
assert 'does not match format' in e.args[0]
if __name__ == '__main__':
test()
-
1\$\begingroup\$ While I don't agree with your assessment on using
ValueError
s to skip failing formats, I really like the regex approach. \$\endgroup\$Richard Neumann– Richard Neumann2024年01月02日 14:16:45 +00:00Commented Jan 2, 2024 at 14:16 -
6\$\begingroup\$ Thanks. I do recognize that ask-for-forgiveness-instead-of-permission is pretty typical in Python, but applied here it will hide more severe problems in the data. OP has not given any explicit indication as to where the data come from and what guarantees they have about being well-formed, so as diligent developers we should operate from a strict validation standpoint until we need otherwise. \$\endgroup\$Reinderien– Reinderien2024年01月02日 14:18:57 +00:00Commented Jan 2, 2024 at 14:18
except:
(that is, an exception handler that doesn't specify one or more exception classes), especially if the handler is justpass
. See Why is "except: pass" a bad programming practice? on Stack Overflow for more details. \$\endgroup\$