I developed a function that extracts a date from a given sequence of digits and reformats it.
This is the code:
```python
from datetime import datetime as dt


def format_dates(field):
    n = len(field)
    match = False
    i = 0
    while match is False:
        try:
            # Take the last four digits
            year = int(field[-4 - i:n - i])
        except ValueError:
            return ''
        # Check if this year is between today's year +/- (15, 100)
        if (1919 <= year <= 2019):
            # Check if there are other 4 digits before these 4 ones
            if (len(field[-8 - i:n - i]) == 8):
                try:
                    f_date = dt.strptime(field[-8 - i:n - i],
                                         '%d%m%Y').strftime('%d/%m/%Y')
                    match = True
                    return f_date
                except ValueError:
                    pass
            else:
                return ''
        i += 1
```
Explanation:
This function:
- Takes a sequence of digits as input.
- Extracts the last four digits from that sequence.
- Checks whether those four digits are between 1919 and 2019; if not, it shifts one position to the left (i + 1) and tries again.
- If they are, it checks whether there are four more digits before them; if not, it returns ''.
- If there are, it tries to parse and format the whole 8 digits.
- If that raises a ValueError, it passes. (A ValueError means there are 8 digits and the last four form a valid year, but the first four do not form a valid day and month. So it increments i by 1, which adds the next digit at the front and drops the last digit of the processed sequence.)
Example:
input: '1303201946'
- Iteration 1:
  - i = 0, match = False
  - year = 1946
  - test 1 (year between 1919 and 2019): passes.
  - test 2 (there are 4 other digits before 1946, namely 0320): passes.
  - format the whole 8 digits (03201946): ValueError, so i = i + 1 and go to the next iteration.
- Iteration 2:
  - i = 1, match = False
  - year = 0194
  - test 1 (year between 1919 and 2019): fails, so i = i + 1 and go to the next iteration.
- Iteration 3:
  - i = 2, match = False
  - year = 2019
  - test 1: passes
  - test 2: passes
  - format the whole 8 digits (13032019): 13/03/2019 (no ValueError)
  - match = True; return the formatted date, which exits the while loop.
This function works fine, but the way it handles errors seems ugly. I also believe it is not optimized (the same exceptions are repeated, there are many returns, and the code does not look elegant).
How can I restructure this code to make it cleaner and more efficient?
Answer
Exception
If your algorithm cannot find a date, it is easier to raise an exception than to return ''. Returning sentinel values instead of raising exceptions can lead to unexpected behaviour if the user of this function does not test for the sentinel value.
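To make the difference concrete, here is a small sketch. The helpers `parse_with_sentinel` and `parse_with_exception` are hypothetical: they only parse a single 8-digit `ddmmyyyy` string and are not part of the question or of the final code below.

```python
from datetime import datetime as dt


def parse_with_sentinel(digits):
    # Hypothetical helper in the question's style: '' means "no date found"
    try:
        return dt.strptime(digits, '%d%m%Y')
    except ValueError:
        return ''


def parse_with_exception(digits):
    # Hypothetical helper in the suggested style: bad input raises ValueError
    return dt.strptime(digits, '%d%m%Y')


# A caller that forgets to check for '' only fails later, far from the cause:
#     parse_with_sentinel('99999999').year
#     -> AttributeError: 'str' object has no attribute 'year'

# With an exception, the failure surfaces at the call site and is handled there:
try:
    parse_with_exception('99999999')
except ValueError as exc:
    print('no date found:', exc)
```

The sentinel version returns either a `datetime` or a `str`, so every caller has to remember which; the exception version always returns a `datetime` or fails loudly.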
comments
Comments should explain why you did something, not how. # Take the last four digits tells you nothing more than the code itself. At field[-4 - i:n - i], I would rather add a comment explaining why you used n - i instead of just -i.
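A quick check with the question's example input shows the reason such a comment would capture: `-i` and `n - i` agree for every `i > 0`, but at `i == 0` the stop index `-0` is just `0`, which gives an empty slice.

```python
s = '1303201946'   # the question's example input
n = len(s)

for i in range(3):
    # Compare the two possible stop indices for the 4-digit year slice
    print(i, repr(s[-4 - i:-i]), repr(s[-4 - i:n - i]))
# 0 '' '1946'
# 1 '0194' '0194'
# 2 '2019' '2019'
```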
nesting
Instead of nesting a number of if-clauses, it can be better to test the negation of the condition and continue, so the rest of the code is less nested.
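As an illustration, the question's loop could be flattened roughly like this. This is a sketch, not the final code below: the name `format_dates_flat` is hypothetical, it deliberately keeps the question's `''` return values and 1919-2019 window so that only the nesting changes, and it moves the increment to the top of the loop so that `continue` cannot skip it.

```python
from datetime import datetime as dt


def format_dates_flat(field):
    n = len(field)
    i = -1
    while True:
        i += 1  # incremented first, so every `continue` still advances i
        try:
            year = int(field[-4 - i:n - i])
        except ValueError:
            return ''  # ran out of digits (or hit a non-digit)
        if not (1919 <= year <= 2019):
            continue  # year out of range: shift one position left
        candidate = field[-8 - i:n - i]
        if len(candidate) != 8:
            return ''  # no room for a day and month before the year
        try:
            return dt.strptime(candidate, '%d%m%Y').strftime('%d/%m/%Y')
        except ValueError:
            continue  # invalid day/month: shift one position left
```

Each check now reads as "bail out or skip if this fails", and the happy path sits at the lowest indentation level.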
match
Don't test condition is True (or, as in your loop, condition is False). Just use condition (or not condition). In Python, many values can act as True or False in tests.
Your match is never used anyway: the moment you set it to True, you also return the result, so while True: would have sufficed here.
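A short demonstration of why `is True` is fragile: `is` checks identity with the `True` singleton, while `if` and `while` test truthiness.

```python
count = 1              # an int: truthy, but not the bool object True

print(count is True)   # False: identity check against the True singleton
print(bool(count))     # True: this is what `if count:` actually evaluates

if count:
    print('truthy')    # runs, because 1 is truthy
```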
field
This is a very unclear variable name. This function expects a date in string format, so why not name the argument accordingly, for example date_string?
Return type
Your code does two things now: it looks for a date in the string, and it converts that date to another format. It would be better to separate those two things, return a datetime.datetime instead, and let the caller of this function worry about formatting it correctly.
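For example, once the search returns a `datetime`, each caller can pick its own output. The value below is hard-coded as a stand-in for what a search on the question's input `'1303201946'` would find.

```python
from datetime import datetime as dt

# Stand-in for the datetime a date-finding function would return
found = dt(2019, 3, 13)

print(found.strftime('%d/%m/%Y'))   # 13/03/2019 (the question's format)
print(found.date().isoformat())     # 2019-03-13
print(found.year)                   # 2019
```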
while True
You use what is effectively a while True loop with an incrementing counter. A better way to do this would be to use either for i in range(...) or itertools.count: for i in itertools.count(). In this case you know there will be no more than len(field) - 7 iterations, so you might as well use that.
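To see where the `len(field) - 7` bound comes from, here is a sketch that lists the 8-character windows for the question's input. `range(n - 7)` produces exactly one index per complete window, while `itertools.count()` is unbounded and needs its own stop condition.

```python
import itertools

date_string = '1303201946'   # the question's example input
n = len(date_string)

# One index per complete 8-character window, scanning right to left
windows = [date_string[-(8 + i):n - i] for i in range(n - 7)]
print(windows)   # ['03201946', '30320194', '13032019']

# The unbounded version must detect running off the left end itself
for i in itertools.count():
    window = date_string[-(8 + i):n - i]
    if len(window) < 8:
        break
print(i)         # 3, the same as n - 7
```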
Reverse the algorithm
You explicitly test whether the substring is 8 characters long, and then whether it is in the right format. By changing the while loop to a for loop, you know the substring will always be 8 characters long. Then it makes sense to first try to convert it to a datetime, and then check whether the year is correct:
```python
from datetime import datetime as dt


def format_dates2(date_string):
    n = len(date_string)
    for i in range(n - 7):
        sub_string = date_string[-(8 + i) : n - i]
        # not just -i because that fails at i==0
        try:
            date = dt.strptime(sub_string, "%d%m%Y")
        except ValueError:
            continue
        if not (1919 <= date.year <= 2019):
            continue
        return date
    raise ValueError("Date not in the correct format")
```