Interpret a date from a string of digits

Question 1

I developed a function that extracts a date from a given sequence of digits and reformats it.
This is the code:

from datetime import datetime as dt


def format_dates(field):
    n = len(field)
    match = False
    i = 0
    while match is False:
        try:
            # Take the last four digits
            year = int(field[-4 - i:n - i])
        except ValueError:
            return ''
        # Check if this year is between today's year +/- (15, 100)
        if (1919 <= year <= 2019):
            # Check if there are other 4 digits before these 4 ones
            if (len(field[-8 - i:n - i]) == 8):
                try:
                    f_date = dt.strptime(field[-8 - i:n - i],
                                         '%d%m%Y').strftime('%d/%m/%Y')
                    match = True
                    return f_date
                except ValueError:
                    pass
            else:
                return ''
        i += 1

Explanation:
This function:

Takes a sequence of digits as input.
Extracts the last four digits from that sequence.
Checks whether those four digits form a year between 1919 and 2019; if not, it moves on to the next iteration (see the example below).
If so, it checks whether there are four more digits before the ones already extracted; if not, it returns an empty string.
If so, it tries to parse the whole eight digits as a date.
If that raises a ValueError, it passes. (A ValueError means there are eight digits and the last four form a valid year, but the first four are not a valid day and month. The function then increments i, which adds the next digit at the front of the processed sequence and drops its last digit.)

Example:

input: '1303201946'

Iteration 1:
- i = 0, match = False
- year = 1946
- test 1 (year between 2019 and 1919): passes.
- test 2 (there are 4 other digits before 1946, which are 0320): passes.
- format the whole 8 digits: ValueError exception, so i = i+1 and pass to the next iteration.
Iteration 2:
- i = 1, match = False
- year = 0194
- test 1 (year between 2019 and 1919): fails, so i = i + 1 and pass to the next iteration.
Iteration 3:
- i = 2, match = False
- year = 2019
- test 1: passes
- test 2: passes
- format the whole 8 digits (13032019): 13/03/2019 (No ValueError exception) passes
- match = True, return the formatted date, break from the while loop.
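
The iterations above can be reproduced with a short trace. This is only a sketch: `trace_windows` is a hypothetical helper that is not part of the original code, and it lists the four-digit year slice and the eight-digit slice that the function would inspect at each offset i.

```python
def trace_windows(field):
    """Return (i, year_slice, eight_digit_slice) for every offset i."""
    n = len(field)
    rows = []
    # There are n - 7 positions where a full eight-digit window fits.
    for i in range(n - 7):
        rows.append((i, field[-4 - i:n - i], field[-8 - i:n - i]))
    return rows


for row in trace_windows('1303201946'):
    print(row)
# (0, '1946', '03201946')
# (1, '0194', '30320194')
# (2, '2019', '13032019')
```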

This function works fine, but the way it handles errors seems ugly. I also believe it is not optimized: the same exceptions are repeated, there are a lot of returns, and the code does not seem elegant.
How can I restructure the code and make it more optimized?

Question 2

This gives a NameError: name 'sub_field' is not defined

Question 3

@MaartenFabré, sorry, I fixed it; it should be field instead of sub_field.

Question 4

@MaartenFabré, there were some errors in the code; I fixed them.

Question 5

Exception

If your algorithm cannot find a date, it is easier to raise an Exception than to return ''. Returning sentinel values instead of exceptions can lead to unexpected behaviour if the user of this function does not test for this sentinel value.
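
A small illustration of the difference, as a sketch only: `parse_or_empty` and `parse_or_raise` are hypothetical stand-ins, not the function from the question. With the sentinel, a caller who forgets the check only fails later and with an unrelated error; with the exception, the failure surfaces at the call itself.

```python
from datetime import datetime


def parse_or_empty(text):
    """Sentinel style: returns '' when the text is not a date."""
    try:
        return datetime.strptime(text, "%d%m%Y")
    except ValueError:
        return ''


def parse_or_raise(text):
    """Exception style: lets the ValueError reach the caller."""
    return datetime.strptime(text, "%d%m%Y")


# The caller forgets to test for '' and fails later, far from the cause.
result = parse_or_empty("99999999")
try:
    result.year
except AttributeError as exc:
    late_error = exc

# The exception version fails immediately, at the point of parsing.
try:
    parse_or_raise("99999999")
except ValueError as exc:
    early_error = exc
```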

comments

Comments should explain why you did something, not how. `# Take the last four digits` tells you nothing more than the code itself. I would rather comment at `field[-4 - i:n - i]` why you did `n - i` instead of just `-i`.
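
The reason is worth a comment because it is easy to miss: at `i == 0` the end index `-i` is `-0`, which is just `0`, so the slice stops at the start of the string and comes back empty. A minimal demonstration on the example input:

```python
s = '1303201946'
n = len(s)

i = 0
print(repr(s[-4 - i:-i]))     # '' because -0 == 0
print(repr(s[-4 - i:n - i]))  # '1946'

# For any i > 0 the two spellings agree.
i = 2
print(s[-4 - i:-i] == s[-4 - i:n - i])  # True
```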

nesting

Instead of nesting a number of if-clauses, it can be better to test the negation of the condition and `continue`, so the rest of the code is less nested.

match

Don't test `condition is True`; just test `condition`. In Python a lot of values can act as True or False in tests.

Your `match` is never used anyway; the moment you set it to True, you also return the result, so a `while True:` would have sufficed here.
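
A short sketch of both points; the sample values are arbitrary and chosen only for illustration.

```python
match = False
iterations = 0
while not match:  # instead of: while match is False
    iterations += 1
    match = True

# Many non-bool values are usable directly in a test.
values = ['', '0', 0, [], [0], None]
truthy = [bool(v) for v in values]
print(truthy)  # [False, True, False, False, True, False]
```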

`field`

This is a very unclear variable name. This method expects a date in string format, so why not call the argument like that?

Return type

Your code does 2 things now. It looks for a date in the string, and converts that date to another format. It would be better to separate those 2 things, and return a datetime.datetime instead, and let the caller of this method worry about formatting that correctly.
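
Once the function returns a `datetime.datetime`, each caller can pick its own output format. In this sketch, `found` is a stand-in for the value such a function would return for the example input.

```python
from datetime import datetime

# Stand-in for the datetime a date-finding function would return.
found = datetime.strptime('13032019', '%d%m%Y')

print(found.strftime('%d/%m/%Y'))  # 13/03/2019
print(found.date().isoformat())    # 2019-03-13
```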

`while True`

You use a `while True` loop with an incrementing counter. A better way to do this would be to use either `for i in range(...)` or `itertools.count`: `for i in itertools.count()`. In this case you know there will be no more than `len(field) - 7` iterations, so you might as well use that.
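
A sketch comparing the two loop styles on the example input. The names `bounded` and `unbounded` are only for illustration; both collect the same eight-character windows, but the `range` version needs no explicit stop condition.

```python
import itertools

text = '1303201946'
n = len(text)
width = 8

# Bounded loop: exactly n - width + 1 windows (n - 7 for width 8).
bounded = [text[n - width - i:n - i] for i in range(n - width + 1)]

# Unbounded counter: works, but needs an explicit break.
unbounded = []
for i in itertools.count():
    if n - width - i < 0:
        break
    unbounded.append(text[n - width - i:n - i])

print(bounded)  # ['03201946', '30320194', '13032019']
```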

Reverse the algorithm

You explicitly test whether the substring is 8 characters long, and then if it is in the right format. By changing the while True to the for-loop, you know the substring will be 8 characters long. Then it makes sense to first try to convert it to a datetime, and then check whether the year is correct:

from datetime import datetime as dt


def format_dates2(date_string):
    n = len(date_string)
    for i in range(n - 7):
        sub_string = date_string[-(8 + i) : n - i]
        # not just -i because that fails at i==0
        try:
            date = dt.strptime(sub_string, "%d%m%Y")
        except ValueError:
            continue
        if not (1919 <= date.year <= 2019):
            continue
        return date
    raise ValueError("Date not in the correct format")

Maarten Fabré · Accepted Answer · 2019-10-08 10:36:14Z
