I'd like to find the best way to format a date as YYYY, YYYYMM or YYYYMMDD.
Here is what I've got so far:
# gets the position of a character in the format
def get_pos(format, letter, number):
    it_number = 0
    it_pos = 0
    for char in format:
        if char == letter:
            it_number += 1
            if it_number == number:
                return(it_pos)
        it_pos += 1
    return(-1)

# loops through all the characters forming a number
def format_loop(string, old_format, letter):
    new = ''
    pos = -2
    it = 1
    while(pos != -1):
        pos = get_pos(old_format, letter, it)
        if(pos >= 0):
            new += string[pos]
        it += 1
    return(new)

# format a date
def date_format(string, old_format, new_format, delim):
    new = format_loop(string, old_format, 'Y')
    if(new_format in 'MD'):
        new += delim + format_loop(string, old_format, 'M')
        if(new_format == 'D'):
            new += delim + format_loop(string, old_format, 'D')
    return(new)
The function's parameters are, in order: the date string to format, its current format, the desired format in the form of 'Y', 'M' or 'D', and an optional delimiter to put between the numbers.
I'm open to suggestions on shortening and optimising the code, as well as on coding style and variable/function naming.
2 Answers
When parsing and manipulating dates (or times), you should use the datetime module.
It supports both parsing strings with a given format and formatting datetime objects with another given format. The only thing it doesn't support directly is a separate delimiter argument; any delimiter has to be written into the output format itself (e.g. "%Y-%m-%d"):
from datetime import datetime

def date_format(string, old_format, new_format):
    # parse with the old format, then render with the new one
    return datetime.strptime(string, old_format).strftime(new_format)
Note that the formats have to comply with datetime's format specifiers. In particular, that means that instead of "YYYY", "YYYYMM" and "YYYYMMDD" you would have to use "%Y", "%Y%m" and "%Y%m%d", respectively. These format specifiers are not arbitrary; they mostly resemble the ones supported by the corresponding C library (which has existed for quite some time).
So, unless you really need that delimiter, I would use this, because it is vastly simpler and a lot easier to understand (especially by someone else). Even if you need the delimiter, it is probably easier to manipulate the output of this function to include it.
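If the delimiter is needed, one option is to build it into the output format before calling strftime. The following is only a minimal sketch, assuming the question's 'Y'/'M'/'D' precision codes are kept; the names date_format_delim and PRECISION are hypothetical, and old_format must be a datetime-style format such as "%d/%m/%Y":

```python
from datetime import datetime

# Hypothetical mapping from the question's precision codes to strftime directives.
PRECISION = {"Y": ["%Y"], "M": ["%Y", "%m"], "D": ["%Y", "%m", "%d"]}


def date_format_delim(string, old_format, precision, delim=""):
    """Reformat a date string, joining year/month/day with delim.

    Caveat: delim becomes part of the strftime format, so a literal
    percent sign in it would have to be written as "%%".
    """
    new_format = delim.join(PRECISION[precision])
    return datetime.strptime(string, old_format).strftime(new_format)


print(date_format_delim("26/11/2018", "%d/%m/%Y", "D", "-"))  # 2018-11-26
print(date_format_delim("26/11/2018", "%d/%m/%Y", "M"))       # 201811
```

Unlike the character-picking approach, this also validates the input: strptime raises ValueError for a string that does not match old_format.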
For the first loop, there is a neat little function called enumerate() which allows you to iterate through an object and access both the element and the iteration number.
You can also remove various parentheses, for the returned value and also for the while loop (although for big conditions it's sometimes handy to keep the parentheses).
def get_pos(format, letter, number):
    it_number = 0
    # it_pos = 0  <--- not needed anymore because of the
    for it_pos, char in enumerate(format):  # <--- enumerate
        if char == letter:
            it_number += 1
            if it_number == number:
                return it_pos
        # it_pos += 1  <--- same thing here
    return -1
For the second function, I would suggest using while True and break so that you don't have to declare pos = -2 beforehand.
def format_loop(string, old_format, letter):
    new = ''
    # pos = -2  <--- replaced by a break
    it = 1
    while True:
        pos = get_pos(old_format, letter, it)
        if pos == -1:
            break
        elif pos >= 0:
            new += string[pos]
        it += 1
    return new
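As a possible further simplification (not part of the rewrite above), the position lookup and the loop could be collapsed by walking the date string and its format in parallel with zip(). This is only a sketch, assuming the format uses one letter per character as in the question (e.g. 'DD/MM/YYYY'); extract_field is a hypothetical name:

```python
def extract_field(string, old_format, letter):
    """Collect the characters of string whose format character equals letter."""
    return ''.join(
        char for char, fmt_char in zip(string, old_format) if fmt_char == letter
    )


print(extract_field("26/11/2018", "DD/MM/YYYY", "Y"))  # 2018
```

One behavioural difference to be aware of: zip() stops at the shorter of the two inputs, so a date string shorter than its format is silently truncated instead of raising an IndexError.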
Also, I don't know if there is an official recommendation on that, but I like to use very explicit return values for special cases, for example the string "not found" instead of -1. That makes the code read more like English than Python when you see if pos == "not found": break, and I rather like that.
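For comparison, a common Python convention for "no result" is to return None (as dict.get() and re.match() do), which avoids comparing against a magic string. The following is a minimal sketch of get_pos in that style; it is an alternative to the string sentinel described above, not this answer's own code:

```python
def get_pos(format, letter, number):
    """Return the index of the number-th occurrence of letter, or None."""
    it_number = 0
    for it_pos, char in enumerate(format):
        if char == letter:
            it_number += 1
            if it_number == number:
                return it_pos
    return None  # explicit "not found" marker


pos = get_pos("DD/MM/YYYY", "Y", 5)  # there are only four 'Y' characters
if pos is None:
    print("not found")
```

The check is written as pos is None rather than not pos, because 0 is a valid position and would otherwise be mistaken for "not found".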
Well, first I have to state that enumerate() is too good to be true, gonna use it everywhere. I'm not sure about the second modification, as it would add an action inside the loop, possibly making it slower. – Comte_Zero, Nov 26, 2018 at 13:14
Yes, enumerate is really helpful. Just remember the formulation is for <iteration count>, <element> in enumerate(<iterable>): (sometimes I tend to reverse them). Regarding your second remark, I'll time it and come back to you. – Guimoute, Nov 26, 2018 at 13:19
No difference for me (4.5 µs to perform date_format). I would add a default delimiter delim='' just in case someone is too lazy to specify it. – Guimoute, Nov 26, 2018 at 13:29