Problem Description
The problem was initially posted here: Parse and clean up text block of store hours in Python.
Given the following multi-line string:
"""Hours
Monday 9:30 AM - 9:00 PM
Tuesday 9:30 AM - 9:00 PM
Wednesday 9:30 AM - 9:00 PM
Thursday 9:30 AM - 9:00 PM
Friday 9:30 AM - 11:00 PM
Saturday 9:30 AM - 11:00 PM
Sunday 11:00 AM - 6:00 PM
Holiday Hours
Thanksgiving Day 11:00 AM - 6:00 PM"""
Group the consecutive weekdays by the work hours, producing the following output:
"""Mon-Thu 9:30AM-9:00PM
Fri-Sat 9:30AM-11:00PM
Sun & Hol 11:00AM-6:00PM"""
Solution
My approach is based on parsing the weekday lines, splitting by the first space and then using itertools.groupby()
to sort by the second item of every row, using the groupby
's implementation detail - it would only group the consecutive matches together:
from itertools import groupby
from operator import itemgetter
data = """Hours
Monday 9:30 AM - 9:00 PM
Tuesday 9:30 AM - 9:00 PM
Wednesday 9:30 AM - 9:00 PM
Thursday 9:30 AM - 9:00 PM
Friday 9:30 AM - 11:00 PM
Saturday 9:30 AM - 11:00 PM
Sunday 11:00 AM - 6:00 PM
Holiday Hours
Thanksgiving Day 11:00 AM - 6:00 PM"""
# filter relevant rows with weekdays only
rows = [row.split(" ", 1) for row in data.splitlines()[1:-2]]
# group consecutive days by a time range
result = []
for time_range, group in groupby(rows, key=itemgetter(1)):
days_in_group = [item[0] for item in group]
first_day, last_day = days_in_group[0][:3], days_in_group[-1][:3]
range_end = "-" + str(last_day) if first_day != last_day else ""
result.append("{begin}{end} {time_range}".format(begin=first_day,
end=range_end,
time_range=time_range))
print("\n".join(result))
Is this the most optimal solution to the problem? Is it Pythonic? What would you improve code quality-wise?
2 Answers 2
Obtaining & Hol
in the output is quite difficult to achieve programmatically in any reasonable way, without hard-coding special cases. I see that you have not attempted to satisfy that part of the request. So, the specification is open to interpretation. I would say that a reasonable approach is to try to parse the opening hours if the line looks like a day of the week, and pass through the output if the line contains anything else.
For the parsing, I would use regular expressions. If you use named capture groups, you can avoid mysterious expressions like .split()
, itemgetter(1)
, and days_in_group[0][:3]
.
Instead of .append()
, I'd write it as a generator function.
from itertools import groupby
from operator import itemgetter
import re
def group_by_hours(s):
REGEX = re.compile(r'^(?P<day_abbrev>\w{3})\w*day (?P<hours>.* - .*)|'
r'^(?P<misc>.*)', re.MULTILINE)
for hours, lines in groupby(REGEX.finditer(s), itemgetter('hours')):
lines = list(lines)
if not hours:
for line in lines:
yield line['misc']
elif len(lines) == 1:
yield "{0} {1}".format(lines[0]['day_abbrev'], hours)
elif len(lines) > 1:
yield "{0}-{1} {2}".format(lines[0]['day_abbrev'],
lines[-1]['day_abbrev'],
hours)
print('\n'.join(group_by_hours(data)))
-
\$\begingroup\$ I particularly like using the named groups for both the days and abbreviated days, switching to a generator and, of course, overall, handling the other parts of the data string. Thanks, learned new ways of thinking about the code. \$\endgroup\$alecxe– alecxe2017年09月09日 01:43:16 +00:00Commented Sep 9, 2017 at 1:43
magic numbers
In
rows = [row.split(" ", 1) for row in data.splitlines()[1:-2]]
the 1 & -2 magic numbers are a bit of a wart. Perhaps you'd like to create a generator that knows to suppress 'Hours' and that won't yield any further lines after 'Holiday Hours'.
Also,
first_day, last_day = days_in_group[0][:3], days_in_group[-1][:3]
has some magic numbers, but arguably they make more sense. You could make your generator responsible for day parsing, with yield day[:3], time_range
.
variable renaming
In your string formatting, you choose to rename first_day to begin, and range_end to end. I suppose my advice is "don't do that". Or if there's a name you prefer for those variables, use that consistently throughout.
In
days_in_group = [item[0] for item in group]
the identifier item
is a bit vague. pair
would be better (or perhaps day_range
if you lean toward the verbose).
global scope
You polluted the top-level namespace with rows
and other identifiers. Your file cannot be reused via import
. Please use the
if __name__ == '__main__':
idiom to bury these lines of code under def main():
or similar.
Overall, yes, this is sensible code, put together in pythonic fashion.
& Hol
part of the output is difficult to infer programmatically, and in fact, you haven't done it. \$\endgroup\$