Introduction
I'm using Advent of Code 2018 to learn Python better, interested in the new type support for Python 3.7, I decided to go with that.
Here is my solution to Advent of Code Day 4, both part 1 and 2. I'm returning the answers for both part 1 and 2 as a tuple.
Problem Description
The problem essentially boils down to: Given a timestamped unsorted list of events of guards beginning their shift, waking up and falling asleep, determine the following:
Part 1: Which guard is asleep the most and on which minute is that guard mostly asleep? Return guard id multiplied by the minute number.
Part 2: Which guard is most frequently asleep on the same minute? Again, return guard id multiplied by the minute number.
For full problem description, please see Advent of Code Day 4
Concerns
I'm a big fan of Java 8 Stream API and C# Linq, I kind of expected Python to be more like that. I'm not sure if the nested function calls like sorted(list(...))
or len(list(...))
are "Pythonic". Likewise, it feels like I should be able to use some reducer-like function calls instead of imperatively looping through stuff to find the most common sleeper of some kind. Or is the way I have written this code the Python way to do it?
Code
from dataclasses import dataclass
from datetime import datetime
from days import read_file
from enum import Enum
from collections import defaultdict, namedtuple
from statistics import mode
import statistics
import operator
import re
class EventType(Enum):
STARTS_SHIFT = 1
FALLS_ASLEEP = 2
WAKES_UP = 3
@dataclass
class Event:
time: datetime
guard: int
event: EventType
@dataclass
class GuardSleep:
sleep_total: int
last_sleep: int
sleeps: list
def add_sleeps(self, minute):
for i in range(self.last_sleep, minute):
self.sleeps.append(i)
def get_guard(line: str):
if "Guard" in line:
guard_id = re.search("Guard #(\\d+)", line)
return int(guard_id.group(1))
return -1
def event_type(line):
if "begins shift" in line:
return EventType.STARTS_SHIFT
if "falls asleep" in line:
return EventType.FALLS_ASLEEP
if "wakes up" in line:
return EventType.WAKES_UP
raise Exception("Unknown line: " + line)
def day4() -> (int, int):
events = sorted(list(Event(datetime.strptime(line[1:17], "%Y-%m-%d %H:%M"), get_guard(line), event_type(line)) for line in read_file(4)), key=operator.attrgetter("time"))
guard = -1
guardsleep = defaultdict(lambda: GuardSleep(0, 0, []))
for event in events:
if event.guard >= 0:
guard = event.guard
if event.event == EventType.FALLS_ASLEEP:
guardsleep[guard].last_sleep = event.time.minute
if event.event == EventType.WAKES_UP:
guardsleep[guard].sleep_total += event.time.minute - guardsleep[guard].last_sleep
guardsleep[guard].add_sleeps(event.time.minute)
most_sleepy_guard_number = max(guardsleep, key=(lambda key: guardsleep[key].sleep_total))
most_sleepy_guard = guardsleep[most_sleepy_guard_number]
part1_result = most_sleepy_guard_number * mode(sorted(most_sleepy_guard.sleeps))
# Part 2
MostSleepy = namedtuple('MostCommon', ['id', 'minute', 'amount'])
most_sleepy = MostSleepy(0, 0, 0)
for k in guardsleep:
current_guard = guardsleep[k]
try:
most_common_minute = mode(sorted(current_guard.sleeps))
amount = len(list((m for m in current_guard.sleeps if m == most_common_minute)))
if amount > most_sleepy.amount:
most_sleepy = MostSleepy(k, most_common_minute, amount)
except statistics.StatisticsError:
print("No unique most common minute for " + str(k))
return part1_result, most_sleepy.id * most_sleepy.minute
if __name__ == '__main__':
print(day4())
2 Answers 2
Replacing chained if
with dictionary lookup
This function:
def event_type(line):
if "begins shift" in line:
return EventType.STARTS_SHIFT
if "falls asleep" in line:
return EventType.FALLS_ASLEEP
if "wakes up" in line:
return EventType.WAKES_UP
raise Exception("Unknown line: " + line)
isn't bad, but chained if
like this smell. It may be better represented as a dictionary, where the key is the string above, the value is the enum value, and you do a simple key lookup based on the last two words of every line. Whereas chained if
is worst-case O(n), dictionary lookup is O(1). Then - no if
s needed, and you get the exception for free if key lookup fails.
Use raw strings
re.search("Guard #(\\d+)", line)
should be
re.search(r"Guard #(\d+)", line)
Settle down with the one-liners
This:
events = sorted(list(Event(datetime.strptime(line[1:17], "%Y-%m-%d %H:%M"), get_guard(line), event_type(line)) for line in read_file(4)), key=operator.attrgetter("time"))
is effectively illegible. Break this up into multiple lines - including a temporary variable for the strptime
, as well as linebreaks in the list comprehension itself.
Don't use lists if you can use tuples
This:
MostSleepy = namedtuple('MostCommon', ['id', 'minute', 'amount'])
should be
MostSleepy = namedtuple('MostCommon', ('id', 'minute', 'amount'))
for various reasons - tuples are immutable, so use them for immutable data; and under certain narrow contexts (certainly not this one) they're faster.
Use a sum instead of a list constructor
This:
amount = len(list((m for m in current_guard.sleeps if m == most_common_minute)))
should be
amount = sum(1 for m in current_guard.sleeps if m == most_common_minute)
(Also, even if you kept using len
, you should use a tuple
constructor instead of a list
constructor.)
Another footnote - don't put inner parens in expressions like list((...generator...))
. Constructors can accept generator expressions directly.
-
1\$\begingroup\$ I'm not sure I follow your logic on the tuple vs list in the call to
namedtuple
. The second argument (field names) can be a list, tuple, string, etc. I think it's largely irrelevant which you choose. Also while technically I suppose it's true that worst-case chained if's is O(n), so is dictionary lookup if you have lots of collisions. I agree that in some instances dictionary lookup can stand-in for if branches, but I think that is somewhat a preference or dictated by the number of branches you have (e.g. 100 diff conditionals would be a sign of a problem). \$\endgroup\$Solaxun– Solaxun2019年01月07日 20:51:14 +00:00Commented Jan 7, 2019 at 20:51 -
1\$\begingroup\$ I would however suggest that he use "elif" instead of multiple "if" statements. In this scenario, the "if" statements are mutually exclusive, so it doesn't make a difference, but the conveyed intent could be misleading in other circumstances. If you want only one branch to execute, as opposed to each time there is a match, use "elif". \$\endgroup\$Solaxun– Solaxun2019年01月07日 20:57:01 +00:00Commented Jan 7, 2019 at 20:57
You could eliminate some of the many dependencies:
- Since you're already using
@dataclass
, you could use it forMostSleepy
instead ofnamedtuple
- It looks strange to
import statistics
afterfrom statistics import mode
. Aside frommode
, the only other thing used from it isStatisticsError
. So you could usefrom statistics import mode, StatisticsError
and not import the entirestatistics
- I don't see the
enum
doing anything useful. You could remove it and the program will still work. - The
operator
is not very useful either. You could replaceoperator.attrgetter("time")
withlambda t: t.time
The add_sleeps
function could be written more compactly:
def add_sleeps(self, minute):
self.sleeps.extend(list(range(self.last_sleep, minute)))
When creating the events
list,
you used helper functions get_guard
and event_type
.
It would have been good to do the same for the time too.
The last_sleep
attribute doesn't belong in GuardSleep
.
It's an implementation detail of the parsing of the lines,
it has no other use for a GuardSleep
instance.
Instead of string concatenation like "foo " + str(bar)
,
the recommended way is f-strings, f"foo {bar}"
.
The input would have allowed some simplifications. For example, alphabetic sorting of the lines gives the same results as sorting by time. And, it seems all the "falls asleep" and "wakes up" events happen in the 0th hour. As such, you could just parse the minute instead of the entire time:
events = [Event(int(line[15:17]), get_guard(line), event_type(line)) for line in sorted(read_file(4))]
Explore related questions
See similar questions with these tags.