Advent of Code 2018 Day 4 - Pythonic Sleepy Guards

Question 1

Introduction

I'm using Advent of Code 2018 to learn Python better, interested in the new type support for Python 3.7, I decided to go with that.

Here is my solution to Advent of Code Day 4, both part 1 and 2. I'm returning the answers for both part 1 and 2 as a tuple.

Problem Description

The problem essentially boils down to: Given a timestamped unsorted list of events of guards beginning their shift, waking up and falling asleep, determine the following:

Part 1: Which guard is asleep the most and on which minute is that guard mostly asleep? Return guard id multiplied by the minute number.

Part 2: Which guard is most frequently asleep on the same minute? Again, return guard id multiplied by the minute number.

For full problem description, please see Advent of Code Day 4

Concerns

I'm a big fan of Java 8 Stream API and C# Linq, I kind of expected Python to be more like that. I'm not sure if the nested function calls like sorted(list(...)) or len(list(...)) are "Pythonic". Likewise, it feels like I should be able to use some reducer-like function calls instead of imperatively looping through stuff to find the most common sleeper of some kind. Or is the way I have written this code the Python way to do it?

Code

from dataclasses import dataclass
from datetime import datetime
from days import read_file
from enum import Enum
from collections import defaultdict, namedtuple
from statistics import mode
import statistics
import operator
import re
class EventType(Enum):
 STARTS_SHIFT = 1
 FALLS_ASLEEP = 2
 WAKES_UP = 3
@dataclass
class Event:
 time: datetime
 guard: int
 event: EventType
@dataclass
class GuardSleep:
 sleep_total: int
 last_sleep: int
 sleeps: list
 def add_sleeps(self, minute):
 for i in range(self.last_sleep, minute):
 self.sleeps.append(i)
def get_guard(line: str):
 if "Guard" in line:
 guard_id = re.search("Guard #(\\d+)", line)
 return int(guard_id.group(1))
 return -1
def event_type(line):
 if "begins shift" in line:
 return EventType.STARTS_SHIFT
 if "falls asleep" in line:
 return EventType.FALLS_ASLEEP
 if "wakes up" in line:
 return EventType.WAKES_UP
 raise Exception("Unknown line: " + line)
def day4() -> (int, int):
 events = sorted(list(Event(datetime.strptime(line[1:17], "%Y-%m-%d %H:%M"), get_guard(line), event_type(line)) for line in read_file(4)), key=operator.attrgetter("time"))
 guard = -1
 guardsleep = defaultdict(lambda: GuardSleep(0, 0, []))
 for event in events:
 if event.guard >= 0:
 guard = event.guard
 if event.event == EventType.FALLS_ASLEEP:
 guardsleep[guard].last_sleep = event.time.minute
 if event.event == EventType.WAKES_UP:
 guardsleep[guard].sleep_total += event.time.minute - guardsleep[guard].last_sleep
 guardsleep[guard].add_sleeps(event.time.minute)
 most_sleepy_guard_number = max(guardsleep, key=(lambda key: guardsleep[key].sleep_total))
 most_sleepy_guard = guardsleep[most_sleepy_guard_number]
 part1_result = most_sleepy_guard_number * mode(sorted(most_sleepy_guard.sleeps))
 # Part 2
 MostSleepy = namedtuple('MostCommon', ['id', 'minute', 'amount'])
 most_sleepy = MostSleepy(0, 0, 0)
 for k in guardsleep:
 current_guard = guardsleep[k]
 try:
 most_common_minute = mode(sorted(current_guard.sleeps))
 amount = len(list((m for m in current_guard.sleeps if m == most_common_minute)))
 if amount > most_sleepy.amount:
 most_sleepy = MostSleepy(k, most_common_minute, amount)
 except statistics.StatisticsError:
 print("No unique most common minute for " + str(k))
 return part1_result, most_sleepy.id * most_sleepy.minute
if __name__ == '__main__':
 print(day4())

Question 2

Replacing chained `if` with dictionary lookup

This function:

def event_type(line):
 if "begins shift" in line:
 return EventType.STARTS_SHIFT
 if "falls asleep" in line:
 return EventType.FALLS_ASLEEP
 if "wakes up" in line:
 return EventType.WAKES_UP
 raise Exception("Unknown line: " + line)

isn't bad, but chained if like this smell. It may be better represented as a dictionary, where the key is the string above, the value is the enum value, and you do a simple key lookup based on the last two words of every line. Whereas chained if is worst-case O(n), dictionary lookup is O(1). Then - no ifs needed, and you get the exception for free if key lookup fails.

Use raw strings

re.search("Guard #(\\d+)", line)

should be

re.search(r"Guard #(\d+)", line)

Settle down with the one-liners

This:

events = sorted(list(Event(datetime.strptime(line[1:17], "%Y-%m-%d %H:%M"), get_guard(line), event_type(line)) for line in read_file(4)), key=operator.attrgetter("time"))

is effectively illegible. Break this up into multiple lines - including a temporary variable for the strptime, as well as linebreaks in the list comprehension itself.

Don't use lists if you can use tuples

This:

MostSleepy = namedtuple('MostCommon', ['id', 'minute', 'amount'])

should be

MostSleepy = namedtuple('MostCommon', ('id', 'minute', 'amount'))

for various reasons - tuples are immutable, so use them for immutable data; and under certain narrow contexts (certainly not this one) they're faster.

Use a sum instead of a list constructor

This:

amount = len(list((m for m in current_guard.sleeps if m == most_common_minute)))

should be

amount = sum(1 for m in current_guard.sleeps if m == most_common_minute)

(Also, even if you kept using len, you should use a tuple constructor instead of a list constructor.)

Another footnote - don't put inner parens in expressions like list((...generator...)). Constructors can accept generator expressions directly.

Question 3

I'm not sure I follow your logic on the tuple vs list in the call to namedtuple. The second argument (field names) can be a list, tuple, string, etc. I think it's largely irrelevant which you choose. Also while technically I suppose it's true that worst-case chained if's is O(n), so is dictionary lookup if you have lots of collisions. I agree that in some instances dictionary lookup can stand-in for if branches, but I think that is somewhat a preference or dictated by the number of branches you have (e.g. 100 diff conditionals would be a sign of a problem).

Question 4

I would however suggest that he use "elif" instead of multiple "if" statements. In this scenario, the "if" statements are mutually exclusive, so it doesn't make a difference, but the conveyed intent could be misleading in other circumstances. If you want only one branch to execute, as opposed to each time there is a match, use "elif".

Question 5

You could eliminate some of the many dependencies:

Since you're already using @dataclass, you could use it for MostSleepy instead of namedtuple
It looks strange to import statistics after from statistics import mode. Aside from mode, the only other thing used from it is StatisticsError. So you could use from statistics import mode, StatisticsError and not import the entire statistics
I don't see the enum doing anything useful. You could remove it and the program will still work.
The operator is not very useful either. You could replace operator.attrgetter("time") with lambda t: t.time

The add_sleeps function could be written more compactly:

def add_sleeps(self, minute):
 self.sleeps.extend(list(range(self.last_sleep, minute)))

When creating the events list, you used helper functions get_guard and event_type. It would have been good to do the same for the time too.

The last_sleep attribute doesn't belong in GuardSleep. It's an implementation detail of the parsing of the lines, it has no other use for a GuardSleep instance.

Instead of string concatenation like "foo " + str(bar), the recommended way is f-strings, f"foo {bar}".

The input would have allowed some simplifications. For example, alphabetic sorting of the lines gives the same results as sorting by time. And, it seems all the "falls asleep" and "wakes up" events happen in the 0th hour. As such, you could just parse the minute instead of the entire time:

events = [Event(int(line[15:17]), get_guard(line), event_type(line)) for line in sorted(read_file(4))]

Reinderien Reinderien 70.9k5 gold badges76 silver badges256 bronze badges · Answer 1 · 2019-01-04 15:40:29Z

Replacing chained `if` with dictionary lookup

This function:

def event_type(line):
 if "begins shift" in line:
 return EventType.STARTS_SHIFT
 if "falls asleep" in line:
 return EventType.FALLS_ASLEEP
 if "wakes up" in line:
 return EventType.WAKES_UP
 raise Exception("Unknown line: " + line)

isn't bad, but chained if like this smell. It may be better represented as a dictionary, where the key is the string above, the value is the enum value, and you do a simple key lookup based on the last two words of every line. Whereas chained if is worst-case O(n), dictionary lookup is O(1). Then - no ifs needed, and you get the exception for free if key lookup fails.

Use raw strings

re.search("Guard #(\\d+)", line)

should be

re.search(r"Guard #(\d+)", line)

Settle down with the one-liners

This:

events = sorted(list(Event(datetime.strptime(line[1:17], "%Y-%m-%d %H:%M"), get_guard(line), event_type(line)) for line in read_file(4)), key=operator.attrgetter("time"))

is effectively illegible. Break this up into multiple lines - including a temporary variable for the strptime, as well as linebreaks in the list comprehension itself.

Don't use lists if you can use tuples

This:

MostSleepy = namedtuple('MostCommon', ['id', 'minute', 'amount'])

should be

MostSleepy = namedtuple('MostCommon', ('id', 'minute', 'amount'))

for various reasons - tuples are immutable, so use them for immutable data; and under certain narrow contexts (certainly not this one) they're faster.

Use a sum instead of a list constructor

This:

amount = len(list((m for m in current_guard.sleeps if m == most_common_minute)))

should be

amount = sum(1 for m in current_guard.sleeps if m == most_common_minute)

(Also, even if you kept using len, you should use a tuple constructor instead of a list constructor.)

Another footnote - don't put inner parens in expressions like list((...generator...)). Constructors can accept generator expressions directly.

I'm not sure I follow your logic on the tuple vs list in the call to namedtuple. The second argument (field names) can be a list, tuple, string, etc. I think it's largely irrelevant which you choose. Also while technically I suppose it's true that worst-case chained if's is O(n), so is dictionary lookup if you have lots of collisions. I agree that in some instances dictionary lookup can stand-in for if branches, but I think that is somewhat a preference or dictated by the number of branches you have (e.g. 100 diff conditionals would be a sign of a problem).
I would however suggest that he use "elif" instead of multiple "if" statements. In this scenario, the "if" statements are mutually exclusive, so it doesn't make a difference, but the conveyed intent could be misleading in other circumstances. If you want only one branch to execute, as opposed to each time there is a match, use "elif".

janos janos 113k15 gold badges154 silver badges396 bronze badges · Answer 2 · 2019-01-10 20:45:49Z

You could eliminate some of the many dependencies:

Since you're already using @dataclass, you could use it for MostSleepy instead of namedtuple
It looks strange to import statistics after from statistics import mode. Aside from mode, the only other thing used from it is StatisticsError. So you could use from statistics import mode, StatisticsError and not import the entire statistics
I don't see the enum doing anything useful. You could remove it and the program will still work.
The operator is not very useful either. You could replace operator.attrgetter("time") with lambda t: t.time

The add_sleeps function could be written more compactly:

def add_sleeps(self, minute):
 self.sleeps.extend(list(range(self.last_sleep, minute)))

When creating the events list, you used helper functions get_guard and event_type. It would have been good to do the same for the time too.

The last_sleep attribute doesn't belong in GuardSleep. It's an implementation detail of the parsing of the lines, it has no other use for a GuardSleep instance.

Instead of string concatenation like "foo " + str(bar), the recommended way is f-strings, f"foo {bar}".

The input would have allowed some simplifications. For example, alphabetic sorting of the lines gives the same results as sorting by time. And, it seems all the "falls asleep" and "wakes up" events happen in the 0th hour. As such, you could just parse the minute instead of the entire time:

events = [Event(int(line[15:17]), get_guard(line), event_type(line)) for line in sorted(read_file(4))]

Stack Exchange Network

Advent of Code 2018 Day 4 - Pythonic Sleepy Guards

Introduction

Problem Description

Concerns

Code

2 Answers 2

Replacing chained `if` with dictionary lookup

Use raw strings

Settle down with the one-liners

Don't use lists if you can use tuples

Use a sum instead of a list constructor

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Advent of Code 2018 Day 4 - Pythonic Sleepy Guards

Introduction

Problem Description

Concerns

Code

2 Answers 2

Replacing chained if with dictionary lookup

Use raw strings

Settle down with the one-liners

Don't use lists if you can use tuples

Use a sum instead of a list constructor

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Replacing chained `if` with dictionary lookup