Description:
The task involves processing a set of workflows, each comprising rules. These rules define conditions based on part ratings and dictate the destination workflow if the conditions are satisfied. The objective is to compute the total rating of the parts that are eventually accepted after traversing the workflows.
The workflows operate on parts rated in four categories: x, m, a, and s. Parts initiate in the initial workflow named "in" and progress through the workflows based on rule conditions. The first rule matching a part's condition is immediately applied, directing the part to the specified destination. If a part is accepted (sent to A) or rejected (sent to R), the processing ceases for that particular part.
The provided input includes a list of workflows, succeeded by a blank line, and subsequently, the ratings of the parts. The task is to determine the sum of the ratings for all parts that ultimately receive acceptance.
For example:
px{a<2006:qkq,m>2090:A,rfg}
pv{a>1716:R,A}
lnx{m>1548:A,A}
rfg{s<537:gd,x>2440:R,A}
qs{s>3448:A,lnx}
qkq{x<1416:A,crn}
crn{x>2662:A,R}
in{s<1351:px,qqz}
qqz{s>2770:qs,m<1801:hdj,R}
gd{a>3333:R,R}
hdj{m>838:A,pv}
{x=787,m=2655,a=1222,s=2876}
{x=1679,m=44,a=2067,s=496}
{x=2036,m=264,a=79,s=2244}
{x=2461,m=1339,a=466,s=291}
{x=2127,m=1623,a=2188,s=1013}
The workflows are listed first, followed by a blank line, then the ratings of the parts the Elves would like you to sort. All parts begin in the workflow named in. In this example, the five listed parts go through the following workflows:
{x=787,m=2655,a=1222,s=2876}: in -> qqz -> qs -> lnx -> A
{x=1679,m=44,a=2067,s=496}: in -> px -> rfg -> gd -> R
{x=2036,m=264,a=79,s=2244}: in -> qqz -> hdj -> pv -> A
{x=2461,m=1339,a=466,s=291}: in -> px -> qkq -> crn -> R
{x=2127,m=1623,a=2188,s=1013}: in -> px -> rfg -> A
Ultimately, three parts are accepted. Adding up the x, m, a, and s rating for each of the accepted parts gives 7540 for the part with x=787, 4623 for the part with x=2036, and 6951 for the part with x=2127. Adding all of the ratings for all of the accepted parts gives the sum total of 19114.
#!/usr/bin/env python3
from pathlib import Path
from typing import Callable, Iterable
import typer
def parse_condition(condition: str) -> dict[str, str | int] | None:
for op in {"<", ">"}:
if op in condition:
attribute, threshold = condition.split(op)
return {"attribute": attribute, "operator": op, "threshold": int(threshold)}
return None
def parse_rule(rule: str) -> dict[str, dict[str, str | int] | str | None]:
rule_parts = rule.split(":")
condition, destination = rule_parts if len(rule_parts) == 2 else (None, rule)
return {
"condition": parse_condition(condition) if condition else None,
"destination": destination,
}
def parse_workflow(
workflow: str,
) -> dict[str, list[dict[str, dict[str, str | int] | str | None]]]:
name, rules = workflow.split("{")
return {name: [parse_rule(rule) for rule in rules.strip("}").split(",")]}
def parse_rating(rating: str) -> dict[str, int]:
ratings = rating.strip("{}").split(",")
return {label: int(value) for rat in ratings for label, value in [rat.split("=")]}
def find_rule(
name: str, rules: list[dict[str, dict[str, dict[str, str | int] | str]]]
) -> dict[str, dict[str, dict[str, str | int] | str]]:
return next((rule for rule in rules if name in rule), None)
def evaluate_condition(condition: str, rating: dict[str, int]) -> bool:
op, attribute, threshold = (
condition["operator"],
condition["attribute"],
condition["threshold"],
)
return (op == "<" and rating[attribute] < threshold) or (
op == ">" and rating[attribute] > threshold
)
def parse_parts(
workflows: list[dict[str, dict[str, dict[str, str | int] | str]]],
ratings: list[dict[str, int]],
) -> int:
rating_idx = -1
curr = "in"
total = 0
while True:
# If the rule was accepted or rejected, curr would be reset to "in",
# and we'd move on to the next rating.
rating_idx += 1 if curr == "in" else 0
if rating_idx == len(ratings):
break
rating = ratings[rating_idx]
rules = find_rule(curr, workflows)
for rule in rules[curr]:
if rule["condition"]:
if evaluate_condition(rule["condition"], rating):
final = rule["destination"]
total += sum(rating.values()) if final == "A" else 0
curr = "in" if final in {"A", "R"} else final
break
else:
final = rule["destination"]
total += sum(rating.values()) if final == "A" else 0
curr = "in" if final in {"A", "R"} else final
break
return total
def read_items(lines: Iterable[str], parse_func):
items = []
line = next(lines).strip()
while line:
items.append(parse_func(line))
try:
line = next(lines).strip()
except StopIteration:
break # No lines to process.
return items
def total_ratings(lines: Iterable[str]) -> int:
return parse_parts(
read_items(lines, parse_workflow), read_items(lines, parse_rating)
)
def main(avalanche_document: Path) -> None:
with open(avalanche_document) as f:
print(total_ratings(f))
if __name__ == "__main__":
typer.run(main)
Review Request:
General coding comments, style, etc.
How can I simplify parse_parts()
?
Was there a better way to store the input?
Did I go overboard with the type hints?
2 Answers 2
The current code has many positive qualities. It is well organized. It decomposes the problem into smaller parts. The variables and functions have clear names. And so forth.
The paragraph parsing is too complex. You break the input into its two
sections in read_items()
, which uses a while loop, relies on a try-except
structure, and has to build up a list of items until we hit a blank line. It's
not a bad approach, but more direct tactics are available. A classic parsing
idea is to break input text into "paragraphs" consisting of contiguous blocks
of blank or non-blank lines. So we can start by stripping all lines. Then we
just need to group the lines, which itertools.groupby()
will do for us. We
care about only the non-blank lines (those with True
key).
from itertools import groupby
def paragraphs(lines):
lines = [line.strip() for line in lines]
return [
list(g)
for k, g in groupby(lines, key = bool)
if k
]
The solution code is too complex. You asked about simplifying
parse_parts()
, and those instincts are correct. You are doing too much in
this function: managing multiple state variables in a while loop, and looping
over rules within that structure. The code is not too bad, but is it neither
simple nor expressive: the reader has to study it with some care to keep track
of what is going on and to feel reasonably confident that it's correct.
Write from the top down with code you wish you had. When facing a first
draft of code with issues like those in parse_parts()
, one strategy is to start
writing an entirely new solution from the top down. Imagine we had a Plan
consisting of a list of workflows. Our top-level solution code might look like
the following sketch. It has almost no algorithm or logic. It just
converts one type of data directly into the next type we need. And it operates
with entities and concepts directly mappable to the question statement.
def total_ratings(lines):
# Lines to non-blank paragraphs.
ps = paragraphs(lines)
# Create a Plan to hold the workflows.
plan = Plan.from_lines(ps[0])
# Parse rating lines into dicts.
ratings = [parse_rating(line) for line in ps[1]]
# For ratings accepted by the plan, return sum of the rating values.
return sum(
sum(r.values())
for r in ratings
if plan.is_accepted(r)
)
What Plan needs. It needs to be able to take lines and return a Plan
instance. And it needs to be able to take a rating and return a bool
. As we
did above, we keep writing code we wish we had: in this case, we wish a
Workflow
instance could take a rating dict and tell us the
resulting destination.
from dataclasses import dataclass
@dataclass
class Plan:
workflows: list['Workflow']
def __post_init__(self):
# Build a dict mapping names to workflows.
self.lookup = {...}
@classmethod
def from_lines(cls, lines):
return cls(Workflow.from_line(line) for line in lines)
def is_accepted(self, rating):
curr = 'in'
while True:
wf = self.lookup[curr]
dest = wf.get_destination(rating)
if dest == 'A':
return True
elif dest == 'R':
return False
else:
curr = dest
Keep going: the Workflow. A workflow has a name and some Rule
instances.
As above, we implement a little bit of parsing logic here and a little
bit of solution logic -- but not too much, because we freely delegate grubby
details farther down the food chain.
@dataclass
class Workflow:
name: str
rules: list['Rule']
@classmethod
def from_line(cls, line):
...
return cls(...)
def get_destination(self, rating):
for r in self.rules:
if (dest := r.destination_if_satisfied(rating)):
return dest
More levels: the Rule and maybe the Condition. In the wished-for code
above, we want a Rule
instance to be able to take a rating and return the
rule's destination if the rating satisfies the rule. And we could even go
farther by envisioning a Condition
class that would also contain a tiny bit
of parsing code and a little bit of solution logic. Perhaps that final level is
not helpful enough to justify the overhead. Either way, we have confined the
needs to a very small domain, so we know the code in Rule
and below will be
quite simple.
@dataclass
class Rule:
condition: 'Condition'
destination: str
@classmethod
def from_text(cls, text):
...
return cls(...)
def destination_if_satisfied(self, rating):
c = self.condition
if c and not c.is_satisfied(rating):
return None
else:
return self.destination
Notice what is missing: type annotation complexity. Because we broke the information down into meaningful units (plan, workflow, rule, etc), none of the methods sketched above would require heroic efforts to add type annotations.
Let me add to the FMc's answer and address your specific questions:
How Can I Simplify Parse Parts?
Your function read_items
could be improved. You are establishing try/except
block for every iteration line = next(lines).strip()
and the expression next(lines).strip()
appears unnecessarily twice. Instead we can the try
outside the loop and use the walrus operator as follows:
def read_items(lines: Iterable[str], parse_func):
items = []
try:
while (line := next(lines).strip()):
items.append(parse_func(line))
except StopIteration:
pass # No lines to process.
return items
But I, too, believe that for greater clarify you should have distinct functions for parsing workflows and ratings. For workflows we iterate until we find an empty line. We should never get a StopException
and so we do not need to use a try/except
block for parsing workflows. But ultimately you could achieve simpler code if you chose a different representation for your workflows. It appears that you currently have a list of dictionaries. Each dictionary has a single key whose value is the name of the workflow and its values is a list of rules that apply for that workflow. Then given the name of a workflow you have to do a linear search of this list to find a dictionary that has a key matching the workflow name. I would instead create a single dictionary, for example named 'workflows', with multiple keys, one for each named workflow. So workflows['px']
is a list of the rules that applies to the workflow named 'px'.
But how should each rule be represented? There are many possibilities. Let's look at a sample workflow specification:
rfg{s<537:gd,x>2440:R,A}
We have 3 rules for workflow 'rfg' and the final rule is always a default rule that always fires. When a rule fires the result is either the name of another rule or 'R' (reject the current part) or 'A' (accept the current part). I have chosen to represent a rule as a tuple of 5 elements (or you can use a collections.namedtuple
instance if you want to reference each element by a name rather than by an index). For a default rule that always fires the first 4 elements of the tuple will always be None
and the 5th element will be the default "action". For example, (None, None, None, None, 'A')
represents the final, default rule above. Conversely, for a non-default rule where a comparison is required, the 5th element will always be None
. For example, ('s', '<', 537, 'gd', None)
represents the first rule above. Then by testing the 5th element for None
we know whether we are dealing with a default rule or not.
You can use whatever technique you want for creating your dictionary of workflows where the key is a workflow name and its value is a list of tuples (or named tuples that makes it clearer to the reader what each element of a rule represents, as I have used my my code below) representing the rules. In the next section below I will be using regular expressions to parse the workflows, but if you are not interested in this approach just go down to the section labeled The Code.
Using Regular Expressions to Create the workflows
Dictionary
I will use a regular expression to first break the workflow into its name and rules:
([a-z]+){([^}]+)}
([a-z]+)
- Capture group 1 will be one or more lower-case letters.{
- Followed by '{'.([^}]+)
- Followed by capture group 2 which are one or more characters that do not match '}'. That is, to match all the rules we scan until we find a '}' character.}
- Matches a '}' character.
When we match the above regular expression against input 'rfg{s<537:gd,x>2440:R,A}' capture group 1 (the workflow's name) will be 'rfg' and capture group 2 will be 's<537:gd,x>2440:R,A', a comma-separated list of rules for this workflow.
Then once we have the string 's<537:gd,x>2440:R,A', i.e. the rules, we want to match successive rules with the following regular expression:
([xmas])([<>])([0-9]+):([a-z]+|R|A)|([a-z]+|R|A)
([xmas])
- Matches either 'x', 'm', 'a', 's' as capture group 1. This is the rule's rating category.([<>])
- Matches either '<' or '>' as capture group 2.([0-9]+)
- Matches one or more digits as capture group 3.:
- Matches ':'.([a-z]+|R|A)
- Capture group 4 will be the action taken if we have a rule math. This can be either a string of lower-case letters specifying the name of the next workflow to be used or 'R' signifying that we are to reject this part orA
signifying that we are to accept this part.|([a-z]+|R|A))
- Capture group 5 signifies that we have an alternate possibility for what matches a rule, namely just a lower-case string (the next workflow name) or 'R' (reject this part) or 'A' (accept this part). This represents the final "default" rule that always fires if none of the previous rules match.
When we iteratively find successive matches using the above regular expression against 's<537:gd,x>2440:R,A', we will match 3 rules. For each rule we get a tuple of 5 elements, one for each capture group:
('s', '<', '537', 'gd', None)
('x', '>', '2440', 'R', None)
(None, None, None, None, 'A')
As previously stated, when we are matching the final rule representing the default rule, all elements of the tuple are None
except for the last (index 4). Conversely, the last element will be None
for all non -default rules). So it becomes quite easy to test whether the current rule under consideration is a default rule.
If you do not want to have to remember what each element of a rule's tuple represents we can use a collections.namedtuple
as follows:
from collections import namedtuple
...
WorkflowRule = namedtuple('WorkflowRule', [
'rating_name',
'rating_operator',
'rating_value',
'new_workflow_name',
'default_workflow_name'
]
)
Then all workflows can be read an parsed as follows:
import re
...
workflows = {}
workflow_rex1 = re.compile(r'([a-z]+){([^}]+)}')
workflow_rex2 = re.compile(r'([xmas])([<>])([0-9]+):([a-z]+|R|A)|([a-z]+|R|A)')
def read_workflows(lines: Iterable[str]) -> None:
while (line := next(lines)) != '':
m = workflow_rex1.search(line)
workflow_name = m[1] # First capture group
workflow_rules = []
for m in workflow_rex2.finditer(m[2]): # Iterate the rules (second capture group)
groups = list(m.groups()) # Convert the rules tuple to a list
if groups[2] is not None: # We have a string representing an integer
groups[2] = int(groups[2]) # Convert to an integer
# Create a named tuple from the rules and append it to our list of rules:
workflow_rules.append(WorkflowRule(*groups))
workflows[workflow_name] = workflow_rules
ratings
can remain a list of dictionaries just as you currently have it. Then part 1 is solved as follows:
The Code
I have chosen to use collections.namedtuple
instances to represent a workflow rule. If you skipped the previous section, here is the named tuple again:
from collections import namedtuple
...
WorkflowRule = namedtuple('WorkflowRule', [
'rating_name',
'rating_operator',
'rating_value',
'new_workflow_name',
'default_workflow_name'
]
Once we have parsed the workflows and ratings, then Part 1 is solved with:
def is_accepted(rating: dict[str, int]) -> bool:
"""Do we accept this rating?"""
workflow_name = 'in' # Initial workflow name to process
while True:
workflow_rules = workflows[workflow_name] # List of rules for this workflow
for workflow_rule in workflow_rules: # Process each rule
if workflow_rule.default_workflow_name: # Default rule if not None
workflow_name = workflow_rule.default_workflow_name # The next action
result = True # default rules always fire
else:
# A non-default rule requiring a comparison:
workflow_name = workflow_rule.new_workflow_name # The next action if this rule fires
value1 = rating[workflow_rule.rating_name]
value2 = workflow_rule.rating_value
result = value1 > value2 if workflow_rule.rating_operator == '>' else value1 < value2
if result: # If the rule fires
if workflow_name == 'R':
return False
if workflow_name == 'A':
return True
break # Else process workflow name
# Else process new rule
def part1() -> int:
"""Solves Part 2 of Day 19 Advent of Code.
See: https://adventofcode.com/2023/day/19"""
# We assume the `ratings` and `workflows` variables have been
# created and the code for that is omitted here.
total = 0
for rating in ratings:
if is_accepted(rating):
total += sum(rating.values())
return total
Was There a Better Way to Store the Input?
Answered in the previous section.
Did I Go Overboard with the Type Hints?
No. But with the above suggestions they should be less complicated.
-
\$\begingroup\$ The walrus operator is new to me, I was missing C's assignments inside the loop conditions. I can see why they'd introduce a new operator specifically for this purpose. Thank you. \$\endgroup\$Madagascar– Madagascar2024年01月06日 14:45:30 +00:00Commented Jan 6, 2024 at 14:45
-
1\$\begingroup\$ If you are not already familiar with regular expressions, they are a worthwhile addition to your "toolbox". And if you are interested, here is the complete code for my implementation (the input is coming from
stdin
). \$\endgroup\$Booboo– Booboo2024年01月06日 15:14:07 +00:00Commented Jan 6, 2024 at 15:14 -
1
-
\$\begingroup\$ You could name your capture groups to avoid having to access them with an index that has no real meaning. The first regex would look like
(?P<workflow_name>[a-z]+){(?P<rules>[^}]+)}
, and theworkflow_name
group cloud be accessed withm.group("workflow_name")
. Also,read_workflows
expects anIterator
, not anIterable
. Great answer otherwise! \$\endgroup\$rdesparbes– rdesparbes2024年01月07日 11:47:41 +00:00Commented Jan 7, 2024 at 11:47 -
1\$\begingroup\$ @Harith Actually
\w
also captures numbers and\d
can capture more than just the digits 0 .. 9 unless ASCII mode is set as an option. (withre.ASCII
) If you assume the input is "correct", then you could use(\w)=(\d+)
. But if you want to verify the correctness of the input then use([xmas])=([0-9]+)
. By the way,[a-z]{1}
is equivalent to just[a-z]
. \$\endgroup\$Booboo– Booboo2024年01月10日 12:18:35 +00:00Commented Jan 10, 2024 at 12:18
Explore related questions
See similar questions with these tags.