Bracket matching - Advent of Code 2021 Day 10

Question 1

The following code solves Advent Of Code 2021 Day 10.

The goal for this time around is evaluating whether a line of brackets is complete, has a syntax error or is missing some closing brackets.

Examples of valid pairs of brackets

()
[] 
([])
{()()()}
<([{}])>
[<>({}){}[([])<>]]
(((((((((())))))))))

Example of incorrect or corrupted lines

{([(<{}[<>[]}>{[]{[(<()> # Expected ], but found } instead.
[[<[([]))<([[{}[[()]]] # Expected ], but found ) instead.
[{[{({}]{}}([{[{{{}}([] # Expected ), but found ] instead.
[<(<(<(<{}))><([]([]() # Expected >, but found ) instead.
<{([([[(<>()){}]>(<<{{ # Expected ], but found > instead.

Examples of incomplete lines

[({(<(())[]>[[{[]{<()<>> # Complete by adding }}]])})].
[(()[<>])]({[<{<<[]>>( # Complete by adding )}>]}).
(((({<>}<{<{<>}{[]{[]{} # Complete by adding }}>}>)))).
{<[[]]>}<{[{[{[]{()[[[] # Complete by adding ]]}}]}]}>.
<{([{{}}[<[[[<>{}]]]>[]] # Complete by adding ])}>.

Problem:

Advent of code 10 was divided into two problems:

Find the sum of all lines that has a syntax error (corrupted), where every bracket that causes a syntax error is given a particular value.
For each line that has a syntax error compute the string of closing characters that when appended to the line will make the line syntactically correct (i.e. "complete"). We then compute for this closing string an autocomplete score as follows: We set the score initially to 0. Then for each character of the closing string, multiply the score by 5 and then add to it a point value associated with the closing character. Finally find the median of all the autocomplete scores where the number of such scores is guaranteed to be odd.

Code

from typing import Final
# Points for each illegal character
points: Final[dict[str, int]] = {
 ')': 3,
 ']': 57,
 '}': 1197,
 '>': 25137
}
# Bracket pairs
pairs: Final[dict[str, int]] = {
 '(': ')',
 '[': ']',
 '{': '}',
 '<': '>'
}
# Values for scoring completion characters
values: Final[dict[str, int]] = {
 ')': 1,
 ']': 2,
 '}': 3,
 '>': 4
}
def syntax_error_score_and_autocomplete_score(lines):
 scores = []
 total_score = 0
 for line in lines:
 stack = []
 corrupted = False
 corrupted_char = None
 for char in line.strip():
 if char in pairs:
 # Opening bracket
 stack.append(char)
 else:
 # Closing bracket
 if not stack:
 # Extra closing bracket, corrupted
 corrupted = True
 corrupted_char = char
 break
 top = stack.pop()
 if pairs[top] != char:
 # Mismatched closing bracket, corrupted
 corrupted = True
 corrupted_char = char
 break
 # Ignore corrupted lines
 # If the line is corrupted, add the corresponding points
 if corrupted:
 total_score += points[corrupted_char]
 continue
 # If stack is not empty, we need to complete the line
 if stack:
 completion_string = ""
 while stack:
 opening = stack.pop()
 completion_string += pairs[opening]
 # Calculate score for this completion string
 score = 0
 for ch in completion_string:
 score = score * 5 + values[ch]
 scores.append(score)
 print("part1: (syntax_error_score)", total_score)
 scores.sort()
 # Get the middle score
 print("part2: (autocomplete_score)", scores[len(scores)//2])
if __name__ == "__main__":
 with open("D:\\ExplorerDownload\\input2021_day10.txt", "r") as f:
 lines = f.readlines()
 syntax_error_score_and_autocomplete_score(lines)

Question 2

Looks good. It is easy to read, with clear identifiers.

extra boolean

This is some nice, idiomatic code.

 corrupted_char = None

Given the type of str | None, we don't need a bool corrupted flag alongside it. Testing if the char is None does the same job as seeing if the flag is False. Fewer state variables means fewer things to keep juggling in your head. On which topic...

extract helper

We strive to write short functions which can be viewed in a single screenful, without vertical scrolling.

That for char ... loop performs a nice localized computation and is a good candidate for packaging up in a helper function. Pass in a line and a stack, side effect the stack, and get back a corrupted_char.

odd annotation

We don't tend to often see Final in annotated production code.

pairs: Final[dict[str, int]] = {

Quoting from its documentation:

A final name cannot be re-assigned or overridden in a subclass. ...

There is no runtime checking of these properties.

Consider instead using from types import MappingProxyType and assign one of those. Type checkers know all about it, plus its cousins such as frozenset. Runtime mutation attempts will produce TypeError: 'mappingproxy' object does not support item assignment

In the simpler scalar case, rather than marking it Final we typically will upcase such constants, e.g. SECONDS_PER_DAY = 86400. And yes, linters will call you out if they see you trying to increment such a quantity.

And do run pyright or mypy over your code after you've annotated it. That dict[str, int] copy-pasta mapping makes no sense. You wanted dict[str, str]. Bad annotations are noticed by type checkers, but have no effect at runtime.

extra comments

This source code is perhaps a little on the chatty side, but not overly so, not so much that it worries me. Do consider deleting the occasional comment if you feel it says nothing more than what the code already has eloquently stated. Your descriptive identifiers are a big help in that regard.

This was the only comment item that gave me pause:

 if char in pairs:
 # Opening bracket
 stack.append(char)

It makes me slightly sad, because ideally an identifier like pairs should be doing the work of explaining that. Yet I couldn't think of a better name than pairs, sorry. Maybe def is_opening_bracket(char): is the best way out of this?

median

... we were then asked to find the median of the autocomplete scores.

 print("part2: (autocomplete_score)", scores[len(scores)//2])

I didn't notice anything in your problem statement which promises there shall be an odd number of scores. In the even case I would expect a median computation to find the mean of the two central figures. Notice that if there's a bunch of repeated scores this code might "get lucky", as the mean of two equal numbers is identical to the value of either one of them. But with enough test inputs your luck will eventually run out.

Question 3

I just have one suggestion to add:

Simplification

For Part 2 you have:

 # If stack is not empty, we need to complete the line
 if stack:
 completion_string = ""
 while stack:
 opening = stack.pop()
 completion_string += pairs[opening]
 # Calculate score for this completion string
 score = 0
 for ch in completion_string:
 score = score * 5 + values[ch]
 scores.append(score)

This needlessly builds up a string which is then iterated. Simpler would be:

 # If stack is not empty, we need to complete the line
 if stack:
 score = 0
 while stack:
 opening_char = stack.pop()
 score = score * 5 + values[pairs[opening_char]]
 scores.append(score)

Question 4

Other answers say the code is easy to read, and I'm in no position to contradict that, as readability is always subjective. But to me, this code is not easy to read. It's just one big function with very deeply nested loops and conditionals, even with breaks and continues, and I have to read and understand all of it to get a sense of what's going on. I can only understand it because I happen to know the requirements (the AoC question).

So in my opinion, the first improvement to make is to divide this code into smaller bits that make sense on their own, and extract them into their own functions or even files.

Speaking of dividing, there are two very independent concerns in this solution. There is parsing a line, and there is computing some score based on the results. They're conceptually separate issues, but in this code they're all mixed up.

Another hint that the code is not readable is that it needs so many comments. Right now the comments make sense of it, but they wouldn't be necessary if the code were self-explanatory. For example, these comments:

# Points for each illegal character
points: Final[dict[str, int]] 
...
# Values for scoring completion characters
values: Final[dict[str, int]]

Are not necessary if you just give the variables clearer names:

points_for_illegal_chars: Final[dict[str, int]] 
points_for_completion_chars: Final[dict[str, int]]

Then this comment

# Calculate score for this completion string

Is not necessary if you have a function about calculating something

And this comment

# Get the middle score
print("part2: (autocomplete_score)", scores[len(scores)//2])

Is not necessary if you define a median function

print("part2: (autocomplete_score)", median(scores))
...
def median(values):
 return sorted(values)[len(values) // 2

Here's the code I ended with after refactoring yours. The main difference is that I've tried to keep the parsing logic separate from the scoring logic. To me, that's non-negotiable as otherwise there's no way you could write tests for the parsing bit. There are some other decisions that are way more subjective, such as the use of exceptions for control flow and parsing each line twice, and whether the parsing logic shouldn't be all just kept in just one function.

# parsing.py
from typing import Final
_bracket_pairs: Final[dict[str, int]] = {
 '(': ')',
 '[': ']',
 '{': '}',
 '<': '>'
}
def parse_line(line):
 _ParseAlgorithm().run(line)
class _ParseAlgorithm:
 def run(self, line):
 self._initialize()
 for char in line.strip():
 if self._is_valid_opener(char):
 self._open_group(char)
 elif self._is_valid_closer(char):
 self._close_group()
 else:
 raise UnexpectedCharException(char)
 if self._expected_closers:
 raise IncompleteLineException(self._expected_closers)
 def _initialize(self):
 self._expected_closers = []
 def _is_valid_opener(self, char):
 return char in _bracket_pairs.keys()
 def _is_valid_closer(self, char):
 return self._expected_closers and self._expected_closers[0] == char
 def _open_group(self, char):
 self._expected_closers.insert(0, _bracket_pairs[char])
 def _close_group(self):
 self._expected_closers.pop(0)
class UnexpectedCharException(Exception):
 def __init__(self, unexpected_char):
 self.unexpected_char = unexpected_char
class IncompleteLineException(Exception):
 def __init__(self, missing_string):
 self.missing_string = missing_string

# scoring.py
from typing import Final
import parsing
_corruption_points: Final[dict[str, int]] = {
 ')': 3,
 ']': 57,
 '}': 1197,
 '>': 25137
}
_autocomplete_points: Final[dict[str, int]] = {
 ')': 1,
 ']': 2,
 '}': 3,
 '>': 4
}
def compute_corruption_score(lines):
 result = 0
 for line in lines:
 try:
 parsing.parse_line(line)
 except parsing.UnexpectedCharException as e:
 result += _corruption_points[e.unexpected_char]
 except parsing.IncompleteLineException:
 pass
 return result
def compute_autocomplete_score(lines):
 autocomplete_scores = []
 for line in lines:
 try:
 parsing.parse_line(line)
 except parsing.UnexpectedCharException:
 pass
 except parsing.IncompleteLineException as e:
 autocomplete_scores.append(_autocomplete_score_single_line(e.missing_string))
 return _median(autocomplete_scores)
def _autocomplete_score_single_line(missing_string):
 result = 0
 for ch in missing_string:
 result *= 5
 result += _autocomplete_points[ch]
 return result
def _median(values):
 return sorted(values)[len(values) // 2]

# main.py
import scoring
if __name__ == "__main__":
 with open("./input2021_day10.txt", "r") as f:
 lines = f.readlines()
 print("part1: (syntax_error_score)", scoring.compute_corruption_score(lines))
 print("part2: (autocomplete_score)", scoring.compute_autocomplete_score(lines))

Question 5

In addition to the points raised in the previous answer, here are some other items to consider.

Naming

User @J_H makes an astute point that you only need to retain one of these variables:

corrupted
corrupted_char

However,if you do decide to keep both, I recommend renaming corrupted as is_corrupted. Adding the is_ prefix is a common practice for booleans, and it would more clearly distinguish the 2 variables by name.

Comments

My first impression while reading through the code was that the function had well-placed comments that helped me understand what was going on.

Magic number

It is not obvious to me what the significance of the 5 is in the following line:

 score = score * 5 + values[ch]

Perhaps a comment would help to clarify.

Which leads us to ...

Documentation

I understand that the focus for programming challenges is functionality, and the last thing your are thinking about is documentation, but adding some simple docstrings would help to understand the purpose of the code.

Question 6

I agree that 5 is inscrutable. I take it to be "one more than max(values.values())". @138Aspen, would you please edit the question to at least include a Part Two link? And ideally add one or more sentences so the CodeReview question makes sense on its own, even if the AoC problem description page were to go 404.

Question 7

@J_H please don't ask to copy&paste AOC problem texts here, it's forbidden by AOC rules. And there's no part 2 link - part 2 opens for signed in users who solved part 1 of that day, and there's a reason for such treatment. So OP can only rephrase the question in their own words

Question 8

@STerliakov, I was only trying to minimize the OP's effort. @138Aspen, would you please take another stab at rephrasing the Part Two question in your own words? The CodeReview question as written does not offer sufficient detail for a reviewer to adequately understand the problem assignment to properly review the code. In particular the 5 is inscrutable and the AoC site is perhaps not using the word "median" in the way that most high school math teachers would use it. Maybe medoid was intended?

Question 9

I know the problem stmt, median calculation here is correct - the problem promised there will always be an odd number of malformed lines. (But yes, the question should be edited to clarify this and scoring rules)

Question 10

DRY

The repeated statements to deal with either an empty stack or a 'corrupt character' suggest an alternative approach could be pursued.

Imagine pushing a "non-bracket" character, onto the stack before addressing any of the characters of line. Imagine that character is, for example, '$', sitting there indicating "this is the end of the road."

The else branch means 1 of 4 "closing bracket" characters from line is being examined.
Both the "corrupt" case and the "empty stack" case can be determined in one conditional because the popped '$' won't match any of the 4 possible close brackets.

Note: '$' must be added as a 5th record to pairs[] so that it can be found when needed.

pairs: Final[dict[str, str]] = {
 '(': ')',
 '[': ']',
 '{': '}',
 '<': '>',
 '$': '$'
}

Result: 4 lines + 'pop()' + 4 lines ==> 'pop()' + 4 lines

When the characters of line are exhausted, and presuming there is no 'corruption', the stack will not be empty. Fine! Go ahead! Drain what remains on the stack and tabulate some score. (Need to account for '$' in this bizarre calculation because of the compounding x = x * 5 + y.) This eliminates the line if stack: toward the bottom of the loop.

Adopting the already VG suggestions made by @J_H and by @Booboo...

 stack = []
 stack.append('$') # NEW!
 corrupted_char = None
 for char in line.strip():
 if char in pairs:
 # Opening bracket
 stack.append(char)
 else:
 # Closing bracket
 top = stack.pop()
 if pairs[top] != char:
 # Mismatched closing bracket, corrupted
 corrupted_char = char
 break # end of the road, corrupt or "empty" stack
 if corrupted_char:
 total_score += points[corrupted_char]
 else: # instead of 'continue'. Avoid explicit flow control if possible!
 score = 0
 while stack:
 top = stack.pop()
 if top != '$':
 score = score * 5 + values[pairs[top]]
 if score: # Only record non-zero results...
 scores.append(score)

There's nothing in the rules preventing the coder from manipulating the obvious data requirements (adding a 5th element) as long as the program's behaviour achieves the desired goal.

Really DRY

The observant reader will notice two translation lookup instances into pair[] in that code snippet.

Instead of pushing "opens" onto the stack, this would be a time to push its required "close" mate. The popping operations are, already, tricky to think about:

 stack = [ '$' ] # Best guess in my ignorance of Python
 corrupted_ch = None
 for ch in line.strip():
 if ch in pairs: # Opening bracket
 stack.append( pairs[ch] ) # Push its mate
 else: # Closing bracket
 if stack.pop() != ch: # Mismatched! corrupted
 corrupted_ch = ch
 break # end of the road, corrupt or '$' depleted stack
 if corrupted_ch:
 total_score += points[corrupted_ch]
 else: # instead of 'continue'. Avoid explicit flow control if possible!
 score = 0
 while stack:
 top = stack.pop()
 if top != '$':
 score = score * 5 + values[top]
 if score: # Only record non-zero results...
 scores.append(score)

(I believe guidelines probably discourage too much density (dangling comments on the same line as code statements. This is presented, as-is, to illustrate replacing two lookups with only one lookup.)

The calculation of "points" no longer needs to translate "opens" to "closes" when looking-up the value of each character.

Maybe?:
When calculating "points" perhaps more could be done simply using the index of each closing bracket character in a simpler string (".)]}>") to give 1, 2, 3, or 4.

Question 11

After a good night's sleep the term sentinel rises out of the fog (with the morning's coffee.) This variation uses a "sentinel character" that removes the need to explicitly detect and deal with an empty stack (to improve focus on pairing-up brackets.)

J_H J_H 41.4k3 gold badges38 silver badges157 bronze badges · Accepted Answer · 2024-12-16 07:00:00Z

Looks good. It is easy to read, with clear identifiers.

extra boolean

This is some nice, idiomatic code.

 corrupted_char = None

Given the type of str | None, we don't need a bool corrupted flag alongside it. Testing if the char is None does the same job as seeing if the flag is False. Fewer state variables means fewer things to keep juggling in your head. On which topic...

extract helper

We strive to write short functions which can be viewed in a single screenful, without vertical scrolling.

That for char ... loop performs a nice localized computation and is a good candidate for packaging up in a helper function. Pass in a line and a stack, side effect the stack, and get back a corrupted_char.

odd annotation

We don't tend to often see Final in annotated production code.

pairs: Final[dict[str, int]] = {

Quoting from its documentation:

A final name cannot be re-assigned or overridden in a subclass. ...

There is no runtime checking of these properties.

Consider instead using from types import MappingProxyType and assign one of those. Type checkers know all about it, plus its cousins such as frozenset. Runtime mutation attempts will produce TypeError: 'mappingproxy' object does not support item assignment

In the simpler scalar case, rather than marking it Final we typically will upcase such constants, e.g. SECONDS_PER_DAY = 86400. And yes, linters will call you out if they see you trying to increment such a quantity.

And do run pyright or mypy over your code after you've annotated it. That dict[str, int] copy-pasta mapping makes no sense. You wanted dict[str, str]. Bad annotations are noticed by type checkers, but have no effect at runtime.

extra comments

This source code is perhaps a little on the chatty side, but not overly so, not so much that it worries me. Do consider deleting the occasional comment if you feel it says nothing more than what the code already has eloquently stated. Your descriptive identifiers are a big help in that regard.

This was the only comment item that gave me pause:

 if char in pairs:
 # Opening bracket
 stack.append(char)

It makes me slightly sad, because ideally an identifier like pairs should be doing the work of explaining that. Yet I couldn't think of a better name than pairs, sorry. Maybe def is_opening_bracket(char): is the best way out of this?

median

... we were then asked to find the median of the autocomplete scores.

 print("part2: (autocomplete_score)", scores[len(scores)//2])

I didn't notice anything in your problem statement which promises there shall be an odd number of scores. In the even case I would expect a median computation to find the mean of the two central figures. Notice that if there's a bunch of repeated scores this code might "get lucky", as the mean of two equal numbers is identical to the value of either one of them. But with enough test inputs your luck will eventually run out.

Stack Exchange Network

Bracket matching - Advent of Code 2021 Day 10

Problem:

Code

5 Answers 5

extra boolean

extract helper

odd annotation

extra comments

median

Simplification

Naming

Comments

Magic number

Documentation

DRY

Really DRY

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Bracket matching - Advent of Code 2021 Day 10

Problem:

Code

5 Answers 5

extra boolean

extract helper

odd annotation

extra comments

median

Simplification

Naming

Comments

Magic number

Documentation

DRY

Really DRY

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions