I'm a beginner trying to make a Wordle clone in python. I already have most of it sorted out but am concerned about my code being difficult to read. The part I'm struggling with the most is the algorithm for how guesses are displayed.
An easily overlooked aspect of the original game is that it tries to make the player's life easier by hiding additional instances of a character in a guess whenever all instances of that letter have been found. That is, if the solution was "walls", and the player inputs "lulls", the last three characters would show as green as expected, but the first 'L' would be greyed out to indicate that all instances of that letter have been found (only for that line, subsequent lines/guesses reset this).
This may seem obvious if you've played the game a lot, but many clones don't do this, which has the side effect of making the game harder. This answer to another wordle related question touches on this.
I'm trying to implement this functionality in the function that compares player input with the solution in my version, the relevant section of which currently looks like this:
from collections import Counter, defaultdict
green = 3
yellow = 2
grey = 1
def compare_word(string, solution):
def clear_yellow_letters():
# use counter to count instaces
# of a letter in each string
str_c_count = Counter(string)
sol_c_count = Counter(solution)
for k, v in c_green_count.items():
if v == sol_c_count[k]:
# if there are more instances of letter than
# have been located in solution, grey-out
# additional instances
if str_c_count[k] > sol_c_count[k]:
for i in range(len(color_dic[0])):
char = color_dic[0][i]
if char == k and color_dic[1][i] == yellow:
color_dic[1][i] = grey
return color_dic
# color_dic is actually a list >_>. since the same letter can appear
# multiple times in a word, we prefer to use the index as a key
# instead of a real dictionary, where repeated letters would all
# point to the same color regardless of location.
color_dic = string, [0 for c in string]
c_green_count = defaultdict(int) # creates keys if they don't exist
# this loop compares the chars in string and solution based on index,
# and assigns a color to the respective position in the array.
for i in range(len(color_dic[0])):
char = color_dic[0][i]
if char == solution[i]:
color = green
c_green_count[char] += 1
elif char in solution:
color = yellow
else:
color = grey
color_dic[1][i] = color
if c_green_count.items():
color_dic = clear_yellow_letters()
return color_dic
print(compare_word('lulls', 'walls'))
This prints a tuple containing a string, and a list of ints referencing each color used for displaying guesses. Note that this use of print is just for demonstration, the actual output is handled in curses (that's what the numbers are for).
Here clear_yellow_letters()
does what I just described. Without it compare_word()
would return the int that maps to l
in our string as 2 (yellow). While this works as it is, I feel there must be a simpler way to do it, as that section is a bit difficult to follow, even with all the comments.
Another thing I'm confused about is code style/formatting. This is only an excerpt of my full project, which I've split into several modules due to its length, but one issue I keep coming across is how to know when to break-down functions into smaller functions and when to not. With compare_word()
for example, clear_yellow_letters()
is only called inside of it. Should I give the latter any parameters if all the vars it uses exist in the same scope, or is it okay to use the same names in this case? Would it be best to define it as a separate function for legibility?
I hope this excerpt is sufficient for understanding the logic for how I'm trying to get the words displayed, I am not including the whole thing because as I mentioned, it's rather long, but if more context is needed I'll oblige. I'd appreciate any suggestions on what could be improved.
2 Answers 2
First of all, I think it's great that you ask for feedback on a well isolated and explained piece of code. I also like your attention to detail in the rules of the game.
I also find this code complex, I think for two main reasons:
- The algorithm itself is a bit complex
- The implementation of the algorithm is a bit complex
Simplifying the algorithm
The current algorithm goes something like this:
- In a 1st pass, check the letters pairwise:
- if the pair is a match:
- mark the position green
- update the count of matches of this letter
- or else if the guess exists somewhere else in the solution:
- mark the position yellow
- or else mark the position grey
- if the pair is a match:
- If there were any matches, make a 2nd pass to clear some yellows
- for each matched letter and its count
- if all instances of this letter were matched
- if the guess contains more instances of this letter
- find all unmatched positions of this letter and mark them grey
- if the guess contains more instances of this letter
- if all instances of this letter were matched
- for each matched letter and its count
Consider a simpler alternative:
- Initialize all positions to grey
- Compute a count of all letters in the solution
- In a 1st pass, check the letters pairwise:
- if the pair is a match:
- mark the position green
- update the count of matches of this letter
- if the pair is a match:
- In a 2st pass, check the letters pairwise:
- if the pair is not a match, and the solution contains unmatched positions of this letter
- mark the position yellow
- if the pair is not a match, and the solution contains unmatched positions of this letter
That's fewer steps, and also simpler to implement: (Using the Status enum idea from the answer of @Reinderien)
class Status(Enum):
MATCH = 3
MISPLACE = 2
MISMATCH = 1
status = [Status.MISMATCH] * len(guess)
counts_in_solution = Counter(solution)
matches = defaultdict(int)
for index, (g, s) in enumerate(zip(guess, solution)):
if g == s:
status[index] = Status.MATCH
matches[g] += 1
for index, (g, s) in enumerate(zip(guess, solution)):
if g != s and matches[g] < counts_in_solution[g]:
status[index] = Status.MISPLACE
return [s.value for s in status]
Simplifying the implementation
Even if we don't change the main algorithm, the implementation of the original algorithm can be written simpler.
First of all, some of the variable names don't help understanding the logic.
string
would describe itself better asguess
- Note that
string
was also a poor name because it shadows a common Python package
- Note that
- the color codes would be better in all-caps
c_green_count
would be less cryptic asgreen_counts
ormatches
Then there is color_dic
...
color_dic = string, [0 for c in string]
In a comment you had the right instinct: "color_dic is actually a list". Even further:
color_dic
is a tuple, and the comment was probably referring to the 2nd element onlycolor_dic
was not useful at all:- the first value is just the guess, and there is no need to store it in another tuple
- the second value is probably just what you intended for
color_dic
. A better name would becolors
.
Even with just these renames, and the minor transformation of dropping color_dic
,
I think the code already looks a bit less complex:
matches = defaultdict(int)
colors = [0 for c in guess]
for i in range(len(guess)):
char = guess[i]
if char == solution[i]:
color = GREEN
matches[char] += 1
elif char in solution:
color = YELLOW
else:
color = GREY
colors[i] = color
if matches.items():
clear_yellow_letters()
return guess, colors
Next, let's apply some idiomatic transformations:
colors = [GREY] * len(guess)
for index, [g, s] in enumerate(zip(guess, solution)):
if g == s:
colors[index] = GREEN
matches[g] += 1
elif g in solution:
colors[index] = YELLOW
if matches:
clear_yellow_letters()
That is:
- To create a list of the same value, you can use
[the_value] * count
- It's good to initialize colors to all grey because:
- It's a valid value, unlike the original 0
- It's a sensible default
- This way we don't need to overwrite values to grey later
- Instead iterating over indexes,
enumerate
lets us iterate over (index, value) pairs - Instead using index to get pairs of characters from guess and solution, we can use
zip
to line them up - Instead of
if matches.items()
it's equivalent and shorter to writeif matches
For the 2nd pass, again I would start with renaming variables that look cryptic and hard to understand, to names that seem more natural and descriptive:
str_c_count
->counts_in_guess
sol_c_count
->counts_in_solution
k, v
->letter, count
With the above renames, and other minor transformations we get:
def clear_yellow_letters():
counts_in_guess = Counter(guess)
counts_in_solution = Counter(solution)
for letter, count in matches.items():
if count == counts_in_solution[letter] and counts_in_guess[letter] > count:
for index, g in enumerate(guess):
if g == letter and colors[index] == YELLOW:
colors[index] = GREY
Where the other transformations are:
- Joined the two
if
s in the outer loop: this is equivalent, and looks simpler - Replaced the second reference to
counts_in_solution[letter]
withcount
, because at that point we know they are the same - Using
enumerate
to loop over index + value pairs - Not returning
colors
, since we can manipulate the variable in the context
Having a for
loop nested within another for
loop still looks complex of course.
If you think about it,
the inner loop is clumsy,
because what we don't really want to iterate over all the letters in guess,
what we really want is operate on the matched letters only.
In other words, currently we're iterating over dictionary entries and then doing a search in a list. If we reverse these operations we would iterate over a list and then do a search in a dictionary, which is easier and faster too:
def clear_yellow_letters():
counts_in_solution = Counter(solution)
for index, (g, s) in enumerate(zip(guess, solution)):
if g != s and matches[g] == counts_in_solution[g]:
colors[index] = GREY
Lastly, since this function doesn't really buy us much, we as well inline it, arriving at something that looks simpler, with simple equivalent transformations, and a very small change in the algorithm:
def compare_word(guess, solution):
matches = defaultdict(int)
colors = [GREY] * len(guess)
for index, (g, s) in enumerate(zip(guess, solution)):
if g == s:
colors[index] = GREEN
matches[g] += 1
elif g in solution:
colors[index] = YELLOW
if matches:
counts_in_solution = Counter(solution)
for index, (g, s) in enumerate(zip(guess, solution)):
if g != s and matches[g] == counts_in_solution[g]:
colors[index] = GREY
return guess, colors
Defining functions inside functions
Defining functions inside functions is reasonable when:
- The function is not useful outside
- The function is called more than once
In the posted code you only call the inner function once, so I think it doesn't add value.
And when defining inner functions, beware of accidentally modifying variables that our defined in their outer scope. Code editors like PyCharm will give a warning for variables in inner functions whose names shadow variables in the outer scope, which can be confusing for readers, and a source of mistakes.
-
\$\begingroup\$ Greatly appreciate the answer :). I have one additional question, is there any advantage to declaring Status as an Enum if I plan on using its values as ints anyway? Are they preferable solely because they're constants? \$\endgroup\$Clara– Clara2022年08月26日 03:29:14 +00:00Commented Aug 26, 2022 at 3:29
-
1\$\begingroup\$ Enum is the right type when you need to represent a fixed set of constants. Here, status can only have 3 possible values and nothing else. Using an enum correctly expresses this intent. Even if you will use the int value right after the function returns, using the enum within the function helps you express your intent with precision that's not possible with simple int values. When reading code, understanding the intent helps to understand the code. \$\endgroup\$janos– janos2022年08月26日 06:16:45 +00:00Commented Aug 26, 2022 at 6:16
You should reframe your colour constants as an enum. Also, since this is all purely business logic and not presentation, the first-class names should be match status descriptions with the colours being secondary.
I don't see a lot of value in having your inner clear_yellow_letters
as a nested function.
I think you've done the right thing in having two counters, one for each of string
and solution
. However, the algorithm that uses them can be simplified by deriving a dictionary of minimum counts as I will demonstrate.
color_dic
is a troubled name for a number of reasons, including that it isn't a dictionary and it shouldn't contain colours per se.
I do not think that a defaultdict
is necessary.
Suggested
I believe this to be equivalent but you will want to test.
from collections import Counter
from enum import Enum
class Status(Enum):
MATCH = 3
MISPLACE = 2
MISMATCH = 1
GREEN = MATCH
YELLOW = MISPLACE
GREY = MISMATCH
def compare_word(provided: str, solution: str):
matches = [
Status.MATCH if p == s else Status.MISMATCH
for p, s in zip(provided, solution)
]
prov_remain = Counter(
p for p, status in zip(provided, matches)
if status is not Status.MATCH
)
soln_remain = Counter(
s for s, status in zip(solution, matches)
if status is not Status.MATCH
)
reassignable = {
p: min(n_p, soln_remain[p])
for p, n_p in prov_remain.items()
}
for i, p in enumerate(provided):
if matches[i] is not Status.MATCH and reassignable.get(p):
matches[i] = Status.MISPLACE
reassignable[p] -= 1
return matches
print(compare_word('lulls', 'walls'))
print(compare_word('abedc', 'abcde'))
print(compare_word('bbccc', 'aabdd'))