Solver for Wordle puzzle

Question 1

After my success with a Jumble puzzle solver, I am now writing a Wordle puzzle solver. I know many other Wordle solvers have been posted, but this is my own version, which emphasizes list comprehensions.

First, let us load the data (English words with their frequencies):

import csv
def csv_to_dict(file_path):
    """Reads a CSV file and converts it into a list of dictionaries.
    Args:
        file_path: The path to the CSV file.
    Returns:
        A list of dictionaries, where each dictionary represents a row in the CSV file.
        Returns an empty list if the file is not found or if an error occurs.
    """
    data = {}
    try:
        with open(file_path, 'r', newline='', encoding='utf-8') as file: # Added encoding
            reader = csv.DictReader(file)
            for row in reader:
                #print(row)
                word = row['word']
                if len(word) == 5:
                    data[word] = row['count']
    except FileNotFoundError:
        print(f"Error: File '{file_path}' not found.")
    except Exception as e:
        print(f"An error occurred: {e}")
    return data
# The data is from https://www.kaggle.com/datasets/rtatman/english-word-frequency?resource=download
file_path = '/content/drive/MyDrive/data/unigram_freq.csv'
result_dict = csv_to_dict(file_path)
#print(result_dict)
fl_words = list(result_dict.keys())
#fl_words
print(len(fl_words))
print(fl_words[:10])

39933
['about', 'other', 'which', 'their', 'there', 'first', 'would', 'these', 'click', 'price']

For the first guess, instead of using commonly suggested words such as ADIEU, AUDIO, or GRACE, we can choose randomly from the top 100 most frequent words that meet some criteria: at least 3 vowels, no repeated letters, use of the most frequent letters (not yet implemented), etc.


initial_words = [word for word in fl_words[:100] if len(set(word).intersection(set('aeiouy'))) >2]
print(len(initial_words))
print (initial_words)
def is_redundant_letters(text):
    """
    Detects and returns redundant letters in a string.
    Args:
        text: The input string.
    Returns:
        True if redundant letters are detected, False otherwise.
    """
    seen = set()
    for char in text:
        if char in seen:
            return True
        else:
            seen.add(char)
    return False
initial_words = [word for word in initial_words if not is_redundant_letters(word)]
print(len(initial_words))
print (initial_words)

And here are the suggested words:

13
['about', 'email', 'video', 'years', 'today', 'house', 'media', 'guide', 'image', 'money', 'value', 'movie', 'yahoo']
12
['about', 'email', 'video', 'years', 'today', 'house', 'media', 'guide', 'image', 'money', 'value', 'movie']
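
The "most frequent letter" criterion is marked as not yet implemented. One way it might be approached is to score each candidate by how common its letters are across the word list. This is only a sketch: the helper names letter_frequencies and score_word are hypothetical, and the five-word sample stands in for the real list.

```python
from collections import Counter


def letter_frequencies(words):
    """Count in how many words each letter appears (once per word)."""
    counts = Counter()
    for word in words:
        counts.update(set(word))
    return counts


def score_word(word, counts):
    """Sum the frequencies of the distinct letters in the word."""
    return sum(counts[letter] for letter in set(word))


# Hypothetical sample; in the post this would be the five-letter word list.
sample_words = ['about', 'email', 'video', 'house', 'media']
counts = letter_frequencies(sample_words)
ranked = sorted(sample_words, key=lambda w: score_word(w, counts), reverse=True)
print(ranked[0])
```

Counting each letter once per word keeps a repeated letter from inflating a score, which fits the "no repeated letters" criterion.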

Finally, here is the solver:

guess_word = 'about'
# R = GRAY - forbidden letters
# G = GREEN - right letter on right position - in_place
# Y = YELLOW - right letter wrong position - contains and not_in
guess_feedback = 'RRRYR'
in_place = []
forbidden_letters = ''
not_ins = []
for idx, (letter, feedback) in enumerate(zip(guess_word, guess_feedback)):
    #print(idx, letter, feedback)
    if feedback == 'G':
        in_place.append((letter, idx))
    if feedback == 'Y':
        not_ins.append((letter, idx))
    if feedback == 'R':
        forbidden_letters += letter
print('in_place', in_place)
print('not_ins', not_ins)
print('forbidden_letters', forbidden_letters)
# forbidden_letters is done and tested
filtered_words = [word for word in filtered_words if set(word).isdisjoint(set(forbidden_letters))]
print(len(filtered_words))
# not_ins is done and tested
contains = [letter for letter, _ in not_ins]
contains = "".join(contains)
print(contains)
filtered_words = [word for word in filtered_words if all(letter in word for letter in contains)]
print(len(filtered_words))
for not_in in not_ins:
    filtered_words = [word for word in filtered_words if word[not_in[1]] != not_in[0]]
print(len(filtered_words))
# in_place done and tested
filtered_words = [word for word in filtered_words if all(word[i] == letter for letter, i in in_place)]
print(len(filtered_words))
filtered_words[:10]

And the result:


in_place []
not_ins [('u', 0), ('s', 1), ('s', 4)]
forbidden_letters er
43
uss
43
25
25
['flush',
 'plush',
 'skull',
 'skunk',
 'swung',
 'snuff',
 'spunk',
 'slush',
 'slung',
 'shull']

You can keep re-running the solver with each new guess until every letter is green.

Question 2

Documentation

It is good that you added docstrings for your functions. It would also be good to add a docstring at the top of your code to:

Summarize its purpose
Describe the expected input file
Explain the expected output

The csv_to_dict function docstring should also explain the format of the input CSV file. I don't have access to that "kaggle" site, so I can't see the CSV file.

See also: Writing Docstrings — The Hitchhiker's Guide to Python.

Comments

This end-of-line comment can be deleted since it merely repeats what the code already says:

with open(file_path, 'r', newline='', encoding='utf-8') as file: # Added encoding

You should delete all commented-out code to reduce clutter:

#print(row)

UX

You should add some text to the output to explain what the user is looking at.

For example:

print(len(fl_words))

could be:

print(f'Number of words: {len(fl_words)}')

Efficiency

These separate if statements in the "solver" code:

if feedback == 'G':
if feedback == 'Y':
if feedback == 'R':

should be combined into a single if/elif statement:

if feedback == 'G':
elif feedback == 'Y':
elif feedback == 'R':

The checks are mutually exclusive. This makes the code more efficient since you don't have to perform the 2nd check if the first is true, etc. Also, this more clearly shows the intent of the code.
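
As a sketch, the full loop from the question with the branches chained might look like this. Wrapping it in a function named classify_feedback is an assumption made here for illustration, not something in the posted code.

```python
def classify_feedback(guess_word, guess_feedback):
    """Split one guess into green, yellow, and gray information."""
    in_place = []            # (letter, index) pairs marked G
    not_ins = []             # (letter, index) pairs marked Y
    forbidden_letters = ''   # letters marked R
    for idx, (letter, feedback) in enumerate(zip(guess_word, guess_feedback)):
        if feedback == 'G':
            in_place.append((letter, idx))
        elif feedback == 'Y':
            not_ins.append((letter, idx))
        elif feedback == 'R':
            forbidden_letters += letter
    return in_place, not_ins, forbidden_letters


in_place, not_ins, forbidden_letters = classify_feedback('about', 'RRRYR')
print('in_place', in_place)
print('not_ins', not_ins)
print('forbidden_letters', forbidden_letters)
```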

Layout

It's not clear from the question whether the 3 code blocks are in 1 file or in separate files. If they are in 1 file, the functions should be grouped together after the import line. Having them in the middle of the code interrupts its natural flow (from a human readability standpoint).
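
A rough skeleton of that layout, assuming everything lives in one file. The function bodies are elided, and the main() wrapper with its __name__ guard is an addition made for illustration rather than something in the posted code.

```python
import csv  # imports come first


# All function definitions are grouped together next.
def csv_to_dict(file_path):
    ...


def is_redundant_letters(text):
    ...


# The top-level steps come last.
def main():
    # load the data, choose the first guess, then run the solver steps
    ...


if __name__ == '__main__':
    main()
```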

Naming

The variable named fl_words is a bit vague. You should explain "fl", either as a comment or just rename the variable.

Question 3

The multiple checks against the value of feedback might also be a match.

Question 4

@Chris: I think you are saying that a match/case statement could be used. If so, I agree.

Question 5

simple boolean expression

def is_redundant_letters(text):
    """
    Detects and returns redundant letters in a string.
    ...

Seven lines of code to implement that seems like a lot. Plus the docstring isn't quite accurate, as a boolean-typed function never returns a str corresponding to a redundant letter.

This would suffice:

def has_redundant_letters(text: str) -> bool:
    return len(text) > len(set(text))

plural

def csv_to_dict(file_path):
    """Reads a CSV file and converts it into a list of dictionaries.
    ...

Better to call it csv_to_dicts.

The signature would benefit from a -> list[dict[str, int]] annotation.

mishandled errors

    except FileNotFoundError:
        print(f"Error: File '{file_path}' not found.")
    except Exception as e:
        print(f"An error occurred: {e}")
    return data

DbC (design by contract) explains that it would be much better to either honor the contract (we promised a list of dicts) or bail out with a fatal error.

That is, neither except clause is helpful. Yes, we do return a (zero length) container in the error case, and yes the docstring does mention that, but it's not helpful, it's not like we've accomplished something useful in the error case.
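
A minimal sketch of the fail-fast alternative, with both except clauses removed so that any error reaches the caller. The name load_five_letter_counts and the conversion of the count to int are assumptions. The 'word' and 'count' column names are taken from the lookups in the posted code.

```python
import csv


def load_five_letter_counts(file_path: str) -> dict[str, int]:
    """Return {word: count} for every five-letter word in the CSV.

    Assumes a header row with 'word' and 'count' columns. Any I/O or
    parsing error is allowed to propagate to the caller.
    """
    with open(file_path, newline='', encoding='utf-8') as file:
        return {
            row['word']: int(row['count'])
            for row in csv.DictReader(file)
            if len(row['word']) == 5
        }
```

With this shape, a missing file stops the program with a FileNotFoundError traceback instead of silently continuing with an empty container.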

comments lie

Oh, wait! The docstring keeps talking about a list, yet data is a dict. Sigh!

Question 6

Sorry that the docstring was out of date and did not match the code. I have posted a revised version based on your input. Thank you very much. I would appreciate a second round of review.

Question 7

Thank you for the input from @toolic and @J_H. Here are a few lines from unigram_freq.csv:

word	count
the	23135851162
of	13151942776
and	12997637966
to	12136980858
a	9081174698
in	8469404971
for	5933321709
is	4705743816
on	3750423199
that	3400031103
by	3350048871
this	3228469771
with	3183110675
i	3086225277

Here is my revised version.

"""
Wordle solver
It loads English words with their frequencies from a Kaggle dataset, keeping
only the five-letter words as it reads the file line by line.
The input is the guess word and its color-coded feedback from Wordle.
The output is a list of suggested words to guess next.
You can call solve many times until the word is guessed.
"""
import csv
def csv_to_dict(file_path):
    """Reads a CSV file and converts it into a dictionary.
    Args:
        file_path: The path to the CSV file.
    Returns:
        A dictionary, where key is the word and value is the count.
        All the words are five-letter word and in lower case.
        Returns an empty dictionary if the file is not found or if an error occurs.
    """
    data = {}
    try:
        with open(file_path, 'r', newline='', encoding='utf-8') as file:
            reader = csv.DictReader(file)
            for row in reader:
                #print(row)
                word = row['word']
                if len(word) == 5:
                    data[word] = row['count']
    except Exception as e:
        print(f"An error occurred: {e}")
        return {}
    return data
# The data is from 
# https://www.kaggle.com/datasets/rtatman/english-word-frequency?resource=download
file_path = '/content/drive/MyDrive/data/unigram_freq.csv'
result_dict = csv_to_dict(file_path)
filtered_words = list(result_dict.keys())
print("The total five letter words: ", len(filtered_words))
print("The top ten most frequent five letter word", filtered_words[:10])
# Generating the initial guess where there is no redundant letter and has at
# least three vowels
# Remove the redundant letter
def has_redundant_letters(text: str) -> bool:
    return len(text) > len(set(text))
initial_words = [word for word in filtered_words
                 if not has_redundant_letters(word)]
# The guess words should have at least 3 vowels
initial_words = [word for word in initial_words[:100]
                 if len(set(word).intersection(set('aeiouy'))) > 2]
print("The top ten initial guess words: ", initial_words[:10])
def solve(guess_word, guess_feedback, filtered_words):
    """Solve the Wordle game.
    Args:
        guess_word: The word guessed by the player.
        guess_feedback: The feedback given by the Wordle game.
            # R = GRAY - forbidden letters
            # G = GREEN - right letter on right position - in_place
            # Y = YELLOW - right letter wrong position - contains and not_in
            # Example: guess_feedback = 'RRRYR'
        filtered_words: The list of possible words
    Returns:
        A list of suggested words to guess next.
    Precondition: the filtered_words is a list of five letter words in lower case
    """
    forbidden_letters = ''
    in_place = []
    not_ins = []
    for idx, (letter, feedback) in enumerate(zip(guess_word, guess_feedback)):
        if feedback == 'G':
            in_place.append((letter, idx))
        elif feedback == 'Y':
            not_ins.append((letter, idx))
        elif feedback == 'R':
            forbidden_letters += letter
    # remove all the words with the forbidden letters
    filtered_words = [word for word in filtered_words
                      if set(word).isdisjoint(set(forbidden_letters))]
    # remove all the words not containing the correct letters in the right places
    filtered_words = [word for word in filtered_words
                      if all(word[i] == letter for letter, i in in_place)]
    # remove all the words that do not contain every yellow letter
    contains = [letter for letter, _ in not_ins]
    contains = "".join(contains)
    filtered_words = [word for word in filtered_words
                      if all(letter in word for letter in contains)]
    # remove all the words that have a yellow letter in the position already tried
    for not_in in not_ins:
        filtered_words = [word for word in filtered_words
                          if word[not_in[1]] != not_in[0]]
    print("The top ten guess words: ", filtered_words[:10])
    return filtered_words
guess_word = 'about'
# R = GRAY - forbidden letters
# G = GREEN - right letter on right position - in_place
# Y = YELLOW - right letter wrong position - contains and not_in
guess_feedback = 'RRRYR'
# you can put the guess word and run SOLVE again until the feedback is all green
filtered_words = solve(guess_word, guess_feedback, filtered_words)
guess_word = 'music'
guess_feedback = 'RYYRR'
filtered_words = solve(guess_word, guess_feedback, filtered_words)
guess_word = 'users'
guess_feedback = 'YYRRY'
filtered_words = solve(guess_word, guess_feedback, filtered_words)
guess_word = 'slush'
guess_feedback = 'GRGGG'
filtered_words = solve(guess_word, guess_feedback, filtered_words)
guess_word = 'shush'
guess_feedback = 'GGGGG' #DONE

The result:

The total five letter words: 39933
The top ten most frequent five letter word ['about', 'other', 'which', 'their', 'there', 'first', 'would', 'these', 'click', 'price']
The top ten initial guess words: ['about', 'email', 'video', 'years', 'today', 'house', 'media', 'guide', 'image', 'money']
The top ten guess words: ['music', 'under', 'using', 'guide', 'users', 'rules', 'quick', 'super', 'pussy', 'funds']
The top ten guess words: ['users', 'drugs', 'flush', 'usher', 'plush', 'skull', 'plugs', 'reuse', 'urges', 'spurs']
The top ten guess words: ['flush', 'plush', 'skull', 'skunk', 'swung', 'snuff', 'spunk', 'slush', 'slung', 'shull']
The top ten guess words: ['shush', 'sqush']
The top ten guess words: []

Question 8

I don't think I can upload the data. Here are a few lines of it: word count the 23135851162 of 13151942776 and 12997637966 to 12136980858 a 9081174698 in 8469404971 for 5933321709 is 4705743816 on 3750423199 that 3400031103 by 3350048871 this 3228469771 with 3183110675 i 3086225277

Question 9

I added an image to the answer. You can easily copy, paste, and format from the comment. Or download from Kaggle. I don't want to violate the copyright by copying and pasting the whole thing. Sharing the link is the way. kaggle.com/datasets/rtatman/…

toolic · Accepted Answer · 2025-05-05 18:07:52Z

