I've been working on a program to solve the game known as WordBrain. In this game, you have to swipe your finger across a square grid of letters and trace out words of given lengths in the given order. When you swipe the correct word, it is removed from the board and the remaining letters fall down into the empty spaces. Part of the challenge in writing this solver is that the letters of a word only have to be adjacent to each other. That is, the words don't necessarily have to be horizontal, vertical, or diagonal. Another challenge is the fact that the board is continually changing.
My Approach
My approach is just brute force. I have one recursive function findWords()
to find words on the board, and another recursive function findSolutions()
that uses those words to find complete solutions. I look for every possible permutation of words and solutions. I believe these are both DFS algorithms. I'm hoping that the code is fairly self-explanatory, but that is one of the reasons I'm here.
I decided to represent the game board as a string of characters. For example, "abcdefghi" might be a 3x3 board. To begin solving, findSolutions()
calls findWords()
, which in turn calls neighbors()
. Neighbors are defined as the indexes of letters adjacent to the letter we're currently looking at in findWords()
. A typical result of neighbors might be { 1, 3, 4, -1, -1, ... }
(the -1s are filler since I use a fixed size std::array
). findWords()
recurses over the neighbors until a word is found (or not). Note that a "word" is just another set of indexes.
findSolutions()
uses the same type of algorithm. Using a word passed in from the previous level of recursion, it removes that word from the board and calls findWords()
to find words on that new board. It then recurses over those words until it finds a complete solution. Solutions are packed into a vector of SolutionEntry
, which makes solutions easy to deal with in the Qt GUI.
I recently considered creating a Board
class which would hold a 2D vector representation of a board and a function to remove words from the board. Taking this approach might have been better than using a string to represent the board and using little "hacks" like row = index / mSize
and col = index % mSize
. But... I settled on the string approach in the beginning and stuck with that. The result isn't too bad, in my opinion, but I'm happy to hear what other people have to say.
Some Extra Notes
- For
findWords
, I use a trie data structure defined intrie.h
(not shown here) that makes it very convenient to test whether a set of letters is a valid prefix (e.g. qu is valid, qz is not), and whether it is a valid word at the end. I did not write this code. It is available here. It's not licensed. Unless the author intends to license his/her work, my project will remain a personal one on my computer. - The code here is part of a larger Qt GUI that I wrote, hence the
Q_OBJECT
,emit
, signals and slots, etc. - My use of a fixed-size
std::array
andmSizeSquared
forneighbors()
together resulted in about a 30% increase in performance vs. using a vector andmSize*mSize
. - I tried a different approach to
findWords()
once. Rather than looking for prefixes on the board and seeing if they exist in the dictionary, I tried the reverse. I looked at the words in the dictionary and checked whether they existed on the board. I thought this might reduce the search space. However, it was slower... Maybe my implementation was flawed.
Updates
- I removed the Qt dependencies from the code here so that hopefully anyone can compile it if they want. I use the command
g++ -o solver -O3 -std=gnu++11 -Wall solver.cpp main.cpp
. You will have to grabtrie.h
from here and a word list from here on GitHub. - Just yesterday, I tried to optimize the code a little by using a
std::set
with a custom compare functor to store words and remove duplicates, but I ended up losing some solutions that way. - I figured out how to store words in a
std::set
in a way that works now, although the functor seems pretty inefficient. It has good results though. On one hard puzzle, the old code produced 239,711 solutions. The new code produces 195,489. I verified that the lost solutions are duplicates. The time difference between the two versions is negligible. - Changed where I push back onto
mWord
. On harder puzzles, this results in millions or billions fewer calls topush_back
and no calls topop_back
. Hardly any savings as far as runtime, but I'll take it.
Questions
- Is the code readable and understandable?
- I do a LOT of string concatenation in
findWords
with the statementmPrefix += board[n];
and the results stand out in my profiling with callgrind. Isstd::string
as good as I can get here? - What can I do to improve performance besides attempting multi-threading?
- What improvements can I make to the code or algorithms in general?
solver.h
#ifndef SOLVER_H
#define SOLVER_H
#include <algorithm>
#include <array>
#include <memory>
#include <set>
#include <string>
#include <vector>
#include "solutionentry.h"
#include "trie.h"
class Solver
{
private:
typedef std::vector<unsigned int> tIntVec;
typedef std::vector<SolutionEntry> tSolution;
enum { NullIndex = -1 };
enum { NullNeighbor = -1, MaxNeighbors = 64 };
public:
Solver();
Solver(const std::string &board, int size, const tIntVec &wordlengths,
const std::string &wordfile);
~Solver();
bool solve();
void setBoard(const std::string &board)
{
mBoard = board;
}
void setSize(int size)
{
mSize = size;
mSizeSquared = size * size;
}
void setWordLengths(const tIntVec &wordlengths)
{
mWordLengths = wordlengths;
}
void setWordFile(const std::string &wordfile)
{
if (mWordFile != wordfile) {
mWordFile = wordfile;
}
}
tIntVec getWordLengths() {
return mWordLengths;
}
private:
struct WordCompare {
bool operator()(const std::pair<std::string, tIntVec> &lhs,
const std::pair<std::string, tIntVec> &rhs) const {
auto leftVector = lhs.second;
auto rightVector = rhs.second;
std::sort(leftVector.begin(), leftVector.end());
std::sort(rightVector.begin(), rightVector.end());
return (lhs.first != rhs.first) || (leftVector != rightVector);
}
};
bool findSolutions(const std::string &board, const tIntVec &wordpath,
unsigned int depth = 0);
void findWords(const std::string &board, int index,
unsigned int wordlength, unsigned int depth = 0);
std::array<int, MaxNeighbors>
neighbors(const std::string &board, int index) const;
std::string removeWord(const std::string &board, tIntVec word) const;
bool initializeDictionary();
void cleanUp();
void printSolution() const;
// Board parameters
std::string mBoard;
unsigned int mSize;
tIntVec mWordLengths;
std::string mWordFile;
unsigned int mSizeSquared;
// Solver variables
tSolution mSolution;
tIntVec mWord;
std::string mPrefix;
std::set<std::pair<std::string, tIntVec>, WordCompare> mWordList;
std::unique_ptr<trie_t> mDictionary;
const int mNeighborDx[8];
const int mNeighborDy[8];
};
#endif // SOLVER_H
solver.cpp
#include <algorithm>
#include <cctype>
#include <chrono>
#include <fstream>
#include <iostream>
#include <numeric>
#include <sstream>
#include <string>
#include <thread>
#include "solutionentry.h"
#include "trie.h"
#include "solver.h"
Solver::Solver()
: mDictionary(nullptr),
mNeighborDx{ -1, 0, 1, -1, 1, -1, 0, 1 },
mNeighborDy{ -1, -1, -1, 0, 0, 1, 1, 1 }
{ }
Solver::Solver(const std::string &board, int size, const tIntVec &wordlengths,
const std::string &wordfile)
: mBoard(board),
mSize(size),
mWordLengths(wordlengths),
mWordFile(wordfile),
mDictionary(nullptr),
mNeighborDx{ -1, 0, 1, -1, 1, -1, 0, 1 },
mNeighborDy{ -1, -1, -1, 0, 0, 1, 1, 1 }
{
mSizeSquared = mSize * mSize;
}
Solver::~Solver()
{
cleanUp();
}
bool Solver::solve()
{
if (!initializeDictionary()) {
std::cerr << "Unable to initialize the dictionary\n";
return false;
}
cleanUp();
std::chrono::time_point<std::chrono::system_clock> start, end;
start = std::chrono::system_clock::now();
findSolutions(mBoard, tIntVec());
end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end - start;
std::cout << elapsed_seconds.count() << '\n';
return true;
}
bool Solver::findSolutions(const std::string &board, const Solver::tIntVec &wordpath,
unsigned int depth)
{
// A complete solution has been found
if (depth == mWordLengths.size()) {
printSolution();
mSolution.pop_back();
return true;
}
// Remove current word from the board
std::string newBoard = removeWord(board, wordpath);
// Find any words that may exist on the new board
mWordList.clear();
findWords(newBoard, Solver::NullIndex, mWordLengths[depth]);
auto currentWords = mWordList;
for (const auto &word : currentWords) {
mSolution.push_back(SolutionEntry(word.second, newBoard, depth));
findSolutions(newBoard, word.second, depth + 1);
}
if (!mSolution.empty()) {
mSolution.pop_back();
}
return true;
}
void Solver::findWords(const std::string &board, int index,
unsigned int wordlength, unsigned int depth)
{
// We have a valid word
if (depth == wordlength && mDictionary->isWord(mPrefix)) {
mWordList.insert(std::make_pair(mPrefix, mWord));
mWord.pop_back();
mPrefix.pop_back();
return;
}
for (const auto n : neighbors(board, index)) {
// Visit a neighbor only if we haven't encountered it before
if (n != Solver::NullNeighbor &&
std::find(mWord.begin(), mWord.end(), n) == mWord.end()) {
mPrefix += board[n];
if (mDictionary->isPrefix(mPrefix)) {
// Recursive step
mWord.push_back(n);
findWords(board, n, wordlength, depth + 1);
} else {
mPrefix.pop_back();
}
}
}
if (!mWord.empty()) {
mWord.pop_back();
}
if (!mPrefix.empty()) {
mPrefix.pop_back();
}
}
std::array<int, Solver::MaxNeighbors>
Solver::neighbors(const std::string &board, int index) const
{
std::array<int, Solver::MaxNeighbors> n{};
n.fill(Solver::NullNeighbor);
// Return consecutive integers from 0 to mSizeSquared
if (index == Solver::NullIndex) {
std::iota(n.begin(), n.begin() + mSizeSquared, 0);
return n;
}
// Otherwise, return the actual neighbors of the letter at index
int x, y, pos;
int row = index / mSize;
int col = index % mSize;
for (int i = 0, j = 0; i < 8; ++i) {
x = row + mNeighborDx[i];
y = col + mNeighborDy[i];
pos = x * mSize + y;
if (x >= 0 && static_cast<unsigned int>(x) < mSize &&
y >= 0 && static_cast<unsigned int>(y) < mSize && board[pos] != ' ') {
n[j] = pos;
++j;
}
}
return n;
}
std::string Solver::removeWord(const std::string &board, Solver::tIntVec word) const
{
std::string newBoard(board);
std::sort(word.begin(), word.end());
for (const auto &n : word) {
// Remove the letter at index n
newBoard[n] = ' ';
// Move the letters above n down one position
int index = n - mSize;
while (index >= 0) {
if (newBoard[index] == ' ') {
index -= mSize;
} else {
char tmp = newBoard[index];
newBoard[index] = ' ';
newBoard[index + mSize] = tmp;
index -= mSize;
}
}
}
return newBoard;
}
bool Solver::initializeDictionary()
{
std::ifstream inf(mWordFile);
if (inf.is_open()) {
mDictionary.reset(new trie_t);
std::string curWord;
while (inf >> curWord) {
std::transform(curWord.begin(), curWord.end(), curWord.begin(),
[](unsigned char ch){ return std::tolower(ch); });
if (curWord.find_first_not_of("abcdefghijklmnopqrstuvwxyz") == std::string::npos) {
for (const auto &length : mWordLengths) {
// Only add words of the length we'll be working with
if (curWord.length() == length) {
mDictionary->addWord(curWord);
break;
}
}
}
}
inf.close();
return true;
} else {
std::cerr << "Unable to open file " << mWordFile;
return false;
}
}
void Solver::cleanUp()
{
mSolution.clear();
mWord.clear();
mWordList.clear();
mPrefix.clear();
}
void Solver::printSolution() const
{
for (const auto &s : mSolution) {
std::cout << s.getWord() << " ";
}
std::cout << '\n';
}
solutionentry.h
#ifndef SOLUTIONENTRY_H
#define SOLUTIONENTRY_H
#include <string>
#include <vector>
/*
* This class collects useful information about each word in
* the solution lists: mPath contains the indexes of a word on
* the board, mBoard contains the board state as it was when
* the word was found, and mIndex contains the index (or level)
* at which the word exists.
*/
class SolutionEntry {
public:
SolutionEntry(const std::vector<unsigned int> &path, const std::string &board, unsigned int index)
: mPath(path), mBoard(board), mIndex(index) {}
//~SolutionEntry() { }
std::vector<unsigned int> getPath() const
{
return mPath;
}
std::string getBoard() const
{
return mBoard;
}
unsigned int getIndex() const
{
return mIndex;
}
std::string getWord() const
{
std::string word;
if (!mPath.empty()) {
for (const auto &p : mPath) {
word += mBoard.at(p);
}
}
return word;
}
private:
std::vector<unsigned int> mPath;
std::string mBoard;
unsigned int mIndex;
};
#endif // SOLUTIONENTRY_H
main.cpp
#include <memory>
#include "solver.h"
int main(int argc, char *argv[])
{
int size = 6;
std::string board("ulngtuckrheconelrrlccutadpyitbiastos");
std::vector<unsigned int> lengths = { 4, 7, 6, 5, 8, 6 };
std::string wordfile = "enable1.txt";
std::unique_ptr<Solver> mySolver(new Solver(board, size, lengths, wordfile));
mySolver->solve();
return 0;
}
1 Answer 1
I see some things that may help you improve your code.
Use a better trie implementation
Strictly speaking, it's not part of the code to be reviewed, but there are a number of easy improvements that could be made to the trie code with which you're linking. In particular, every function that takes a std::string
should instead take a const std::string &
argument. Also instead of map
, use unordered_map
.
Separate interface from implementation
It makes the code somewhat longer for a code review, but it's often very useful to separate the interface from the implementation. In C++, this is usually done by putting the interface into separate .h
files and the corresponding implementation into .cpp
files. It helps users (or reviewers) of the code see and understand the interface and hides implementation details. The other important reason is that you might have multiple source files including the .h
file but only one instance of the corresponding .cpp
file. In other words, split your existing solutionentry.h
file into a .h
file and a .cpp
file.
Avoid needless complication
The main
routine contains these two lines:
std::unique_ptr<Solver> mySolver(new Solver(board, size, lengths, wordfile));
mySolver->solve();
However, I'm left with two questions on that. First, do we really need new
and a unique_ptr
? Second, is there anything useful one could do with a Solver
instance other than calling solve()
? I'd suggest that those two lines could instead be written like this:
Solver mySolver{board, size, lengths, wordfile};
mySolver();
This assumes that there is an operator()
defined instead of solve
.
Use static const
members where appropriate
The mNeighborDx
and mNeighborDy
data items are const
but really ought to be declared as static const
because they're never altered and don't need anything special in the constructor. Better, make them static constexpr
.
Let the compiler write code where it can
If you make the mNeighborDx
and mNeighborDy
members constexpr
as mentioned above, both the default (empty) constructor and destructor can be defaults. Delete the code for both and simply write:
Solver() = default;
~Solver() = default;
This also renders the cleanUp
function obsolete; it may be deleted.
Minimize the work done
The initializeDictionary()
code could be made more efficient. Currently, it transforms the word to lowercase, checks to see that all of the letters are alphabetic and then cycles through the lengths. There are some inefficiencies in that. Instead, we could compare the length first and only do further processing if it's one of the lengths in our array. Then we can do the transform and other checks, rejecting too short or too long words early to avoid doing extra work on words that will be discarded anyway.
Use standard algorithms
The removeWord
routine can be shorter and simpler:
std::string Solver::removeWord(std::string newBoard, Solver::tIntVec word) const
{
std::sort(word.begin(), word.end());
for (auto n : word) {
// slide everything above it down one space
for (newBoard[n] = ' '; n >= mSize; n -= mSize) {
std::swap(newBoard[n], newBoard[n-mSize]);
}
}
return newBoard;
}
This passes newBoard
by value, so a local copy is automatically made. Next, we sort the letter indices and then use std::swap
to cause the blanks to "bubble up" to the top. Note that std::swap
is both easier to read and understand and shorter.
Trim the dictionary
Not every word in the dictionary is going to be possible to find in the board. The code currently already only adds words that are the right lengths, but there's a better way to do it. The idea is if, for example, the potential dictionary word is "zoo" but there is no letter "z" in the initial board, there's no point in adding the word. This code expands on that notion by comparing the sorted anagrams of each string. Here's the alternative to the initializeDictionary
code with supporting functions:
static std::string anagram(std::string anagram) {
std::sort(anagram.begin(), anagram.end());
return anagram;
}
static bool possible(const std::string &haystack, const std::string &needle) {
// start from the beginning
auto curpos{haystack.cbegin()};
auto end{haystack.cend()};
for (auto ch : needle) {
if ((curpos = std::find(curpos, end, ch)) == end) {
return false;
}
++curpos;
}
return true;
}
bool Solver::initializeDictionary()
{
std::ifstream inf(mWordFile);
if (!inf) {
std::cerr << "Unable to open file " << mWordFile;
return false;
}
const auto boardAnagram = anagram(mBoard);
mDictionary.reset(new trie_t);
std::string curWord;
while (inf >> curWord) {
// Only add words of the length we'll be working with
if (std::find(mWordLengths.begin(), mWordLengths.end(), curWord.length()) != mWordLengths.end()) {
std::transform(curWord.begin(), curWord.end(), curWord.begin(),
[](unsigned char ch){ return std::tolower(ch); });
if (curWord.find_first_not_of("abcdefghijklmnopqrstuvwxyz") == std::string::npos) {
// Only add words that might be possible
if (possible(boardAnagram, anagram(curWord))) {
mDictionary->addWord(curWord);
}
}
}
}
return true;
}
Have each function do just one thing
What does the findWords
routine do? Yes, it finds words, but it also updates the mWord
variable, the mPrefix
variable, may or may not add to the mWordList
structure and recursively calls itself until the passed wordlength
and passed depth
are equal and the current mPrefix
is in the dictionary. That's rather a lot and makes it difficult to either understand or update the code. Instead, I'd suggest decomposing such compound functions into more easily describable ones named such that it's easy to see perceive the algorithm just by reading the code. Naturally, that's easier said than done, but it's still a worthwhile goal.
Remove unused functions
There are various solver
member functions which are unused and seem not to be very useful anyway, such as setBoard
, setSize
, etc. I'd recommend only adding functions as they are needed. The best interface is one that is minimal but sufficient.
Rethink the code
A trie is a good structure to use for a problem like this, but I think there are many improvements possible in the efficiency of the algorithm and its implementation. What follows are a number of specific suggestions for how to do that.
Create one trie per unique word size
If what the algorithm is seeking is a 5-letter word, it would be most useful to be able to refer to a trie that only had 5-letter word candidates in it. This trades off memory size for speed; profiling will tell you whether this approach is a worthwhile one on your machine.
Avoid repeatedly recreating larger structures
The neighbors
routine is called a lot. My profiling of the code shows that just that routine takes about 20% of the total execution time. Instead, why not just create a neighbor
function that, given an offset and neighbor number from 0 through 7, returns either neighbor index (or NullIndex
if the particular neighbor is either empty or off the board)? This would simplify the code and reduce the memory burden.
Avoid dynamic resizing
When the program searches at each step for words of a particular size, we already know in advance the maximum size of the string. To avoid dynamic resizing, you can use reserve()
to make sure each relevant structure has the right size, avoiding dynamic resizing and gaining a performance benefit.
Profile the code
By running tests and profiling the code, you'll be able to see which routines are called most frequently, which ones use the most time, etc. All of these things help you assure you spend your time on the things that matter most, thereby optimizing both the code and your time at the same stroke.
-
\$\begingroup\$ There's one tiny detail I'm not clear on. You said that
static constexpr
is better to use thanstatic const
in my case. Are the constructor and destructor not able to be defaults withoutconstexpr
? Is there any other benefit to usingconstexpr
for this case? Also, thanks for all your feedback. It's very useful and has given me a lot to work on! \$\endgroup\$Marc Payne– Marc Payne2017年10月02日 15:48:07 +00:00Commented Oct 2, 2017 at 15:48 -
1\$\begingroup\$ Glad to be of help. Technically, either
const
orconstexpr
works here (and will likely result in identical object code in this case). The advantage is that if there were any calculation involved, declaring itconstexpr
assures that the calculation is done at compile-time rather than at runtime. \$\endgroup\$Edward– Edward2017年10月02日 15:50:35 +00:00Commented Oct 2, 2017 at 15:50 -
\$\begingroup\$ I've implemented most of what you recommended. Lots of very clever suggestions. Creating one unique trie per word size actually had the most noticeable effect on performance on my end. I'm struggling with your idea about
neighbors
, though. Would a question about that be suitable for stackoverflow? \$\endgroup\$Marc Payne– Marc Payne2017年10月06日 20:44:48 +00:00Commented Oct 6, 2017 at 20:44
Explore related questions
See similar questions with these tags.