Winner found
It seems as if we have a winner! Unless anyone plans on contesting the world's current fastest Sudoku solver, user 53x15 wins with the staggeringly fast solver Tdoku. For anyone still working on their solvers, I'll still benchmark new submissions when I have time.
The challenge
The goal of a game of Sudoku is to fill the board with the numbers 1-9, one in each cell, in such a way that each row, column and box only contains each number once. A very important aspect of a Sudoku puzzle is that there should only be one valid solution.
The goal of this challenge is simple, you should solve a set of Sudoku puzzles as fast as possible. However, you won't just be solving any old Sudoku, you'll be solving the very hardest Sudoku puzzles in existence, the 17-clue Sudokus. Here's an example:
Rules
Language
You're free to use any language. If I don't have a compiler installed for your language, you should be able to provide a set of command line instructions needed to install an environment where your script can be run on Linux.
Benchmark machine
The benchmark will be run on a Dell XPS 9560, 2.8GHz Intel Core i7-7700HQ (3.8GHz boost) 4 cores, 8 threads, 16GB RAM. GTX 1050 4GB. The machine runs Ubuntu 19.04. Here's the uname output, for anyone interested.
Linux 5.0.0-25-generic #26-Ubuntu SMP Thu Aug 1 12:04:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Input
The input will be given as a file. It can be found here . The file contains 49151 Sudoku puzzles. The first line of the file is the number of puzzles, and every line after that is 81 characters long and represents a puzzle. The unknown cells are 0, and the known cells are 1-9.
Your program should be able to take the filename as an argument, or have the file input from STDIN, to facilitate manual checking of your solution. Please include an instruction for how your program takes input.
Timing / scoring
From discussions in the comments, and some reflection, the scoring criteria has been changed to be the time of your entire program. Your program should produce the output file with the correct hash even during official scoring. This doesn't interfere with any existing solution, and doesn't change the rankings as they stand now. Any thoughts on the scoring system are appreciated.
If two solutions have similar scores for individual runs, I will run multiple benchmarks, and the average time will be the final score. If the average scores differ by less than 2%, I will consider it a draw.
If your solution takes longer than an hour to run, it will not be officially scored. In those cases, you are responsible for reporting the machine on which it ran, and your score. For an optimized solver, this should not be an issue.
EDIT: It was brought to my attention that while difficult, the problem set at hand is not the most difficult there is. If time is available, I'll try to benchmark the solutions presented here against the harder puzzle set, and add the score to each submission. However, this will not be an official scoring, and is just for fun.
Verification
Your solution will be verified by a MD5/SHA256 checksum. Your script should be able to generate a file containing all puzzles and their solutions. However, the file will also be manually inspected, so don't try to get a hash collision. Your output file should match:
MD5: 41704fd7d8fd0723a45ffbb2dbbfa488
SHA256: 0bc8dda364db7b99f389b42383e37b411d9fa022204d124cb3c8959eba252f05
The file will be on the format:
<num_puzzles>
<unsolved_puzzle#1>,<solved_puzzle#1>
<unsolved_puzzle#2>,<solved_puzzle#2>
...
<unsolved_puzzle#n>,<solved_puzzle#n>
with a single trailing newline.
What's not allowed
You are in no way allowed to hard-code solutions. Your algorithm should be applicable on any set of Sudoku puzzles, both easy and harder Sudokus. However, it is entirely fine if your solution is slow for easier puzzles.
You are not allowed to have a non-deterministic program. You are allowed to use a random number generator, but the seed of the generator should be fixed. This rule is to ensure that measurements are more precise, and have less variance. (Thanks to Peter Taylor for the tip)
You are not allowed to use any external resources or web requests during the runtime of your program. Everything should be self-contained. This does not apply to installed libraries and packages, which are allowed.
Other info
If you want another test set to check your solution, here are 10000 easier Sudokus. Here are their solutions.
MD5: 3cb465ef6077c4fcab5bd6ae3bc50d62
SHA256: 0bc8dda364db7b99f389b42383e37b411d9fa022204d124cb3c8959eba252f05
If you have any questions, feel free to ask, and I'll try to clarify any misunderstandings.
-
\$\begingroup\$ I have an APL+WIN solver but unless you have a copy of the interpreter on your machine you will have to count me out. For info your hard example took 30ms and the first easy example 16ms. \$\endgroup\$Graham– Graham2019年08月23日 17:24:22 +00:00Commented Aug 23, 2019 at 17:24
-
\$\begingroup\$ @Graham it took 30ms for all 49151 sudokus, or 30ms on average? \$\endgroup\$maxb– maxb2019年08月23日 17:55:20 +00:00Commented Aug 23, 2019 at 17:55
-
1\$\begingroup\$ Should the entries also be code golfed? Or... Can they be golfed, for the fun it? ;-) \$\endgroup\$The Matt– The Matt2019年08月24日 03:28:57 +00:00Commented Aug 24, 2019 at 3:28
-
2\$\begingroup\$ @TheMatt I'd prefer non-golfed, just so I can verify that nothing fishy is going on \$\endgroup\$maxb– maxb2019年08月24日 06:26:30 +00:00Commented Aug 24, 2019 at 6:26
-
2\$\begingroup\$ In case you need the world's hardest sudoku: abcnews.go.com/blogs/headlines/2012/06/… \$\endgroup\$Jerry Jeremiah– Jerry Jeremiah2023年03月15日 01:01:48 +00:00Commented Mar 15, 2023 at 1:01
14 Answers 14
C++ - 0.201s official score
Using Tdoku (code; design; benchmarks) gives these results:
~/tdoku$ lscpu | grep Model.name Model name: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz ~/tdoku$ # build: ~/tdoku$ CC=clang-8 CXX=clang++-8 ./BUILD.sh ~/tdoku$ clang -o solve example/solve.c build/libtdoku.a ~/tdoku$ # adjust input format: ~/tdoku$ sed -e "s/0/./g" all_17_clue_sudokus.txt > all_17_clue_sudokus.txt.in ~/tdoku$ # solve: ~/tdoku$ time ./solve 1 < all_17_clue_sudokus.txt.in > out.txt real 0m0.241s user 0m0.229s sys 0m0.012s ~/tdoku$ # adjust output format and sha256sum: ~/tdoku$ grep -v "^:0:$" out.txt | sed -e "s/:1:/,/" | tr . 0 | sha256sum 0bc8dda364db7b99f389b42383e37b411d9fa022204d124cb3c8959eba252f05 -
Tdoku has been optimized for hard Sudoku instances. But note, contrary to the problem statement, that 17 clue puzzles are far from the hardest Sudoku. Actually they're among the easiest, with the majority requiring no backtracking at all. See some of the other benchmark datasets in the Tdoku project for puzzles that are actually hard.
Also note that while Tdoku is the fastest solver I'm aware of for hard puzzles, it's not the fastest for 17 clue puzzles. For these I think the fastest is this rust project, a derivative of JCZSolve, which was optimized for 17 clue puzzles during development. Depending on the platform it might be 5-25% faster than Tdoku for these puzzles.
-
\$\begingroup\$ Wow, that was an interesting read about the implementation and theory behind it. Before I started this challenge, I wanted to find state of the art solvers and datasets. I guess I didn't look hard enough. From popular "scientific" articles, the 17 clue puzzles were all that's ever talked about, so it was my assumption that those were the hardest. I'll try to run all submissions against the data sets presented in your article, and I'll benchmark your submission later today. Fantastic job! \$\endgroup\$maxb– maxb2019年08月30日 04:36:58 +00:00Commented Aug 30, 2019 at 4:36
-
\$\begingroup\$ Thanks! You see from the article that finding state-of-the-art solving took me on a long journey. :-) I get why people focus on 17 clue puzzles: the dataset is well known, well defined, complete or nearly so, moderately large, and hard for naive solvers. While it's interesting to study harder puzzles, hardness is tricky to formalize. e.g., do we mean subjectively or empirically hard for humans based on the techniques required? do we mean slow on average for a given solver under random permutations? Do we mean minimum backdoor size under a formula with chosen pigeonhole inferences? etc. \$\endgroup\$53x15– 53x152019年08月30日 21:14:10 +00:00Commented Aug 30, 2019 at 21:14
-
1\$\begingroup\$ 0.201s - is that the time taken to solve all ~49k puzzles, roughly 4usec per puzzle? \$\endgroup\$Maxim Egorushkin– Maxim Egorushkin2020年08月27日 22:10:18 +00:00Commented Aug 27, 2020 at 22:10
-
\$\begingroup\$ Yes, that's right. \$\endgroup\$53x15– 53x152020年08月28日 15:57:25 +00:00Commented Aug 28, 2020 at 15:57
C - 0.045s unofficial score
I got this time on my i7-9750H with 6 cores 12 threads @ 4Ghz. I'm aware that my cpu is faster than the i7-7700HQ so I think (hope) it would be closer to 0.080s if it were officially scored.
Compile with: gcc file.c -O3 -march=native -fopenmp
The program takes the input file name as argument. It generates an output.txt file containing all sudokus with their solution. It can be significantly slower when the output file already exists.
This won't run on CPUs that don't support the SSE4.1 and AVX2 instructions set extensions.
#include <stdio.h>
#include <omp.h>
#include <stdbool.h>
#include <intrin.h>
// lookup tables that may or may not speed things up by avoiding division
static const unsigned char box_index[81] = {
0, 0, 0, 1, 1, 1, 2, 2, 2,
0, 0, 0, 1, 1, 1, 2, 2, 2,
0, 0, 0, 1, 1, 1, 2, 2, 2,
3, 3, 3, 4, 4, 4, 5, 5, 5,
3, 3, 3, 4, 4, 4, 5, 5, 5,
3, 3, 3, 4, 4, 4, 5, 5, 5,
6, 6, 6, 7, 7, 7, 8, 8, 8,
6, 6, 6, 7, 7, 7, 8, 8, 8,
6, 6, 6, 7, 7, 7, 8, 8, 8
};
static const unsigned char column_index[81] = {
0, 1, 2, 3, 4, 5, 6, 7, 8,
0, 1, 2, 3, 4, 5, 6, 7, 8,
0, 1, 2, 3, 4, 5, 6, 7, 8,
0, 1, 2, 3, 4, 5, 6, 7, 8,
0, 1, 2, 3, 4, 5, 6, 7, 8,
0, 1, 2, 3, 4, 5, 6, 7, 8,
0, 1, 2, 3, 4, 5, 6, 7, 8,
0, 1, 2, 3, 4, 5, 6, 7, 8,
0, 1, 2, 3, 4, 5, 6, 7, 8,
};
static const unsigned char row_index[81] = {
0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7,
8, 8, 8, 8, 8, 8, 8, 8, 8,
};
static const unsigned char box_start[81] = {
0, 0, 0, 3, 3, 3, 6, 6, 6,
0, 0, 0, 3, 3, 3, 6, 6, 6,
0, 0, 0, 3, 3, 3, 6, 6, 6,
27, 27, 27, 30, 30, 30, 33, 33, 33,
27, 27, 27, 30, 30, 30, 33, 33, 33,
27, 27, 27, 30, 30, 30, 33, 33, 33,
54, 54, 54, 57, 57, 57, 60, 60, 60,
54, 54, 54, 57, 57, 57, 60, 60, 60,
54, 54, 54, 57, 57, 57, 60, 60, 60
};
static void add_column_indices(unsigned long long indices[2], unsigned char i) {
indices[0] |= 0x8040201008040201ULL << column_index[i];
indices[1] |= 0x8040201008040201ULL >> (10-column_index[i]);
}
static void add_row_indices(unsigned long long indices[2], unsigned char i) {
switch (row_index[i]) {
case 7:
indices[0] |= 0x8000000000000000ULL;
indices[1] |= 0xffULL;
break;
case 8:
indices[1] |= 0x01ff00ULL;
break;
default:
indices[0] |= 0x01ffULL << 9*row_index[i];
}
}
static void add_box_indices(unsigned long long indices[2], unsigned char i) {
indices[0] |= 0x1c0e07ULL << box_start[i];
indices[1] |= 0x0381c0e0ULL >> (60-box_start[i]);
}
struct GridState {
struct GridState* prev; // last grid state before a guess was made, used for backtracking
unsigned long long unlocked[2]; // for keeping track of which cells don't need to be looked at anymore. Set bits correspond to cells that still have multiple possibilities
unsigned long long updated[2]; // for keeping track of which cell's candidates may have been changed since last time we looked for naked sets. Set bits correspond to changed candidates in these cells
unsigned short candidates[81]; // which digits can go in this cell? Set bits correspond to possible digits
};
static void enter_digit(struct GridState* grid_state, signed char digit, unsigned char i) {
// lock this cell and and remove this digit from the candidates in this row, column and box
unsigned short* candidates = grid_state->candidates;
unsigned long long* unlocked = grid_state->unlocked;
unsigned long long to_update[2] = {0};
if (i < 64) {
unlocked[0] &= ~(1ULL << i);
} else {
unlocked[1] &= ~(1ULL << (i-64));
}
candidates[i] = 1 << digit;
add_box_indices(to_update, i);
add_column_indices(to_update, i);
add_row_indices(to_update, i);
to_update[0] &= unlocked[0];
to_update[1] &= unlocked[1];
grid_state->updated[0] |= to_update[0];
grid_state->updated[1] |= to_update[1];
const __m256i bit_mask = _mm256_setr_epi16(1<<0, 1<<1, 1<<2, 1<<3, 1<<4, 1<<5, 1<<6, 1<<7, 1<<8, 1<<9, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14, 1<<15);
const __m256i mask = _mm256_set1_epi16(~candidates[i]);
for (unsigned char j = 0; j < 80; j += 16) {
unsigned short m;
if (j < 64) {
m = (unsigned short) ((to_update[0] >> j) & 0xffff);
} else {
m = (unsigned short) (to_update[1] & 0xffff);
}
__m256i c = _mm256_loadu_si256((__m256i*) &candidates[j]);
__m256i u = _mm256_cmpeq_epi16(_mm256_and_si256(bit_mask, _mm256_set1_epi16(m)), _mm256_setzero_si256());
c = _mm256_and_si256(c, _mm256_or_si256(mask, u));
_mm256_storeu_si256((__m256i*) &candidates[j], c);
}
if ((to_update[1] & (1ULL << (80-64))) != 0) {
candidates[80] &= ~candidates[i];
}
}
static long long guesses = 0;
static struct GridState* make_guess(struct GridState* grid_state) {
// Make a guess for the cell with the least candidates. The guess will be the lowest
// possible digit for that cell. If multiple cells have the same number of candidates, the
// cell with lowest index will be chosen. Also save the current grid state for tracking back
// in case the guess is wrong. No cell has less than two candidates.
unsigned long long* unlocked = grid_state->unlocked;
unsigned short* candidates = grid_state->candidates;
// Find the cell with fewest possible candidates
unsigned long long to_visit;
unsigned long guess_index = 0;
unsigned long i_rel;
unsigned short cnt;
unsigned short best_cnt = 16;
to_visit = unlocked[0];
while (_BitScanForward64(&i_rel, to_visit) != 0) {
to_visit &= ~(1ULL << i_rel);
cnt = __popcnt16(candidates[i_rel]);
if (cnt < best_cnt) {
best_cnt = cnt;
guess_index = i_rel;
}
}
to_visit = unlocked[1];
while (_BitScanForward64(&i_rel, to_visit) != 0) {
to_visit &= ~(1ULL << i_rel);
cnt = __popcnt16(candidates[i_rel + 64]);
if (cnt < best_cnt) {
best_cnt = cnt;
guess_index = i_rel + 64;
}
}
// Find the first candidate in this cell
unsigned long digit;
_BitScanReverse(&digit, candidates[guess_index]);
// Create a copy of the state of the grid to make back tracking possible
struct GridState* new_grid_state = (struct GridState*) malloc(sizeof(struct GridState));
memcpy(new_grid_state, grid_state, sizeof(struct GridState));
new_grid_state->prev = grid_state;
// Remove the guessed candidate from the old grid because if we need to get back to the old grid
// we know the guess was wrong
grid_state->candidates[guess_index] &= ~(1 << digit);
if (guess_index < 64) {
grid_state->updated[0] |= 1ULL << guess_index;
} else {
grid_state->updated[1] |= 1ULL << (guess_index-64);
}
// Update candidates
enter_digit(new_grid_state, (signed char) digit, (unsigned char) guess_index);
guesses++;
return new_grid_state;
}
static struct GridState* track_back(struct GridState* grid_state) {
// Go back to the state when the last guess was made
// This state had the guess removed as candidate from it's cell
struct GridState* old_grid_state = grid_state->prev;
free(grid_state);
return old_grid_state;
}
static bool solve(signed char grid[81]) {
struct GridState* grid_state = (struct GridState*) malloc(sizeof(struct GridState));
grid_state->prev = 0;
unsigned long long* unlocked = grid_state->unlocked;
unsigned long long* updated = grid_state->updated;
unsigned short* candidates = grid_state->candidates;
unlocked[0] = 0xffffffffffffffffULL;
unlocked[1] = 0x1ffffULL;
updated[0] = unlocked[0];
updated[1] = unlocked[1];
{
signed char digit;
unsigned short columns[9] = {0};
unsigned short rows[9] = {0};
unsigned short boxes[9] = {0};
for (unsigned char i = 0; i < 64; ++i) {
digit = grid[i];
if (digit >= 49) {
columns[column_index[i]] |= 1 << (digit-49);
rows[row_index[i]] |= 1 << (digit-49);
boxes[box_index[i]] |= 1 << (digit-49);
unlocked[0] &= ~(1ULL << i);
}
}
for (unsigned char i = 64; i < 81; ++i) {
digit = grid[i];
if (digit >= 49) {
columns[column_index[i]] |= 1 << (digit-49);
rows[row_index[i]] |= 1 << (digit-49);
boxes[box_index[i]] |= 1 << (digit-49);
unlocked[1] &= ~(1ULL << (i-64));
}
}
for (unsigned char i = 0; i < 81; ++i) {
if (grid[i] < 49) {
candidates[i] = 0x01ff ^ (rows[row_index[i]] | columns[column_index[i]] | boxes[box_index[i]]);
} else {
candidates[i] = 1 << (grid[i]-49);
}
}
}
start:
unlocked = grid_state->unlocked;
candidates = grid_state->candidates;
// Find naked singles
{
bool found;
const __m256i ones = _mm256_set1_epi16(1);
const __m256i bit_mask = _mm256_setr_epi16(1<<0, 1<<1, 1<<2, 1<<3, 1<<4, 1<<5, 1<<6, 1<<7, 1<<8, 1<<9, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14, 1<<15);
do {
found = false;
for (unsigned char i = 0; i < 80; i += 16) {
__m256i c = _mm256_loadu_si256((__m256i*) &candidates[i]);
// Check if any cell has zero candidates
if (_mm256_movemask_epi8(_mm256_cmpeq_epi16(c, _mm256_setzero_si256()))) {
// Back track, no solutions along this path
grid_state = track_back(grid_state);
goto start;
} else {
unsigned short m;
if (i < 64) {
m = (unsigned short) ((unlocked[0] >> i) & 0xffff);
} else {
m = (unsigned short) (unlocked[1] & 0xffff);
}
__m256i a = _mm256_cmpeq_epi16(_mm256_and_si256(c, _mm256_sub_epi16(c, ones)), _mm256_setzero_si256());
__m256i u = _mm256_cmpeq_epi16(_mm256_and_si256(bit_mask, _mm256_set1_epi16(m)), bit_mask);
int mask = _mm256_movemask_epi8(_mm256_and_si256(a, u));
if (mask) {
unsigned long index, digit;
_BitScanForward(&index, mask);
index = (index >> 1) + i;
_BitScanReverse(&digit, candidates[index]);
enter_digit(grid_state, (signed char) digit, index);
found = true;
}
}
}
if (unlocked[1] & (1ULL << (80-64))) {
if (candidates[80] == 0) {
// no solutions go back
grid_state = track_back(grid_state);
goto start;
} else if (__popcnt16(candidates[80]) == 1) {
// Enter the digit and update candidates
unsigned long digit;
_BitScanReverse(&digit, candidates[80]);
enter_digit(grid_state, (signed char) digit, 80);
found = true;
}
}
} while (found);
}
// Check if it's solved, if it ever gets solved it will be solved after looking for naked singles
if ((unlocked[0] | unlocked[1]) == 0) {
// Solved it
// Free memory
while (grid_state) {
struct GridState* prev_grid_state = grid_state->prev;
free(grid_state);
grid_state = prev_grid_state;
}
// Enter found digits into grid
for (unsigned char j = 0; j < 81; ++j) {
unsigned long index;
_BitScanReverse(&index, candidates[j]);
grid[j] = (signed char) index + 49;
}
return true;
}
// Find hidden singles
// Don't check the last column because it doesn't fit in the SSE register so it's not really worth checking
{
const __m128i ones = _mm_set1_epi16(1);
const __m128i bit_mask = _mm_setr_epi16(1<<0, 1<<1, 1<<2, 1<<3, 1<<4, 1<<5, 1<<6, 1<<7);
const __m128i shuffle_mask = _mm_setr_epi8(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1);
for (unsigned char i = 0; i < 81; i += 9) {
// rows
__m128i row_mask = _mm_set1_epi16(0x01ff ^ candidates[i+8]);
__m128i c = _mm_loadu_si128((__m128i*) &candidates[i]);
for (unsigned char j = 0; j < 7; ++j) {
// rotate shift (1 2 3 4) -> (4 1 2 3)
c = _mm_shuffle_epi8(c, shuffle_mask);
row_mask = _mm_andnot_si128(c, row_mask);
}
// columns
__m128i column_mask = _mm_set1_epi16(0x01ff);
for (unsigned char j = 0; j < 81; j += 9) {
if (j != i) {
column_mask = _mm_andnot_si128(_mm_loadu_si128((__m128i*) &candidates[j]), column_mask);
}
}
// boxes aren't worth it
__m128i or_mask = _mm_or_si128(row_mask, column_mask);
if (_mm_test_all_zeros(or_mask, _mm_sub_epi16(or_mask, ones))) {
unsigned short m;
if (i < 64) {
m = (unsigned short) ((unlocked[0] >> i) & 0xff);
} else {
m = (unsigned short) ((unlocked[1] >> (i-64)) & 0xff);
}
c = _mm_loadu_si128((__m128i*) &candidates[i]);
__m128i a = _mm_cmpgt_epi16(_mm_and_si128(c, or_mask), _mm_setzero_si128());
__m128i u = _mm_cmpeq_epi16(_mm_and_si128(bit_mask, _mm_set1_epi16(m)), bit_mask);
int mask = _mm_movemask_epi8(_mm_and_si128(a, u));
if (mask) {
unsigned long index, digit;
_BitScanForward(&index, mask);
index = index/2;
int can = ((unsigned short*) &or_mask)[index];
_BitScanForward(&digit, can);
index = index + i;
enter_digit(grid_state, (signed char) digit, index);
goto start;
}
} else {
// no solutions go back
grid_state = track_back(grid_state);
goto start;
}
}
}
// Find naked sets, up to 5
{
bool found = false;
// because this is kind of an expensive task, we are not going to visit all cells but only those that were changed
unsigned long long *to_visit_n = grid_state->updated;
to_visit_n[0] &= unlocked[0];
to_visit_n[1] &= unlocked[1];
for (unsigned char n = 0; n < 2; ++n) {
while (to_visit_n[n]) {
unsigned long i_rel;
_BitScanForward64(&i_rel, to_visit_n[n]);
to_visit_n[n] ^= 1ULL << i_rel;
unsigned char i = (unsigned char) i_rel + 64*n;
unsigned short cnt = __popcnt16(candidates[i]);
if (cnt <= 5) {
// check column
unsigned long long to_change[2] = {0};
unsigned char s;
__m128i a_i = _mm_set1_epi16(candidates[i]);
__m128i a_j = _mm_set_epi16(candidates[column_index[i]+63], candidates[column_index[i]+54], candidates[column_index[i]+45], candidates[column_index[i]+36], candidates[column_index[i]+27], candidates[column_index[i]+18], candidates[column_index[i]+9], candidates[column_index[i]]);
__m128i res = _mm_cmpeq_epi16(a_i, _mm_or_si128(a_i, a_j));
s = __popcnt16(_mm_movemask_epi8(res)) >> 1;
s += candidates[i] == (candidates[i] | candidates[column_index[i]+72]);
if (s > cnt) {
grid_state = track_back(grid_state);
goto start;
} else if (s == cnt) {
add_column_indices(to_change, i);
}
// check row
a_j = _mm_load_si128((__m128i*) &candidates[9*row_index[i]]);
res = _mm_cmpeq_epi16(a_i, _mm_or_si128(a_i, a_j));
s = __popcnt16(_mm_movemask_epi8(res)) >> 1;
s += candidates[i] == (candidates[i] | candidates[9*row_index[i]+8]);
if (s > cnt) {
grid_state = track_back(grid_state);
goto start;
} else if (s == cnt) {
add_row_indices(to_change, i);
}
// check box
unsigned short b = box_start[i];
a_j = _mm_set_epi16(candidates[b], candidates[b+1], candidates[b+2], candidates[b+9], candidates[b+10], candidates[b+11], candidates[b+18], candidates[b+19]);
res = _mm_cmpeq_epi16(a_i, _mm_or_si128(a_i, a_j));
s = __popcnt16(_mm_movemask_epi8(res)) >> 1;
s += candidates[i] == (candidates[i] | candidates[b+20]);
if (s > cnt) {
grid_state = track_back(grid_state);
goto start;
} else if (s == cnt) {
add_box_indices(to_change, i);
}
to_change[0] &= unlocked[0];
to_change[1] &= unlocked[1];
// update candidates
for (unsigned char n = 0; n < 2; ++n) {
while (to_change[n]) {
unsigned long j_rel;
_BitScanForward64(&j_rel, to_change[n]);
to_change[n] &= ~(1ULL << j_rel);
unsigned char j = (unsigned char) j_rel + 64*n;
if ((candidates[j] | candidates[i]) != candidates[i]) {
if (candidates[j] & candidates[i]) {
candidates[j] &= ~candidates[i];
to_visit_n[n] |= 1ULL << j_rel;
found = true;
}
}
}
}
// If any cell's candidates got updated, go back and try all that other stuff again
if (found) {
goto start;
}
}
}
}
}
// More techniques could be added here but they're not really worth checking for on the 17 clue sudoku set
// Make a guess if all that didn't work
grid_state = make_guess(grid_state);
goto start;
}
int main(int argc, char *argv[]) {
if (argc != 2) {
printf("Error! Pass the file name as argument\n");
return -1;
}
FILE *f = fopen(argv[1], "rb");
// test for files not existing
if (f == 0) {
printf("Error! Could not open file %s\n", argv[1]);
return -1;
}
// find length of file
fseek(f, 0, SEEK_END);
long fsize = ftell(f);
fseek(f, 0, SEEK_SET);
// deal with the part that says how many sudokus there are
signed char *string = malloc(fsize + 1);
fread(string, 1, fsize, f);
fclose(f);
size_t p = 0;
while (string[p] != 10) {
++p;
}
++p;
signed char *output = malloc(fsize*2 - p + 2);
memcpy(output, string, p);
// solve all sudokus and prepare output file
int i;
#pragma omp parallel for shared(string, output, fsize, p) schedule(dynamic)
for (i = fsize - p - 81; i >= 0; i-=82) {
// copy unsolved grid
memcpy(&output[p+i*2], &string[p+i], 81);
memcpy(&output[p+i*2+82], &string[p+i], 81);
// add comma and newline in right place
output[p+i*2 + 81] = ',';
output[p+i*2 + 163] = 10;
// solve the grid in place
solve(&output[p+i*2+82]);
}
// create output file
f = fopen("output.txt", "wb");
fwrite(output, 1, fsize*2 - p + 2, f);
fclose(f);
free(string);
free(output);
return 0;
}
The algorithm uses normal techniques humans use. Specifically looking for naked singles, hidden singles and naked subsets. If that doesn't work it uses the infamous Nishio technique (just backtracking). I added more techniques initially like looking for locked candidates, but they turned out to not really be worth it for these 17-clue sudokus.
-
\$\begingroup\$ very cool!! I recently did some benchmarks and it loooks like
std::copyusually is a fraction faster thanmemcpy\$\endgroup\$jdt– jdt2021年09月22日 22:27:22 +00:00Commented Sep 22, 2021 at 22:27 -
\$\begingroup\$ @JohanduToit The memcpy part is not really critical since the algorithm makes on average only 1 guess per sudoku (it doesn't get called often). When looking for locked candidates this goes down to only 0.3 or 0.2 guesses but my method for finding them was kinda bad so it didn't make much of a difference in time. I think if you want to make it significantly faster that would be something to look into though. \$\endgroup\$Mirage– Mirage2021年09月22日 23:02:24 +00:00Commented Sep 22, 2021 at 23:02
-
1\$\begingroup\$ Wow, this is really impressive! I noticed that you're using
omp.hto solve all of the puzzles in parallel. It was a long time since I hosted this challenge, so I can't remember the exact verdict on parallelization, but your solution would be either the fastest or second fastest running single-threaded. I'll try to get this benchmarked to get you an official score! \$\endgroup\$maxb– maxb2021年09月23日 15:05:06 +00:00Commented Sep 23, 2021 at 15:05 -
\$\begingroup\$ Very nice Mirage! I ran this through tdoku's benchmarks, and it is actually the fastest solver I've seen whose primary representation is cell candidate sets. It's right in there with other top solvers in single-threaded performance, and it does the least guessing of any solver except tdoku. Here is the comparison on various data sets: pastebin.com/ysXKWXJQ I haven't benchmarked with openmp since all of these solvers will benefit in roughly the same way. Is this on github somewhere? It is an interesting specimen ... may I add it to tdoku's "other solver" benchmarks? \$\endgroup\$53x15– 53x152021年10月02日 19:20:32 +00:00Commented Oct 2, 2021 at 19:20
-
\$\begingroup\$ Sorry, looks like I didn't set up the comparison correctly in the benchmarks above. Your solver returns when it finds the first solution and doesn't confirm that the solution is unique (something all the other solvers were doing). When you run them all in "first solution" mode then the results are here: pastebin.com/JTmkp8Y7 Still a solid performer, and maybe with headroom, but not the outlier it seemed to be in either speed and guesses. Instead it's comparable to other fast solvers with a cell-based representation. A little faster on some datasets, a little slower on others. \$\endgroup\$53x15– 53x152021年10月02日 23:11:14 +00:00Commented Oct 2, 2021 at 23:11
Node.js, (削除) 8.231s (削除ここまで) 6.735s official score
Takes the file name as argument. The input file may already contain the solutions in the format described in the challenge, in which case the program will compare them with its own solutions.
The results are saved in 'sudoku.log'.
Code
'use strict';
const fs = require('fs');
const BLOCK = [];
const BLOCK_NDX = [];
const N_BIT = [];
const ZERO = [];
const BIT = [];
console.time('Processing time');
init();
let filename = process.argv[2],
puzzle = fs.readFileSync(filename).toString().split('\n'),
len = puzzle.shift(),
output = len + '\n';
console.log("File '" + filename + "': " + len + " puzzles");
// solve all puzzles
puzzle.forEach((p, i) => {
let sol, res;
[ p, sol ] = p.split(',');
if(p.length == 81) {
if(!(++i % 2000)) {
console.log((i * 100 / len).toFixed(1) + '%');
}
if(!(res = solve(p))) {
throw "Failed on puzzle " + i;
}
if(sol && res != sol) {
throw "Invalid solution for puzzle " + i;
}
output += p + ',' + res + '\n';
}
});
// results
console.timeEnd('Processing time');
fs.writeFileSync('sudoku.log', output);
console.log("MD5 = " + require('crypto').createHash('md5').update(output).digest("hex"));
// initialization of lookup tables
function init() {
let ptr, x, y;
for(x = 0; x < 0x200; x++) {
N_BIT[x] = [0, 1, 2, 3, 4, 5, 6, 7, 8].reduce((s, n) => s + (x >> n & 1), 0);
ZERO[x] = ~x & -~x;
}
for(x = 0; x < 9; x++) {
BIT[1 << x] = x;
}
for(ptr = y = 0; y < 9; y++) {
for(x = 0; x < 9; x++, ptr++) {
BLOCK[ptr] = (y / 3 | 0) * 3 + (x / 3 | 0);
BLOCK_NDX[ptr] = (y % 3) * 3 + x % 3;
}
}
}
// solver
function solve(p) {
let ptr, x, y, v,
count = 81,
m = Array(81).fill(-1),
row = Array(9).fill(0),
col = Array(9).fill(0),
blk = Array(9).fill(0);
// helper function to check and play a move
function play(stack, x, y, n) {
let p = y * 9 + x;
if(~m[p]) {
if(m[p] == n) {
return true;
}
undo(stack);
return false;
}
let msk, b;
msk = 1 << n;
b = BLOCK[p];
if((col[x] | row[y] | blk[b]) & msk) {
undo(stack);
return false;
}
count--;
col[x] ^= msk;
row[y] ^= msk;
blk[b] ^= msk;
m[p] = n;
stack.push(x << 8 | y << 4 | n);
return true;
}
// helper function to undo all moves on the stack
function undo(stack) {
stack.forEach(v => {
let x = v >> 8,
y = v >> 4 & 15,
p = y * 9 + x,
b = BLOCK[p];
v = 1 << (v & 15);
count++;
col[x] ^= v;
row[y] ^= v;
blk[b] ^= v;
m[p] = -1;
});
}
// convert the puzzle into our own format
for(ptr = y = 0; y < 9; y++) {
for(x = 0; x < 9; x++, ptr++) {
if(~(v = p[ptr] - 1)) {
col[x] |= 1 << v;
row[y] |= 1 << v;
blk[BLOCK[ptr]] |= 1 << v;
count--;
m[ptr] = v;
}
}
}
// main recursive search function
let res = (function search() {
// success?
if(!count) {
return true;
}
let ptr, x, y, v, n, max, best,
k, i, stack = [],
dCol = Array(81).fill(0),
dRow = Array(81).fill(0),
dBlk = Array(81).fill(0),
b, v0;
// scan the grid:
// - keeping track of where each digit can go on a given column, row or block
// - looking for a cell with the fewest number of legal moves
for(max = ptr = y = 0; y < 9; y++) {
for(x = 0; x < 9; x++, ptr++) {
if(m[ptr] == -1) {
v = col[x] | row[y] | blk[BLOCK[ptr]];
n = N_BIT[v];
// abort if there's no legal move on this cell
if(n == 9) {
return false;
}
// update dCol[], dRow[] and dBlk[]
for(v0 = v ^ 0x1FF; v0;) {
b = v0 & -v0;
dCol[x * 9 + BIT[b]] |= 1 << y;
dRow[y * 9 + BIT[b]] |= 1 << x;
dBlk[BLOCK[ptr] * 9 + BIT[b]] |= 1 << BLOCK_NDX[ptr];
v0 ^= b;
}
// update the cell with the fewest number of moves
if(n > max) {
best = {
x : x,
y : y,
ptr: ptr,
msk: v
};
max = n;
}
}
}
}
// play all forced moves (unique candidates on a given column, row or block)
// and make sure that it doesn't lead to any inconsistency
for(k = 0; k < 9; k++) {
for(n = 0; n < 9; n++) {
if(N_BIT[dCol[k * 9 + n]] == 1) {
i = BIT[dCol[k * 9 + n]];
if(!play(stack, k, i, n)) {
return false;
}
}
if(N_BIT[dRow[k * 9 + n]] == 1) {
i = BIT[dRow[k * 9 + n]];
if(!play(stack, i, k, n)) {
return false;
}
}
if(N_BIT[dBlk[k * 9 + n]] == 1) {
i = BIT[dBlk[k * 9 + n]];
if(!play(stack, (k % 3) * 3 + i % 3, (k / 3 | 0) * 3 + (i / 3 | 0), n)) {
return false;
}
}
}
}
// if we've played at least one forced move, do a recursive call right away
if(stack.length) {
if(search()) {
return true;
}
undo(stack);
return false;
}
// otherwise, try all moves on the cell with the fewest number of moves
while((v = ZERO[best.msk]) < 0x200) {
col[best.x] ^= v;
row[best.y] ^= v;
blk[BLOCK[best.ptr]] ^= v;
m[best.ptr] = BIT[v];
count--;
if(search()) {
return true;
}
count++;
m[best.ptr] = -1;
col[best.x] ^= v;
row[best.y] ^= v;
blk[BLOCK[best.ptr]] ^= v;
best.msk ^= v;
}
return false;
})();
return res ? m.map(n => n + 1).join('') : false;
}
// debugging
function dump(m) {
let x, y, c = 81, s = '';
for(y = 0; y < 9; y++) {
for(x = 0; x < 9; x++) {
s += (~m[y * 9 + x] ? (c--, m[y * 9 + x] + 1) : '-') + (x % 3 < 2 || x == 8 ? ' ' : ' | ');
}
s += y % 3 < 2 || y == 8 ? '\n' : '\n------+-------+------\n';
}
console.log(c);
console.log(s);
}
Example output
Tested on an Intel Core i7 7500U @ 2.70 GHz.
-
1\$\begingroup\$ @maxb I see. I did not understand that your concern was about multi-threading. \$\endgroup\$Arnauld– Arnauld2019年08月24日 21:22:44 +00:00Commented Aug 24, 2019 at 21:22
-
1\$\begingroup\$ it should be faster if you delete the line
if(!stack.some(... and its corresponding}(but preserve what's between them), and at the beginning ofplay()insertif(m[y][x]>=0)return v==1<<m[y][x]\$\endgroup\$ngn– ngn2019年08月28日 15:44:37 +00:00Commented Aug 28, 2019 at 15:44 -
1\$\begingroup\$ Why is your solution so much faster than the other ones? \$\endgroup\$user9207– user92072019年08月28日 18:54:00 +00:00Commented Aug 28, 2019 at 18:54
-
2\$\begingroup\$ @Anush What I've called 'forced moves' in the code is what makes it significantly faster and is better known as hidden singles. (We could also look for hidden twins, triples, quads, etc. but I'm not sure it's really worth it, at least in Node.) \$\endgroup\$Arnauld– Arnauld2019年08月28日 19:58:34 +00:00Commented Aug 28, 2019 at 19:58
-
6\$\begingroup\$ "I started looking at naked singles" careful with the phrasing :) \$\endgroup\$ngn– ngn2019年08月29日 08:20:44 +00:00Commented Aug 29, 2019 at 8:20
Python 3 (with dlx) 4min 46.870s official score
(single core i7-3610QM here)
Obviously beatable with a compiled language like C, and making use of threading, but it's a start...
sudoku is a module I've placed on github (copied at the footer of this post) which uses dlx under the hood.
#!/usr/bin/python
import argparse
import gc
import sys
from timeit import timeit
from sudoku import Solver
def getSolvers(filePath):
solvers = []
with open(filePath, 'r') as inFile:
for line in inFile:
content = line.rstrip()
if len(content) == 81 and content.isdigit():
solvers.append(Solver(content))
return solvers
def solve(solvers):
for solver in solvers:
yield next(solver.genSolutions())
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Time or print solving of some sudoku.')
parser.add_argument('filePath',
help='Path to the file containing proper sudoku on their own lines as 81 digits in row-major order with 0s as blanks')
parser.add_argument('-p', '--print', dest='printEm', action='store_true',
default=False,
help='print solutions in the same fashion as the input')
parser.add_argument('-P', '--pretty', dest='prettyPrintEm', action='store_true',
default=False,
help='print inputs and solutions formatted for human consumption')
args = parser.parse_args()
if args.printEm or args.prettyPrintEm:
solvers = getSolvers(args.filePath)
print(len(solvers))
for solver, solution in zip(solvers, solve(solvers)):
if args.prettyPrintEm:
print(solver)
print(solution)
else:
print('{},{}'.format(solver.representation(noneCharacter='0'), solution.representation()))
else:
setup = '''\
from __main__ import getSolvers, solve, args, gc
gc.disable()
solvers = getSolvers(args.filePath)'''
print(timeit("for solution in solve(solvers): pass", setup=setup, number=1))
Usage
- Install Python 3
- Save
sudoku.pysomewhere on your path (from the git hub link or copy it from below) - Save the above code as
testSolver.pysomewhere on your path - Install dlx:
python -m pip install dlx
- Run it (by the way it consumes memory like it's going out of fashion)
usage: testSolver.py [-h] [-p] [-P] filePath Time or print solving of some sudoku. positional arguments: filePath Path to the file containing proper sudoku on their own lines as 81 digits in row-major order with 0s as blanks optional arguments: -h, --help show this help message and exit -p, --print print solutions in the same fashion as the input -P, --pretty print inputs and solutions formatted for human consumption
Pipe output as required in the challenge spec to a file if need be:
python testSolver.py -p input_file_path> output_file_path
sudoku.py (yes there are extra features here other than solving)
import dlx
from itertools import permutations, takewhile
from random import choice, shuffle
'''
A 9 by 9 sudoku solver.
'''
_N = 3
_NSQ = _N**2
_NQU = _N**4
_VALID_VALUE_INTS = list(range(1, _NSQ + 1))
_VALID_VALUE_STRS = [str(v) for v in _VALID_VALUE_INTS]
_EMPTY_CELL_CHAR = '·'
# The following are mutually related by their ordering, and define ordering throughout the rest of the code. Here be dragons.
#
_CANDIDATES = [(r, c, v) for r in range(_NSQ) for c in range(_NSQ) for v in range(1, _NSQ + 1)]
_CONSTRAINT_INDEXES_FROM_CANDIDATE = lambda r, c, v: [ _NSQ * r + c, _NQU + _NSQ * r + v - 1, _NQU * 2 + _NSQ * c + v - 1, _NQU * 3 + _NSQ * (_N * (r // _N) + c // _N) + v - 1]
_CONSTRAINT_FORMATTERS = [ "R{0}C{1}" , "R{0}#{1}" , "C{0}#{1}" , "B{0}#{1}"]
_CONSTRAINT_NAMES = [(s.format(a, b + (e and 1)), dlx.DLX.PRIMARY) for e, s in enumerate(_CONSTRAINT_FORMATTERS) for a in range(_NSQ) for b in range(_NSQ)]
_EMPTY_GRID_CONSTRAINT_INDEXES = [_CONSTRAINT_INDEXES_FROM_CANDIDATE(r, c, v) for (r, c, v) in _CANDIDATES]
#
# The above are mutually related by their ordering, and define ordering throughout the rest of the code. Here be dragons.
class Solver:
def __init__(self, representation=''):
if not representation or len(representation) != _NQU:
self._complete = False
self._NClues = 0
self._repr = [None]*_NQU # blank grid, no clues - maybe to extend to a generator by overriding the DLX column selection to be stochastic.
else:
nClues = 0
repr = []
for value in representation:
if not value:
repr.append(None)
elif isinstance(value, int) and 1 <= value <= _NSQ:
nClues += 1
repr.append(value)
elif value in _VALID_VALUE_STRS:
nClues += 1
repr.append(int(value))
else:
repr.append(None)
self._complete = nClues == _NQU
self._NClues = nClues
self._repr = repr
def genSolutions(self, genSudoku=True, genNone=False, dlxColumnSelctor=None):
'''
if genSudoku=False, generates each solution as a list of cell values (left-right, top-bottom)
'''
if self._complete:
yield self
else:
self._initDlx()
dlxColumnSelctor = dlxColumnSelctor or dlx.DLX.smallestColumnSelector
if genSudoku:
for solution in self._dlx.solve(dlxColumnSelctor):
yield Solver([v for (r, c, v) in sorted([self._dlx.N[i] for i in solution])])
elif genNone:
for solution in self._dlx.solve(dlxColumnSelctor):
yield
else:
for solution in self._dlx.solve(dlxColumnSelctor):
yield [v for (r, c, v) in sorted([self._dlx.N[i] for i in solution])]
def uniqueness(self, returnSolutionIfProper=False):
'''
Returns: 0 if unsolvable;
1 (or the unique solution if returnSolutionIfProper=True) if uniquely solvable; or
2 if multiple possible solutions exist
- a 'proper' sudoku is uniquely solvable.
'''
slns = list(takewhile(lambda t: t[0] < 2, ((i, sln) for i, sln in enumerate(self.genSolutions(genSudoku=returnSolutionIfProper, genNone=not returnSolutionIfProper)))))
uniqueness = len(slns)
if returnSolutionIfProper and uniqueness == 1:
return slns[0][1]
else:
return uniqueness
def representation(self, asString=True, noneCharacter='.'):
if asString:
return ''.join([v and str(_VALID_VALUE_STRS[v - 1]) or noneCharacter for v in self._repr])
return self._repr[:]
def __repr__(self):
return display(self._repr)
def _initDlx(self):
self._dlx = dlx.DLX(_CONSTRAINT_NAMES)
rowIndexes = self._dlx.appendRows(_EMPTY_GRID_CONSTRAINT_INDEXES, _CANDIDATES)
for r in range(_NSQ):
for c in range(_NSQ):
v = self._repr[_NSQ * r + c]
if v is not None:
self._dlx.useRow(rowIndexes[_NQU * r + _NSQ * c + v - 1])
_ROW_SEPARATOR_COMPACT = '+'.join(['-' * (2 * _N + 1) for b in range(_N)])[1:-1] + '\n'
_ROW_SEPARATOR = ' ·-' + _ROW_SEPARATOR_COMPACT[:-1] + '-·\n'
_TOP_AND_BOTTOM = _ROW_SEPARATOR.replace('+', '·')
_ROW_LABELS = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J']
_COL_LABELS = ['1', '2', '3', '4', '5', '6', '7', '8', '9']
_COLS_LABEL = ' ' + ' '.join([i % _N == 0 and ' ' + l or l for i, l in enumerate(_COL_LABELS)]) + '\n'
def display(representation, conversion=None, labelled=True):
result = ''
raw = [conversion[n or 0] for n in representation] if conversion else representation
if labelled:
result += _COLS_LABEL + _TOP_AND_BOTTOM
rSep = _ROW_SEPARATOR
else:
rSep = _ROW_SEPARATOR_COMPACT
for r in range(_NSQ):
if r > 0 and r % _N == 0:
result += rSep
for c in range(_NSQ):
if c % _N == 0:
if c == 0:
if labelled:
result += _ROW_LABELS[r] + '| '
else:
result += '| '
result += str(raw[_NSQ * r + c] or _EMPTY_CELL_CHAR) + ' '
if labelled:
result += '|'
result += '\n'
if labelled:
result += _TOP_AND_BOTTOM
else:
result = result[:-1]
return result
def permute(representation):
'''
returns a random representation from the given representation's equivalence class
'''
rows = [list(representation[i:i+_NSQ]) for i in range(0, _NQU, _NSQ)]
rows = permuteRowsAndBands(rows)
rows = [[r[i] for r in rows] for i in range(_NSQ)]
rows = permuteRowsAndBands(rows)
pNumbers = [str(i) for i in range(1, _NSQ + 1)]
shuffle(pNumbers)
return ''.join(''.join([pNumbers[int(v) - 1] if v.isdigit() and v != '0' else v for v in r]) for r in rows)
def permuteRowsAndBands(rows):
bandP = choice([x for x in permutations(range(_N))])
rows = [rows[_N * b + r] for b in bandP for r in range(_N)]
for band in range(0, _NSQ, _N):
rowP = choice([x for x in permutations([band + i for i in range(_N)])])
rows = [rows[rowP[i % _N]] if i // _N == band else rows[i] for i in range(_NSQ)]
return rows
def getRandomSolvedStateRepresentation():
return permute('126459783453786129789123456897231564231564897564897231312645978645978312978312645')
def getRandomSudoku():
r = getRandomSolvedStateRepresentation()
s = Solver(r)
indices = list(range(len(r)))
shuffle(indices)
for i in indices:
ns = Solver(s._repr[:i] + [None] + s._repr[i+1:])
if ns.uniqueness() == 1:
s = ns
return s
if __name__ == '__main__':
print('Some example useage:')
inputRepresentation = '..3......4......2..8.12...6.........2...6...7...8.7.31.1.64.9..6.5..8...9.83...4.'
print('>>> s = Solver({})'.format(inputRepresentation))
s = Solver(inputRepresentation)
print('>>> s')
print(s)
print('>>> print(s.representation())')
print(s.representation())
print('>>> print(display(s.representation(), labelled=False))')
print(display(s.representation(), labelled=False))
print('>>> for solution in s.genSolutions(): solution')
for solution in s.genSolutions(): print(solution)
inputRepresentation2 = inputRepresentation[:2] + '.' + inputRepresentation[3:]
print('>>> s.uniqueness()')
print(s.uniqueness())
print('>>> s2 = Solver({}) # removed a clue; this has six solutions rather than one'.format(inputRepresentation2))
s2 = Solver(inputRepresentation2)
print('>>> s2.uniqueness()')
print(s2.uniqueness())
print('>>> for solution in s2.genSolutions(): solution')
for solution in s2.genSolutions(): print(solution)
print('>>> s3 = getRandomSudoku()')
s3 = getRandomSudoku()
print('>>> s3')
print(s3)
print('>>> for solution in s3.genSolutions(): solution')
for solution in s3.genSolutions(): print(solution)
-
\$\begingroup\$ Impressive for a Python solution, I'll try to benchmark it later today. \$\endgroup\$maxb– maxb2019年08月24日 08:31:35 +00:00Commented Aug 24, 2019 at 8:31
-
1\$\begingroup\$ Thanks; and wow, so much faster there maxb! \$\endgroup\$Jonathan Allan– Jonathan Allan2019年08月24日 13:42:42 +00:00Commented Aug 24, 2019 at 13:42
-
1\$\begingroup\$ +1 for using dancing links \$\endgroup\$user9207– user92072019年08月25日 04:32:33 +00:00Commented Aug 25, 2019 at 4:32
Python 3 + Z3 - 10min 45.657s official score
about 1000s on my laptop.
import time
start = time.time()
import z3.z3 as z3
import itertools
import datetime
import sys
solver = z3.Solver()
ceils = [[None] * 9 for i in range(9)]
for row in range(9):
for col in range(9):
name = 'c' + str(row * 9 + col)
ceil = z3.BitVec(name, 9)
solver.add(z3.Or(
ceil == 0b000000001,
ceil == 0b000000010,
ceil == 0b000000100,
ceil == 0b000001000,
ceil == 0b000010000,
ceil == 0b000100000,
ceil == 0b001000000,
ceil == 0b010000000,
ceil == 0b100000000
))
solver.add(ceil != 0)
ceils[row][col] = ceil
for i in range(9):
for j in range(9):
for k in range(9):
if j == k: continue
solver.add(ceils[i][j] & ceils[i][k] == 0)
solver.add(ceils[j][i] & ceils[k][i] == 0)
row, col = i // 3 * 3, i % 3 * 3
solver.add(ceils[row + j // 3][col + j % 3] & ceils[row + k // 3][col + k % 3] == 0)
row_col = list(itertools.product(range(9), range(9)))
lookup = { 1 << i: str(i + 1) for i in range(9) }
def solve(line):
global solver, output, row_col, ceils, lookup
solver.push()
for value, (row, col) in zip(line, row_col):
val = ord(value) - 48
if val == 0: continue
solver.add(ceils[row][col] == 1 << (val - 1))
output = []
if solver.check() == z3.sat:
model = solver.model()
for row in range(9):
for col in range(9):
val = model[ceils[row][col]].as_long()
output.append(lookup[val])
solver.pop()
return ''.join(output)
count = int(input())
print(count)
for i in range(count):
if i % 1000 == 0:
sys.stderr.write(str(i) + '\n')
line = input()
print(line + "," + solve(line))
end = time.time()
sys.stderr.write(str(end - start))
Install dependency
pip install z3-solver
Run
python3 solve.py < in.txt> out.txt
I'm not sure how to improve its performance, since it just solved magically...
-
\$\begingroup\$ Quite impressive for a general constraint solver. My first implementation was way slower than this. Running a benchmark right now, I'll update the post once it's done. \$\endgroup\$maxb– maxb2019年08月26日 05:53:24 +00:00Commented Aug 26, 2019 at 5:53
-
\$\begingroup\$ @maxb just done some general clean up, and i believe there is no need to update the benchmark... \$\endgroup\$tsh– tsh2019年08月26日 07:03:51 +00:00Commented Aug 26, 2019 at 7:03
C - (削除) 2.228s (削除ここまで) 1.690s official score
based on @Arnauld's
#include<fcntl.h>
#define O const
#define R return
#define S static
#define $(x,y...)if(x){y;}
#define W(x,y...)while(x){y;}
#define fi(x,y...)for(I i=0,_n=(x);i<_n;i++){y;}
#define fj(x,y...)for(I j=0,_n=(x);j<_n;j++){y;}
#define fp81(x...)for(I p=0;p<81;p++){x;}
#define fq3(x...)for(I q=0;q<3;q++){x;}
#define fij9(x...){fi(9,fj(9,x))}
#define m0(x)m0_((V*)(x),sizeof(x));
#define popc(x)__builtin_popcount(x)
#define ctz(x)__builtin_ctz(x)
#include<sys/syscall.h>
#define sc(f,x...)({L u;asm volatile("syscall":"=a"(u):"0"(SYS_##f)x:"cc","rcx","r11","memory");u;})
#define sc1(f,x) sc(f,,"D"(x))
#define sc2(f,x,y) sc(f,,"D"(x),"S"(y))
#define sc3(f,x,y,z)sc(f,,"D"(x),"S"(y),"d"(z))
#define wr(a...)sc3(write,a)
#define op(a...)sc3( open,a)
#define cl(a...)sc1(close,a)
#define fs(a...)sc2(fstat,a)
#define ex(a...)sc1( exit,a)
#define mm(x,y,z,t,u,v)({register L r10 asm("r10")=t,r8 asm("r8")=u,r9 asm("r9")=v;\
(V*)sc(mmap,,"D"(x),"S"(y),"d"(z),"r"(r10),"r"(r8),"r"(r9));})
typedef void V;typedef char C;typedef short H;typedef int I;typedef long long L;
S C BL[81],KL[81],IJK[81][3],m[81],t_[81-17],*t;S H rcb[3][9],cnt;
S V*mc(V*x,O V*y,L n){C*p=x;O C*q=y;fi(n,*p++=*q++)R x;}S V m0_(C*p,L n){fi(n,*p++=0);}
S I undo(C*t0){cnt+=t-t0;W(t>t0,C p=*--t;H v=1<<m[p];fq3(rcb[q][IJK[p][q]]^=v)m[p]=-1)R 0;}
S I play(C p,H v){$(m[p]>=0,R 1<<m[p]==v)I w=0;fq3(w|=rcb[q][IJK[p][q]])$(w&v,R 0)cnt--;
fq3(rcb[q][IJK[p][q]]^=v);m[p]=ctz(v);*t++=p;R 1;}
S I f(){$(!cnt,R 1)C*t0=t;H max=0,bp,bv,d[9][9][4];m0(d);
fij9(I p=i*9+j;$(m[p]<0,
I v=0;fq3(v|=rcb[q][IJK[p][q]])I w=v^511;$(!w,R 0)H g[]={1<<j,1<<i,1<<BL[p]};
do{I z=ctz(w);w&=w-1;fq3(d[IJK[p][q]][z][q]|=g[q]);}while(w);
I n=popc(v);$(max<n,max=n;bp=p;bv=v)))
fij9(I u=d[i][j][0];$(popc(u)==1,I l=ctz(u);$(!play( i*9+l ,1<<j),R undo(t0)))
u=d[i][j][1];$(popc(u)==1,I l=ctz(u);$(!play( l*9+i ,1<<j),R undo(t0)))
u=d[i][j][2];$(popc(u)==1,I l=ctz(u);$(!play(KL[i*9+l],1<<j),R undo(t0))))
$(t-t0,R f()||undo(t0))
W(1,I v=1<<ctz(~bv);$(v>511,R 0)fq3(rcb[q][IJK[bp][q]]^=v)m[bp]=ctz(v);cnt--;$(f(),R 1)
cnt++;m[bp]=-1;fq3(rcb[q][IJK[bp][q]]^=v)bv^=v)
R 0;}
asm(".globl _start\n_start:pop %rdi\nmov %rsp,%rsi\njmp main");
V main(I ac,C**av){$(ac!=2,ex(2))
fij9(I p=i*9+j;BL[p]=i%3*3+j%3;KL[p]=(i/3*3+j/3)*9+BL[p];IJK[p][0]=i;IJK[p][1]=j;IJK[p][2]=i/3*3+j/3)
I d=op(av[1],0,0);struct stat h;fs(d,&h);C*s0=mm(0,h.st_size,1,0x8002,d,0),*s=s0;cl(d); //in
C*r0=mm(0,2*h.st_size,3,0x22,-1,0),*r=r0; //out
I n=0;W(*s!='\n',n*=10;n+=*s++-'0')s++;mc(r,s0,s-s0);r+=s-s0;
fi(n,m0(rcb);cnt=81;t=t_;$(s[81]&&s[81]!='\n',ex(3))mc(r,s,81);r+=81;*r++=',';
fp81(I v=m[p]=*s++-'1';$(v>=0,v=1<<v;fq3(rcb[q][IJK[p][q]]|=v)cnt--))
s++;$(!f(),ex(4))fp81(r[p]=m[p]+'1')r+=81;*r++='\n')
wr(1,r0,r-r0);ex(0);}
compile and run:
gcc -O3 -march=native -nostdlib -ffreestanding
time ./a.out all_17_clue_sudokus.txt | md5sum
-
\$\begingroup\$ Congratulations, you (and Arnauld) are in the lead by a lot right now. \$\endgroup\$maxb– maxb2019年08月26日 12:32:21 +00:00Commented Aug 26, 2019 at 12:32
-
\$\begingroup\$ @maxb i tried using more efficient i/o (direct syscalls without libc) but the effect wasn't as great as i hoped. i also tidied up the rest of the code. this should take away ~0.2s. do you mind re-scoring? \$\endgroup\$ngn– ngn2019年08月27日 20:36:23 +00:00Commented Aug 27, 2019 at 20:36
-
\$\begingroup\$ Of course, I'll try to get it done sometime today \$\endgroup\$maxb– maxb2019年08月28日 05:06:04 +00:00Commented Aug 28, 2019 at 5:06
-
\$\begingroup\$ I was also thinking about trying a RAMdisk for all I/O, just to see if that makes a difference. I doubt it'll make a huge difference, since reads and writes are sequential, and my SSD has a large enough cache to fit everything. \$\endgroup\$maxb– maxb2019年08月28日 07:24:39 +00:00Commented Aug 28, 2019 at 7:24
-
\$\begingroup\$ @maxb there probably won't be any difference at all. the second time you run the program, the input file will already be in ram anyway - in linux's filesystem cache. \$\endgroup\$ngn– ngn2019年08月28日 07:44:41 +00:00Commented Aug 28, 2019 at 7:44
C++ with Minisat(2.2.1-5) - 11.735s official score
This is nowhere near as fast as a specialized algorithm, but it's a different approach, an interesting point of reference, and easy to understand.
$ clang++ -o solve -lminisat solver_minisat.cc
#include <minisat/core/Solver.h>
namespace {
using Minisat::Lit;
using Minisat::mkLit;
using namespace std;
struct SolverMiniSat {
Minisat::Solver solver;
SolverMiniSat() {
InitializeVariables();
InitializeTriadDefinitions();
InitializeTriadOnnes();
InitializeCellOnnes();
}
// normal cell literals, of which we have 9*9*9
static Lit Literal(int row, int column, int value) {
return mkLit(value + 9 * (column + 9 * row), true);
}
// horizontal triad literals, of which we have 9*3*9, starting after the cell literals
static Lit HTriadLiteral(int row, int column, int value) {
int base = 81 * 9;
return mkLit(base + value + 9 * (column + 3 * row));
}
// vertical triad literals, of which we have 3*9*9, starting after the h_triad literals
static Lit VTriadLiteral(int row, int column, int value) {
int base = (81 + 27) * 9;
return mkLit(base + value + 9 * (row + 3 * column));
}
void InitializeVariables() {
for (int i = 0; i < 15 * 9 * 9; i++) {
solver.newVar();
}
}
// create an exactly-one constraint over a set of literals
void CreateOnne(const Minisat::vec<Minisat::Lit> &literals) {
solver.addClause(literals);
for (int i = 0; i < literals.size() - 1; i++) {
for (int j = i + 1; j < literals.size(); j++) {
solver.addClause(~literals[i], ~literals[j]);
}
}
}
void InitializeTriadDefinitions() {
for (int i = 0; i < 9; i++) {
for (int j = 0; j < 3; j++) {
for (int value = 0; value < 9; value++) {
Lit h_triad = HTriadLiteral(i, j, value);
Lit v_triad = VTriadLiteral(j, i, value);
int j0 = j * 3 + 0, j1 = j * 3 + 1, j2 = j * 3 + 2;
Minisat::vec<Minisat::Lit> h_triad_def;
h_triad_def.push(Literal(i, j0, value));
h_triad_def.push(Literal(i, j1, value));
h_triad_def.push(Literal(i, j2, value));
h_triad_def.push(~h_triad);
CreateOnne(h_triad_def);
Minisat::vec<Minisat::Lit> v_triad_def;
v_triad_def.push(Literal(j0, i, value));
v_triad_def.push(Literal(j1, i, value));
v_triad_def.push(Literal(j2, i, value));
v_triad_def.push(~v_triad);
CreateOnne(v_triad_def);
}
}
}
}
void InitializeTriadOnnes() {
for (int i = 0; i < 9; i++) {
for (int value = 0; value < 9; value++) {
Minisat::vec<Minisat::Lit> row;
row.push(HTriadLiteral(i, 0, value));
row.push(HTriadLiteral(i, 1, value));
row.push(HTriadLiteral(i, 2, value));
CreateOnne(row);
Minisat::vec<Minisat::Lit> column;
column.push(VTriadLiteral(0, i, value));
column.push(VTriadLiteral(1, i, value));
column.push(VTriadLiteral(2, i, value));
CreateOnne(column);
Minisat::vec<Minisat::Lit> hbox;
hbox.push(HTriadLiteral(3 * (i / 3) + 0, i % 3, value));
hbox.push(HTriadLiteral(3 * (i / 3) + 1, i % 3, value));
hbox.push(HTriadLiteral(3 * (i / 3) + 2, i % 3, value));
CreateOnne(hbox);
Minisat::vec<Minisat::Lit> vbox;
vbox.push(VTriadLiteral(i % 3, 3 * (i / 3) + 0, value));
vbox.push(VTriadLiteral(i % 3, 3 * (i / 3) + 1, value));
vbox.push(VTriadLiteral(i % 3, 3 * (i / 3) + 2, value));
CreateOnne(vbox);
}
}
}
void InitializeCellOnnes() {
for (int row = 0; row < 9; row++) {
for (int column = 0; column < 9; column++) {
Minisat::vec<Minisat::Lit> literals;
for (int value = 0; value < 9; value++) {
literals.push(Literal(row, column, value));
}
CreateOnne(literals);
}
}
}
bool SolveSudoku(const char *input, char *solution, size_t *num_guesses) {
Minisat::vec<Minisat::Lit> assumptions;
for (int row = 0; row < 9; row++) {
for (int column = 0; column < 9; column++) {
char digit = input[row * 9 + column];
if (digit != '.') {
assumptions.push(Literal(row, column, digit - '1'));
}
}
}
solver.decisions = 0;
bool satisfied = solver.solve(assumptions);
if (satisfied) {
for (int row = 0; row < 9; row++) {
for (int column = 0; column < 9; column++) {
for (int value = 0; value < 9; value++) {
if (solver.model[value + 9 * (column + 9 * row)] ==
Minisat::lbool((uint8_t) 1)) {
solution[row * 9 + column] = value + '1';
}
}
}
}
}
*num_guesses = solver.decisions - 1;
return satisfied;
}
};
} //end anonymous namespace
int main(int argc, const char **argv) {
char *puzzle = NULL;
char solution[81];
size_t size, guesses;
SolverMiniSat solver;
while (getline(&puzzle, &size, stdin) != -1) {
int count = solver.SolveSudoku(puzzle, solution, &guesses);
printf("%.81s:%d:%.81s\n", puzzle, count, solution);
}
}
Python3 - 2min 1.007s official score
It takes around 100 seconds on my i5-9400F
import copy
SUDOKU_VALUES = [1, 2, 4, 8, 16, 32, 64, 128, 256]
SUDOKU_MAX = 511
OPTION_COUNT_CACHE = [
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2,
3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3,
3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3,
4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4,
3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5,
6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4,
4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5,
6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5,
3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3,
4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6,
6, 7, 6, 7, 7, 8, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3,
4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5,
4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3,
4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6,
6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6,
7, 5, 6, 6, 7, 6, 7, 7, 8, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4,
5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8, 3, 4,
4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6,
7, 6, 7, 7, 8, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8, 5, 6, 6, 7,
6, 7, 7, 8, 6, 7, 7, 8, 7, 8, 8, 9
] # Basically just .count_ones()
class SudokuEmpty:
def __init__(self):
self.data = list(range(81))
self.pos = 81
def remove(self, index):
self.pos -= 1
data = self.data
data[index], data[self.pos] = data[self.pos], data[index]
class Solver:
def __init__(self, sudoku):
self.to_explore = SudokuEmpty()
self.options = [SUDOKU_MAX for _ in range(81)]
for (i, item) in enumerate(sudoku):
if item != 0:
self.options[i] = SUDOKU_VALUES[item - 1]
self.apply_number(i)
def hidden_singles(self, square):
options = self.options
value = options[square]
options[square] = 0
row_start = square // 9 * 9
column_start = square % 9
box_start = square // 3 % 3 * 3 + square // 27 * 27
needed = (SUDOKU_MAX
- ((options[row_start + 8]
| options[row_start + 7]
| options[row_start + 6]
| options[row_start + 5]
| options[row_start + 4]
| options[row_start + 3]
| options[row_start + 2]
| options[row_start + 1]
| options[row_start])
& (options[column_start + 72]
| options[column_start + 63]
| options[column_start + 54]
| options[column_start + 45]
| options[column_start + 36]
| options[column_start + 27]
| options[column_start + 18]
| options[column_start + 9]
| options[column_start])
& (options[box_start + 20]
| options[box_start + 19]
| options[box_start + 18]
| options[box_start + 11]
| options[box_start + 10]
| options[box_start + 9]
| options[box_start + 2]
| options[box_start + 1]
| options[box_start])))
option_count = OPTION_COUNT_CACHE[needed]
if option_count == 0:
self.options[square] = value
return True
elif option_count == 1:
if value & needed != 0:
self.options[square] = value & needed
return True
else:
return False
else:
return False
def apply_number(self, square):
options = self.options
value = options[square]
not_value = SUDOKU_MAX - value
column_start = square % 9
row_start = square - column_start
box_start = square // 3 % 3 * 3 + square // 27 * 27
options[row_start + 8] &= not_value
options[row_start + 7] &= not_value
options[row_start + 6] &= not_value
options[row_start + 5] &= not_value
options[row_start + 4] &= not_value
options[row_start + 3] &= not_value
options[row_start + 2] &= not_value
options[row_start + 1] &= not_value
options[row_start] &= not_value
options[column_start + 72] &= not_value
options[column_start + 63] &= not_value
options[column_start + 54] &= not_value
options[column_start + 45] &= not_value
options[column_start + 36] &= not_value
options[column_start + 27] &= not_value
options[column_start + 18] &= not_value
options[column_start + 9] &= not_value
options[column_start] &= not_value
options[box_start + 20] &= not_value
options[box_start + 19] &= not_value
options[box_start + 18] &= not_value
options[box_start + 11] &= not_value
options[box_start + 10] &= not_value
options[box_start + 9] &= not_value
options[box_start + 2] &= not_value
options[box_start + 1] &= not_value
options[box_start] &= not_value
options[square] = value
def process(self, routes):
values = []
while 1:
min_length = 20
min_pos = 0
min_pos_x = 0
x = 0
while x < self.to_explore.pos:
pos = self.to_explore.data[x]
if not self.hidden_singles(pos):
return False
option = self.options[pos]
length = OPTION_COUNT_CACHE[option]
if length < min_length:
if length == 0:
return False
elif length == 1:
for (i, item) in enumerate(SUDOKU_VALUES):
if option == item:
self.apply_number(pos)
self.to_explore.remove(x)
break
else:
min_length = length
min_pos = pos
min_pos_x = x
x += 1
else:
x += 1
if min_length != 20:
values.clear()
options = self.options[min_pos]
for (i, item) in enumerate(SUDOKU_VALUES):
if options & item != 0:
values.append(i + 1)
if not values:
return False
self.to_explore.remove(min_pos_x)
item = values.pop()
for value in values:
clone = copy.deepcopy(self)
clone.options[min_pos] = SUDOKU_VALUES[value - 1]
clone.apply_number(min_pos)
routes.append(clone)
self.options[min_pos] = SUDOKU_VALUES[item - 1]
self.apply_number(min_pos)
else:
return True
def get_result(self):
solution = [0 for _ in range(81)]
for (i, option) in enumerate(self.options):
for (x, value) in enumerate(SUDOKU_VALUES):
if option == value:
solution[i] = x + 1
break
return solution
def solve(sudoku):
routes = [Solver(sudoku)]
while routes:
route = routes.pop()
result = route.process(routes)
if result:
return route.get_result()
raise Exception("Empty routes, but still unsolved")
if __name__ == '__main__':
with open('all_17_clue_sudokus.txt') as file:
sudokus = file.read().splitlines()
print(sudokus[0])
for sudoku in sudokus[1:]:
solution = ''.join(map(str, solve(list(map(int, sudoku)))))
print('%s,%s' % (sudoku, solution))
The path to the sudokus is hardcoded, it has to be all_17_clue_sudokus.txt
To run
time python3 lib.py > output
sha256sum output
-
\$\begingroup\$ It was a while since I ran these benchmarks, but if I have some time this weekend I'll try to run yours and give it a score \$\endgroup\$maxb– maxb2020年06月12日 07:50:09 +00:00Commented Jun 12, 2020 at 7:50
Java - 4.056s official score
The main idea of this is to never allocate memory when it is not needed. The only exception are primitives, which should be optimized by the compiler anyway. Everything else is stored as masks and arrays of operations done in each step, which can be undone when the recursion step is completed.
About half of all sudokus are solved completely without backtracking, but if I push that number higher the overall time seems to be slower. I'm planning om rewriting this in C++ and optimize even further, but this solver is becoming a behemoth.
I wanted to implement as much caching as possible, which lead to some issues. For example, if there are two cells on the same row which can only have the number 6, then we have reached an impossible case, and should return to the backtracking. But since I calculated all options in one sweep, and then placed numbers in cells with only one possibility, I didn't double check that I had placed a number in the same row just before. This lead to impossible solutions.
With everything being contained in the arrays defined at the top, the memory usage of the actual solver is about 216kB. The main part of the memory usage comes from the array containing all the puzzles, and the I/O handlers in Java.
EDIT: I have a version which is translated to C++ now, but it isn't vastly faster. The official time is around 3.5 seconds, which isn't a huge improvement. I think the main issue with my implementation is that I keep my masks as arrays rather than bitmasks. I'll try to analyze Arnauld's solution to see what can be done to improve it.
import java.util.HashMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.File;
import java.io.PrintWriter;
public class Sudoku {
final private int[] unsolvedBoard;
final private int[] solvedBoard;
final private int[][] neighbors;
final private int[][] cells;
private static int[] clues;
final private int[][] mask;
final private int[] formattedMask;
final private int[][] placedMask;
final private boolean[][][] lineMask;
final private int[] lineCounters;
final private int[][] sectionCounters;
final private int[][] sectionMask;
private int easySolved;
private boolean isEasy;
private int totEasy;
private int placedNumbers;
public long totTime = 0;
private boolean solutionFound;
public long lastPrint;
private boolean shouldPrint;
private boolean isImpossible = false;
public Sudoku() {
mask = new int[81][9];
formattedMask = new int[81];
placedMask = new int[64][64];
lineMask = new boolean[64][81][9];
sectionCounters = new int[9][27];
sectionMask = new int[9][27];
lineCounters = new int[64];
neighbors = new int[81][20];
unsolvedBoard = new int[81];
solvedBoard = new int[81];
cells = new int[][] {{0 ,1 ,2 ,9 ,10,11,18,19,20},
{3 ,4 ,5 ,12,13,14,21,22,23},
{6 ,7 ,8 ,15,16,17,24,25,26},
{27,28,29,36,37,38,45,46,47},
{30,31,32,39,40,41,48,49,50},
{33,34,35,42,43,44,51,52,53},
{54,55,56,63,64,65,72,73,74},
{57,58,59,66,67,68,75,76,77},
{60,61,62,69,70,71,78,79,80}};
}
final public long solveSudoku(int[] board, int clue) {
long t1 = 0,t2 = 0;
t1 = System.nanoTime();
System.arraycopy(board, 0, unsolvedBoard, 0, 81);
System.arraycopy(board, 0, solvedBoard, 0, 81);
placedNumbers = 0;
solutionFound = false;
isEasy = true;
isImpossible = false;
for (int[] i : mask) {
Arrays.fill(i, 0);
}
for (boolean[][] i : lineMask) {
for (boolean[] j : i) {
Arrays.fill(j, false);
}
}
for (int i = 0; i < 81; i++) {
if (solvedBoard[i] != -1) {
put(i, solvedBoard[i]);
placedNumbers++;
}
}
solve(0, 0);
t2 = System.nanoTime();
easySolved += isEasy ? 1 : 0;
if (solutionFound && placedNumbers == 81) {
totTime += t2-t1;
if (shouldPrint || t2-t1 > 5*1_000_000_000L) {
System.out.print(String.format(
"Solution from %2d clues found in %7s",
clue,
printTime(t1, t2)
));
shouldPrint = false;
if (t2-t1 > 1*1000_000_000L) {
System.out.println();
display2(board, solvedBoard);
}
}
} else {
System.out.println("No solution");
display2(unsolvedBoard, solvedBoard);
return -1;
}
return t2 - t1;
}
final private void solve(int v, int vIndex) {
lineCounters[vIndex] = 0;
int easyIndex = placeEasy(vIndex);
if (isImpossible) {
resetEasy(vIndex, easyIndex);
resetLineMask(vIndex);
return;
}
if (placedNumbers == 81) {
solutionFound = true;
return;
}
// if (true) {
// return;
// }
// either get the next empty cell
// while (v < 81 && solvedBoard[v] >= 0) {
// v++;
// }
// or get the cell with the fewest options
generateFormattedMasks();
int minOptions = 9;
for (int i = 0; i < 81; i++) {
int options = formattedMask[i] & 0xffff;
if (options > 0 && options < minOptions) {
minOptions = options;
v = i;
}
if (options == 0 && solvedBoard[i] == -1) {
isImpossible = true;
}
}
if (!isImpossible) {
for (int c = 0; c < 9; c++) {
if (isPossible(v, c)) {
isEasy = false;
put(v, c);
placedNumbers++;
solve(v + 1, vIndex + 1);
if (solutionFound) {
return;
}
unput(v, c);
placedNumbers--;
}
}
}
resetEasy(vIndex, easyIndex);
resetLineMask(vIndex);
}
final private void resetEasy(int vIndex, int easyIndex) {
for (int i = 0; i < easyIndex; i++) {
int tempv2 = placedMask[vIndex][i];
int c2 = solvedBoard[tempv2];
unput(tempv2, c2);
placedNumbers--;
}
}
final private void resetLineMask(int vIndex) {
if (lineCounters[vIndex] > 0) {
for (int i = 0; i < 81; i++) {
for (int c = 0; c < 9; c++) {
if (lineMask[vIndex][i][c]) {
enable(i, c);
lineMask[vIndex][i][c] = false;
}
}
}
}
isImpossible = false;
}
final private int placeEasy(int vIndex) {
int easyIndex = 0;
int lastPlaced = 0, tempPlaced = 0, easyplaced = 0;
int iter = 0;
while (placedNumbers > lastPlaced+1) {
lastPlaced = placedNumbers;
tempPlaced = 0;
while (placedNumbers > tempPlaced + 5) {
tempPlaced = placedNumbers;
easyIndex = placeNakedSingles(vIndex, easyIndex);
if (isImpossible) {
return easyIndex;
}
}
tempPlaced = 0;
while (placedNumbers < 55*1 && placedNumbers > tempPlaced + 2) {
tempPlaced = placedNumbers;
easyIndex = placeHiddenSingles(vIndex, easyIndex);
if (isImpossible) {
return easyIndex;
}
}
tempPlaced = 0;
while (placedNumbers < 65*1 && placedNumbers > tempPlaced + 1) {
tempPlaced = placedNumbers;
easyIndex = placeNakedSingles(vIndex, easyIndex);
if (isImpossible) {
return easyIndex;
}
}
if (iter < 2 && placedNumbers < 55*1) {
checkNakedTriples(vIndex);
}
if (placedNumbers < 45*1) {
checkNakedDoubles(vIndex);
identifyLines(vIndex);
}
iter++;
}
return easyIndex;
}
final private int placeNakedSingles(int vIndex, int easyIndex) {
generateFormattedMasks();
for (int tempv = 0; tempv < 81; tempv++) {
int possibilities = formattedMask[tempv];
if ((possibilities & 0xffff) == 1) {
possibilities >>= 16;
int c = 0;
while ((possibilities & 1) == 0) {
possibilities >>= 1;
c++;
}
if (isPossible(tempv, c)) {
put(tempv, c);
placedMask[vIndex][easyIndex++] = tempv;
placedNumbers++;
} else {
isImpossible = true;
return easyIndex;
}
} else if (possibilities == 0 && solvedBoard[tempv] == -1) {
isImpossible = true;
return easyIndex;
}
}
return easyIndex;
}
final private int placeHiddenSingles(int vIndex, int easyIndex) {
for (int[] i : sectionCounters) {
Arrays.fill(i, 0);
}
for (int c = 0; c < 9; c++) {
for (int v = 0; v < 81; v++) {
if (isPossible(v, c)) {
int cell = 3 * (v / 27) + ((v / 3) % 3);
sectionCounters[c][v / 9]++;
sectionCounters[c][9 + (v % 9)]++;
sectionCounters[c][18 + cell]++;
sectionMask[c][v / 9] = v;
sectionMask[c][9 + (v % 9)] = v;
sectionMask[c][18 + cell] = v;
}
}
int v;
for (int i = 0; i < 9; i++) {
if (sectionCounters[c][i] == 1) {
v = sectionMask[c][i];
if (isPossible(v, c)) {
put(v, c);
placedMask[vIndex][easyIndex++] = v;
placedNumbers++;
int cell = 3 * (v / 27) + ((v / 3) % 3);
sectionCounters[c][9 + (v%9)] = 9;
sectionCounters[c][18 + cell] = 9;
} else {
isImpossible = true;
return easyIndex;
}
}
}
for (int i = 9; i < 18; i++) {
if (sectionCounters[c][i] == 1) {
v = sectionMask[c][i];
if (isPossible(v, c)) {
put(v, c);
placedMask[vIndex][easyIndex++] = v;
int cell = 3 * (v / 27) + ((v / 3) % 3);
placedNumbers++;
sectionCounters[c][18 + cell]++;
} else {
isImpossible = true;
return easyIndex;
}
}
}
for (int i = 18; i < 27; i++) {
if (sectionCounters[c][i] == 1) {
v = sectionMask[c][i];
if (isPossible(v, c)) {
put(v, c);
placedMask[vIndex][easyIndex++] = v;
placedNumbers++;
} else {
isImpossible = true;
return easyIndex;
}
}
}
}
return easyIndex;
}
final private int getFormattedMask(int v) {
if (solvedBoard[v] >= 0) {
return 0;
}
int x = 0;
int y = 0;
for (int c = 8; c >= 0; c--) {
x <<= 1;
x += mask[v][c] == 0 ? 1 : 0;
y += mask[v][c] == 0 ? 1 : 0;
}
x <<= 16;
return x + y;
}
final private int getCachedMask(int v) {
return formattedMask[v];
}
final private void generateFormattedMasks() {
for (int i = 0; i < 81; i++) {
formattedMask[i] = getFormattedMask(i);
}
}
final private void generateFormattedMasks(int[] idxs) {
for (int i : idxs) {
formattedMask[i] = getFormattedMask(i);
}
}
final private void checkNakedDoubles(int vIndex) {
generateFormattedMasks();
for (int i = 0; i < 81; i++) {
int bitmask = formattedMask[i];
if ((bitmask & 0xffff) == 2) {
for (int j = i+1; j < (i/9+1)*9; j++) {
int bitmask_j = formattedMask[j];
if (bitmask == bitmask_j) {
bitmask >>= 16;
int c0, c1, k = 0;
while ((bitmask & 1) == 0) {
k++;
bitmask >>= 1;
}
c0 = k;
bitmask >>= 1;
k++;
while ((bitmask & 1) == 0) {
k++;
bitmask >>= 1;
}
c1 = k;
for (int cell = (i/9)*9; cell < (i/9+1)*9; cell++) {
if (cell != i && cell != j) {
if (!lineMask[vIndex][cell][c0]) {
disable(cell, c0);
lineMask[vIndex][cell][c0] = true;
lineCounters[vIndex]++;
}
if (!lineMask[vIndex][cell][c1]) {
disable(cell, c1);
lineMask[vIndex][cell][c1] = true;
lineCounters[vIndex]++;
}
}
}
}
}
}
}
for (int idx = 0; idx < 81; idx++) {
int i = (idx%9)*9 + idx/9;
int bitmask = formattedMask[i];
if ((bitmask & 0xffff) == 2) {
for (int j = i+9; j < 81; j += 9) {
int bitmask_j = formattedMask[j];
if (bitmask == bitmask_j) {
bitmask >>= 16;
int c0, c1, k = 0;
while ((bitmask & 1) == 0) {
k++;
bitmask >>= 1;
}
c0 = k;
bitmask >>= 1;
k++;
while ((bitmask & 1) == 0) {
k++;
bitmask >>= 1;
}
c1 = k;
for (int cell = i % 9; cell < 81; cell += 9) {
if (cell != i && cell != j) {
if (!lineMask[vIndex][cell][c0]) {
disable(cell, c0);
lineMask[vIndex][cell][c0] = true;
lineCounters[vIndex]++;
}
if (!lineMask[vIndex][cell][c1]) {
disable(cell, c1);
lineMask[vIndex][cell][c1] = true;
lineCounters[vIndex]++;
}
}
}
}
}
}
}
for (int idx = 0; idx < 9; idx++) {
for (int i = 0; i < 9; i++) {
int bitmask = formattedMask[cells[idx][i]];
if ((bitmask & 0xffff) == 2) {
for (int j = i+1; j < 9; j++) {
int bitmask_j = formattedMask[cells[idx][j]];
if (bitmask == bitmask_j) {
bitmask >>= 16;
int c0, c1, k = 0;
while ((bitmask & 1) == 0) {
k++;
bitmask >>= 1;
}
c0 = k;
bitmask >>= 1;
k++;
while ((bitmask & 1) == 0) {
k++;
bitmask >>= 1;
}
c1 = k;
for (int cellIdx = 0; cellIdx < 9; cellIdx++) {
if (cellIdx != i && cellIdx != j) {
int cell = cells[idx][cellIdx];
if (!lineMask[vIndex][cell][c0]) {
disable(cell, c0);
lineMask[vIndex][cell][c0] = true;
lineCounters[vIndex]++;
}
if (!lineMask[vIndex][cell][c1]) {
disable(cell, c1);
lineMask[vIndex][cell][c1] = true;
lineCounters[vIndex]++;
}
}
}
}
}
}
}
}
}
final private void checkNakedTriples(int vIndex) {
generateFormattedMasks();
for (int i = 0; i < 81; i++) {
int bitmask = formattedMask[i];
if ((bitmask & 0xffff) == 3) {
for (int j = i+1; j < (i/9+1)*9; j++) {
int bitmask_j = formattedMask[j];
if (bitmask_j > 0 && bitmask == (bitmask | bitmask_j)) {
for (int k = j+1; k < (i/9+1)*9; k++) {
int bitmask_k = formattedMask[k];
if (bitmask_k > 0 && bitmask == (bitmask | bitmask_k)) {
int bitmask_shifted = bitmask >> 16;
int c0, c1, c2, l = 0;
while ((bitmask_shifted & 1) == 0) {
l++;
bitmask_shifted >>= 1;
}
c0 = l;
bitmask_shifted >>= 1;
l++;
while ((bitmask_shifted & 1) == 0) {
l++;
bitmask_shifted >>= 1;
}
c1 = l;
bitmask_shifted >>= 1;
l++;
while ((bitmask_shifted & 1) == 0) {
l++;
bitmask_shifted >>= 1;
}
c2 = l;
for (int cell = (i/9)*9; cell < (i/9+1)*9; cell++) {
if (cell != i && cell != j && cell != k) {
if (!lineMask[vIndex][cell][c0]) {
disable(cell, c0);
lineMask[vIndex][cell][c0] = true;
lineCounters[vIndex]++;
}
if (!lineMask[vIndex][cell][c1]) {
disable(cell, c1);
lineMask[vIndex][cell][c1] = true;
lineCounters[vIndex]++;
}
if (!lineMask[vIndex][cell][c2]) {
disable(cell, c2);
lineMask[vIndex][cell][c2] = true;
lineCounters[vIndex]++;
}
}
}
}
}
}
}
}
}
for (int idx = 0; idx < 81; idx++) {
int i = (idx%9)*9 + idx/9;
int bitmask = formattedMask[i];
if ((bitmask & 0xffff) == 3) {
for (int j = i+9; j < 81; j += 9) {
int bitmask_j = formattedMask[j];
if (bitmask_j > 0 && bitmask == (bitmask | bitmask_j)) {
for (int k = j+9; k < 81; k += 9) {
int bitmask_k = formattedMask[k];
if (bitmask_k > 0 && bitmask == (bitmask | bitmask_k)) {
int bitmask_shifted = bitmask >> 16;
int c0, c1, c2, l = 0;
while ((bitmask_shifted & 1) == 0) {
l++;
bitmask_shifted >>= 1;
}
c0 = l;
bitmask_shifted >>= 1;
l++;
while ((bitmask_shifted & 1) == 0) {
l++;
bitmask_shifted >>= 1;
}
c1 = l;
bitmask_shifted >>= 1;
l++;
while ((bitmask_shifted & 1) == 0) {
l++;
bitmask_shifted >>= 1;
}
c2 = l;
for (int cell = i%9; cell < 81; cell += 9) {
if (cell != i && cell != j && cell != k) {
if (!lineMask[vIndex][cell][c0]) {
disable(cell, c0);
lineMask[vIndex][cell][c0] = true;
lineCounters[vIndex]++;
}
if (!lineMask[vIndex][cell][c1]) {
disable(cell, c1);
lineMask[vIndex][cell][c1] = true;
lineCounters[vIndex]++;
}
if (!lineMask[vIndex][cell][c2]) {
disable(cell, c2);
lineMask[vIndex][cell][c2] = true;
lineCounters[vIndex]++;
}
}
}
}
}
}
}
}
}
for (int idx = 0; idx < 9; idx++) {
for (int i = 0; i < 9; i++) {
int bitmask = formattedMask[cells[idx][i]];
if ((bitmask & 0xffff) == 3) {
for (int j = i+1; j < 9; j++) {
int bitmask_j = formattedMask[cells[idx][j]];
if (bitmask_j > 0 && bitmask == (bitmask | bitmask_j)) {
for (int k = j+1; k < 9; k++) {
int bitmask_k = formattedMask[cells[idx][k]];
if (bitmask_k > 0 && bitmask == (bitmask | bitmask_k)) {
int bitmask_shifted = bitmask >> 16;
int c0, c1, c2, l = 0;
while ((bitmask_shifted & 1) == 0) {
l++;
bitmask_shifted >>= 1;
}
c0 = l;
bitmask_shifted >>= 1;
l++;
while ((bitmask_shifted & 1) == 0) {
l++;
bitmask_shifted >>= 1;
}
c1 = l;
bitmask_shifted >>= 1;
l++;
while ((bitmask_shifted & 1) == 0) {
l++;
bitmask_shifted >>= 1;
}
c2 = l;
for (int cellIdx = 0; cellIdx < 9; cellIdx++) {
if (cellIdx != i && cellIdx != j && cellIdx != k) {
int cell = cells[idx][cellIdx];
if (!lineMask[vIndex][cell][c0]) {
disable(cell, c0);
lineMask[vIndex][cell][c0] = true;
lineCounters[vIndex]++;
}
if (!lineMask[vIndex][cell][c1]) {
disable(cell, c1);
lineMask[vIndex][cell][c1] = true;
lineCounters[vIndex]++;
}
if (!lineMask[vIndex][cell][c2]) {
disable(cell, c2);
lineMask[vIndex][cell][c2] = true;
lineCounters[vIndex]++;
}
}
}
}
}
}
}
}
}
}
}
final private void identifyLines(int vIndex) {
int disabledLines = 0;
int[][] tempRowMask = new int[3][9];
int[][] tempColMask = new int[3][9];
for (int i = 0; i < 9; i++) {
for (int c = 0; c < 9; c++) {
for (int j = 0; j < 3; j++) {
tempRowMask[j][c] = 0;
tempColMask[j][c] = 0;
}
for (int j = 0; j < 9; j++) {
if (mask[cells[i][j]][c] == 0) {
tempRowMask[j/3][c]++;
tempColMask[j%3][c]++;
}
}
int rowCount = 0;
int colCount = 0;
int rowIdx = -1, colIdx = -1;
for (int j = 0; j < 3; j++) {
if (tempRowMask[j][c] > 0) {
rowCount++;
rowIdx = j;
}
if (tempColMask[j][c] > 0) {
colCount++;
colIdx = j;
}
}
if (rowCount == 1) {
for (int j = (i/3)*3; j < (i/3 + 1)*3; j++) {
if (j != i) {
for (int k = rowIdx*3; k < (rowIdx+1)*3; k++) {
int cell = cells[j][k];
if (!lineMask[vIndex][cell][c]) {
disable(cell, c);
lineMask[vIndex][cell][c] = true;
lineCounters[vIndex]++;
}
}
}
}
}
if (colCount == 1) {
for (int j = i % 3; j < 9; j += 3) {
if (j != i) {
for (int k = colIdx; k < 9; k += 3) {
int cell = cells[j][k];
if (!lineMask[vIndex][cell][c]) {
disable(cell, c);
lineMask[vIndex][cell][c] = true;
lineCounters[vIndex]++;
}
}
}
}
}
}
}
}
final private boolean isPossible(int v, int c) {
return mask[v][c] == 0;
}
final private int checkMask(int[][] neighbors, int v, int c) {
int tempValue = 0;
for (int n : neighbors[v]) {
if (mask[n][c] > 0) {
tempValue++;
}
}
return tempValue;
}
final private void put(int v, int c) {
solvedBoard[v] = c;
for (int i : neighbors[v]) {
mask[i][c]++;
}
for (int i = 0; i < 9; i++) {
mask[v][i]++;
}
}
final private void disable(int v, int c) {
mask[v][c]++;
}
final private void unput(int v, int c) {
solvedBoard[v] = -1;
for (int i : neighbors[v]) {
mask[i][c]--;
}
for (int i = 0; i < 9; i++) {
mask[v][i]--;
}
}
final private void enable(int v, int c) {
// enables++;
mask[v][c]--;
}
public String getString(int[] board) {
StringBuilder s = new StringBuilder();
for (int i : board) {
s.append(i+1);
}
return s.toString();
}
public long getTime() {
return totTime;
}
public static String printTime(long t1, long t2) {
String unit = " ns";
if (t2-t1 > 10000) {
unit = " us";
t1 /= 1000; t2 /= 1000;
}
if (t2-t1 > 10000) {
unit = " ms";
t1 /= 1000; t2 /= 1000;
}
if (t2-t1 > 10000) {
unit = " seconds";
t1 /= 1000; t2 /= 1000;
}
return (t2-t1) + unit;
}
public void display(int[] board) {
for (int i = 0; i < 9; i++) {
if (i % 3 == 0) {
System.out.println("+-----+-----+-----+");
}
for (int j = 0; j < 9; j++) {
if (j % 3 == 0) {
System.out.print("|");
} else {
System.out.print(" ");
}
if (board[i*9+j] != -1) {
System.out.print(board[i*9+j]+1);
} else {
System.out.print(" ");
}
}
System.out.println("|");
}
System.out.println("+-----+-----+-----+");
}
public void display2(int[] board, int[] solved) {
for (int i = 0; i < 9; i++) {
if (i % 3 == 0) {
System.out.println("+-----+-----+-----+ +-----+-----+-----+");
}
for (int j = 0; j < 9; j++) {
if (j % 3 == 0) {
System.out.print("|");
} else {
System.out.print(" ");
}
if (board[i*9+j] != -1) {
System.out.print(board[i*9+j]+1);
} else {
System.out.print(" ");
}
}
System.out.print("| ");
for (int j = 0; j < 9; j++) {
if (j % 3 == 0) {
System.out.print("|");
} else {
System.out.print(" ");
}
if (solved[i*9+j] != -1) {
System.out.print(solved[i*9+j]+1);
} else {
System.out.print(" ");
}
}
System.out.println("|");
}
System.out.println("+-----+-----+-----+ +-----+-----+-----+");
}
private boolean contains(int[] a, int v) {
for (int i : a) {
if (i == v) {
return true;
}
}
return false;
}
public void connect() {
for (int i = 0; i < 81; i++) {
for (int j = 0; j < 20; j++) {
neighbors[i][j] = -1;
}
}
int[] n_count = new int[81];
HashMap<Integer,ArrayList<Integer>> map
= new HashMap<Integer,ArrayList<Integer>>();
for (int[] c: cells) {
ArrayList<Integer> temp = new ArrayList<Integer>();
for (int v : c) {
temp.add(v);
}
for (int v : c) {
map.put(v,temp);
}
}
for (int i = 0; i < 81; i++) {
for (int j = (i/9)*9; j < (i/9)*9 + 9; j++) {
if (i != j) {
neighbors[i][n_count[i]++] = j;
}
}
for (int j = i%9; j < 81; j += 9) {
if (i != j) {
neighbors[i][n_count[i]++] = j;
}
}
for (int j : map.get(i)) {
if (i != j) {
if (!contains(neighbors[i], j)) {
neighbors[i][n_count[i]++] = j;
}
}
}
}
}
public static int[][] getInput(String filename) {
int[][] boards;
try (BufferedInputStream in = new BufferedInputStream(
new FileInputStream(filename))) {
BufferedReader r = new BufferedReader(
new InputStreamReader(in, StandardCharsets.UTF_8));
int n = Integer.valueOf(r.readLine());
boards = new int[n][81];
clues = new int[n];
for (int i = 0; i < n; i++) {
for (int j = 0; j < 81; j++) {
int x = r.read();
boards[i][j] = x - 49;
clues[i] += x > 48 ? 1 : 0;
}
r.read();
}
r.close();
} catch (IOException ex) {
throw new RuntimeException(ex);
}
return boards;
}
private int getTotEasy() {
return totEasy;
}
public String getSolution() {
StringBuilder s = new StringBuilder(256);
for (int i : unsolvedBoard) {
s.append(i+1);
}
s.append(",");
for (int i : solvedBoard) {
s.append(i+1);
}
return s.toString();
}
public static void main (String[] args) {
long t0 = System.nanoTime();
Sudoku gc = new Sudoku();
File f;
PrintWriter p;
try {
f = new File("sudoku_output.txt");
p = new PrintWriter(f);
} catch (Exception e) {
return;
}
if (args.length != 1) {
System.out.println("Usage: java Sudoku <input_file>");
return;
}
int[][] boards = gc.getInput(args[0]);
long tinp = System.nanoTime();
gc.connect();
long t1 = System.nanoTime();
p.println(boards.length);
long maxSolveTime = 0;
int maxSolveIndex = 0;
long[] solveTimes = new long[boards.length];
for (int i = 0; i < boards.length; i++) {
long tempTime = System.nanoTime();
if (tempTime - gc.lastPrint > 200_000_000
|| i == boards.length - 1) {
gc.shouldPrint = true;
gc.lastPrint = tempTime;
System.out.print(String.format(
"\r(%7d/%7d) ", i+1, boards.length));
}
long elapsed = gc.solveSudoku(boards[i], gc.clues[i]);
if (elapsed == -1) {
System.out.println("Impossible: " + i);
}
if (elapsed > maxSolveTime) {
maxSolveTime = elapsed;
maxSolveIndex = i;
}
solveTimes[i] = elapsed;
p.println(gc.getSolution());
// break;
}
p.close();
long t2 = System.nanoTime();
Arrays.sort(solveTimes);
System.out.println();
System.out.println("Median solve time: "
+ gc.printTime(0, solveTimes[boards.length/2]));
System.out.println("Longest solve time: "
+ gc.printTime(0, maxSolveTime) + " for board " + maxSolveIndex);
gc.display(boards[maxSolveIndex]);
System.out.println();
System.out.println("Total time (including prints): "
+ gc.printTime(t0,t2));
System.out.println("Sudoku solving time: "
+ gc.printTime(0,gc.getTime()));
System.out.println("Average time per board: "
+ gc.printTime(0,gc.getTime()/boards.length));
System.out.println("Number of one-choice digits per board: "
+ String.format("%.2f", gc.getTotEasy()/(double)boards.length));
System.out.println("Easily solvable boards: " + gc.easySolved);
System.out.println("\nInput time: " + gc.printTime(t0,tinp));
System.out.println("Connect time: " + gc.printTime(tinp,t1));
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
}
}
}
-
\$\begingroup\$ I am am not mistaken you should save some time translating those jagged arrays into 2D arrays. \$\endgroup\$CSharpie– CSharpie2019年12月21日 13:14:14 +00:00Commented Dec 21, 2019 at 13:14
Zig - 0.070s unofficial score
https://github.com/mstdokumaci/sudokumaci/tree/main/zig
Build: zig build-exe -O ReleaseFast main.zig
Run: ./main ../test-data/all_17_clue.sudokus > ../test-data/all_17_clue.solved
Edited 2025年09月19日
C++: Schoku (unofficial) 0.021s (multi-threaded), 0.113s (single-threaded)
Update:
According to Schoku's internal time measurements, it is faster than all other entries on this page. Given the test set (17 clue Sudokus) and single-threaded execution.
Obviously, multi-threaded it trounces the competition.
In the tdoku benchmark, it comes out first or close second in all test sets, reaching from trivial to extremely hard. See the updated link below.
So unless there's a new solver out there that is way better but not shown here, this is the fastest Sudoku solver - period.
I realize that some time has passed since this challenge was created and a winner called out.
The results were obtained on a Ryzen 7 4700U (8 cores) clocked at (up to) 4.1GHz.
When multi-threading, the Core speed is around 3.3GHz and 4.1GHz single-threaded.
Since this CPU has comparable or lower performance then some other
CPUs mentioned here, I assume that these timings might equal or beat the other contenders, either
multi-threaded or single-threaded.
I based my submission on Mirage's original code posted here. I found the principled economy of lines of code and solving approaches impressive and it provided an excellent basis for further exploration.
I think tdoku largely deserves the winner's crown - | do hope though that my solver can be useful in one way or another.
What to measure
I understand that multi-threading was not forbidden by the rules, so I provide both multi-threaded and single threaded timings. It was said here that all programs could benefit in comparable ways from multi-threading. I found that Mirage's code provides a perfect counter example, as it does not benefit from multi-threading as much as it could.
When I started using the time command for measuring the execution time I realized
that for the relatively small set of 49151 puzzles the pure overhead of loading and starting the program makes up a significant
and constant portion of the measured time (around 20ms or nearly 40% in my case).
While this penalty is shared between all submissions, it distorts nonetheless the relative
performance and different platforms may carry different penalties. The ratio
between multi-threaded and single-threaded execution speed for example is around 5.3 in reality,
while with the startup overhead it is under 3.
The best way would be to benchmark using the tdoku benchmark suite, see my results here: Schoku benchmark using tdoku. The second best way is to concatenate the 49151 puzzle lines 10 time to 491510 lines, which should reduce the distortion by 90%.
Schoku also provides an option (-y) to time the program internally and -x to obtain full statistics.
Abstract
To jump to the summary line: By direct comparison on my test computer, I found that my version with changes and enhancements runs around 4.5 times as fast as the original submission in multithreaded mode, based on the internal time measurement and about 2.0 times faster in single-threaded mode.
When using the external clock timing the ratio is 3.1 (multi-threaded).
All this on my test computer, a Ryzen 7 Mobile 4700U processor laptop.
The other statistics look good too: The solving rate of 76.98% vs. 63.79% without guessing and 30798 against 57149 guesses.
Schoku Performance
The picture below shows Schoku performance evolving over several versions relative to Mirage's initial submission (0.1).
You can find the full story and complete commented code on Github: Schoku.
Sample output
The source code is unfortunately not available here as StackExchange limits the overall size. Instead some sample output (updated):
schoku version: 1.0 speed
command options: -x
compile options:
49151 puzzles entered
49151 2343280/s puzzles solved
21.0ms 426ns/puzzle solving time
38596 78.53% puzzles solved without guessing
21584 0.44/puzzle guesses
13497 0.27/puzzle back tracks
170602 3.47/puzzle digits entered and retracted
22431 0.46/puzzle 'rounds'
114463 2.33/puzzle triads resolved
209017 4.25/puzzle triad updates
134 0.00/puzzle board states without bivalues
709 bi-value universal graves detected
You can find the complete and commented source code I tested here: Schoku source (speed branch).
To compile the program, save it as schoku.cpp, then compile with g++ (11.4.0 or 13.2.1) as follows:
g++ -std=gnu++17 -O3 -fopenmp -pthread -Wall -Wextra -DNDEBUG -mavx2 -mbmi -mbmi2 -mlzcnt -c -MMD -MP -MF ./schoku.d schoku.cpp -o schoku.o`
g++ -fopenmp -pthread schoku.o -o schoku
To run the program, make sure you have puzzles.txt (49151 17-clue puzzles) in the same folder:
rm solutions.txt; time schoku -y
Note the important deletion of solutions.txt. Try using a solid state disk, turning off your Antivirus program and an uncluttered directory.
Usage
At the same time, I added several crucial options to the program:
-c check for back tracking even when no guess was made (e.g. if puzzles might have no solution)
-d# provide some detailed information on the progress of the puzzle solving.
add a 2 for even more detail.
-h help information (this text)
-l# solve a single line from the puzzle.
-mT execution modes (triads resolution)
-r[ROM] puzzle rules:
R for regular puzzles (unique solution exists)
not suitable for puzzles that have multiple solutions
O find just one solution
this will find the first of multiple solutions and stop looking
M determine whether multiple solutions exist beyond the first found
-t# set the number of threads
-v verify the solution
-w display warning (mostly unexpected solving details for regular puzzles)
-x provide some statistics
-y provide speed statistics only
-#1 change base for row and column reporting from 0 to 1
Code Changes
I turned the program into C++, using g++ and as many of the g++/gcc features I found useful, in particular: inlining, templates, vector extensions and builtins.
While I was at it, I turned to using intrinsics and other builtins for direct access to the fastest available
instructions for my purpose.
Looking at speed improvements in terms of low hanging fruit, I found that the repeated allocation of the GridState
structures did impact performance signifcantly and replaced it by fixed stacks allocated by each thread only once,
which turned out to improve the multi-threaded timing significantly.
The hidden single search allowed for removal of repeating calculations, which offered a welcome boost. I also made the search cover all hidden singles.
In the program shown here, I removed completely the hidden single search and
added a fast triad resolution pass (also known as locked candidates).
The guessing code was much improved.
Throughout, I also added lookup tables where they were useful.
Then I integrated the entering loop (for solved cells) with the search for the next naked single. This unfortunately involved some additional spaghetti coding techniques in form of gotos.
I also rewrote the puzzle input and output parts using AVX2.
C - 12min 28.374s official score
runs for about (削除) 30m (削除ここまで) 15m on my i5-7200U and produces the correct md5 hash
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<sys/time.h>
#define B break
#define O const
#define P printf
#define R return
#define S static
#define $(x,y...) if(x){y;}
#define E(x...) else{x;}
#define W(x,y...) while(x){y;}
#define fi(x,y...) for(I i=0,_n=(x);i<_n;i++){y;}
#define fj(x,y...) for(I j=0,_n=(x);j<_n;j++){y;}
typedef void V;typedef char C;typedef short H;typedef int I;typedef long long L;
S C h[81][20]; //h[i][0],h[i][1],..,h[i][19] are the squares that clash with square i
S H a[81] //a[i]: bitmask of possible choices; initially one of 1<<0, 1<<1 .. 1<<8, or 511 (i.e. nine bits set)
,b[81]; //b[i]: negated bitmask of impossible chioces; once we know square i has value v, b[i] becomes ~(1<<v)
S I f(){ //f:recursive solver
I p=-1; //keep track of the popcount (number of 1 bits) in a
W(1,I q=0; //simple non-recursive deductions:
fi(81,fj(20,a[i]&=b[h[i][j]]) // a[i] must not share bits with its clashing squares
$(!(a[i]&a[i]-1),$(!a[i],R 0)b[i]=~a[i]) // if a[i] has one bit left, update b[i]. if a[i]=0, we have a contradiction
q+=__builtin_popcount(a[i])) // compute new popcount
$(p==q,B)p=q;) // if the popcount of a[] changed, try to do more deductions
I k=-1,mc=10;fi(81,$(b[i]==-1,I c=__builtin_popcount(a[i]);$(c<mc,k=i;mc=c;$(c==2,B)))) //find square with fewest options left
$(k==-1,R 1) //if there isn't any such, we're done - success! otherwise k is that square
fi(9,$(a[k]&1<<i,H a0[81],b0[81]; //try different values for square k
memcpy(a0,a,81*sizeof(*a));memcpy(b0,b,81*sizeof(*b)); // save a and b
a[k]=1<<i;b[k]=~a[k];$(f(),R 1) // set square k and make a recursive call
memcpy(a,a0,81*sizeof(*a));memcpy(b,b0,81*sizeof(*b)))) // restore a and b
R 0;}
S L tm(){struct timeval t;gettimeofday(&t,0);R t.tv_sec*1000000+t.tv_usec;} //current time in microseconds
I main(){L t=0;I n;scanf("%d",&n);P("%d\n",n);
fi(81,L l=0;fj(81,$(i!=j&&(i%9==j%9||i/9==j/9||(i/27==j/27&&i%9/3==j%9/3)),h[i][l++]=j))) //precompute h
fi(n,S C s[82];scanf("%s",s);printf("%s,",s); //i/o and loop over puzzles
fj(81,a[j]=s[j]=='0'?511:1<<(s[j]-'1');b[j]=s[j]=='0'?-1:~a[j]) //represent '1' .. '9' as 1<<0 .. 1<<8, and 0 as 511
t-=tm();I r=f();t+=tm(); //measure time only for the solving function
$(!r,P("can't solve\n");exit(1)) //shouldn't happen
fj(81,s[j]=a[j]&a[j]-1?'0':'1'+__builtin_ctz(a[j])) //1<<0 .. 1<<8 to '1' .. '9'
P("%s\n",s)) //output
fflush(stdout);dprintf(2,"time:%lld microseconds\n",t);R 0;} //print self-measured time to stderr so it doesn't affect stdout's md5
compile (preferably with clang v6) and run:
clang -O3 -march=native a.c
time ./a.out <all_17_clue_sudokus.txt | tee o.txt | nl
md5sum o.txt
-
4\$\begingroup\$ Why so ugly? This isn't code-golf! \$\endgroup\$Jonathan Allan– Jonathan Allan2019年08月23日 22:00:16 +00:00Commented Aug 23, 2019 at 22:00
-
4\$\begingroup\$ @JonathanAllan that's how i usually code (unless i'm in a team who prefer to do otherwise). it's beautiful :) \$\endgroup\$ngn– ngn2019年08月23日 22:01:11 +00:00Commented Aug 23, 2019 at 22:01
-
1\$\begingroup\$ Haha, "beautiful", and easy to come back to in 6 months :p \$\endgroup\$Jonathan Allan– Jonathan Allan2019年08月23日 22:06:12 +00:00Commented Aug 23, 2019 at 22:06
-
1\$\begingroup\$ yes, actually. i've been doing this for a couple of years and i find it more efficient. in the apl world it's known as incunabulum style. with bloated code you move your eyes mostly vertically (unnatural and unfit for our landscape monitors) and scroll a lot. with tight code you can see all of it at once, so it's easier to find your way around it and to judge its complexity at a glance. \$\endgroup\$ngn– ngn2019年08月23日 22:13:27 +00:00Commented Aug 23, 2019 at 22:13
-
\$\begingroup\$ Is it a backtracking solution? I see two
memcpyin there and some recursion going on. I'll try to verify it today. \$\endgroup\$maxb– maxb2019年08月24日 06:34:38 +00:00Commented Aug 24, 2019 at 6:34
Typescript(Deno) - 22min 12s unofficial score
on an Lenovo G50-45 Laptop with 4cores and 4threads @1.5GHz
reads from all_17_clue_sudokus.txt or first argument
outputs to output.txt or second argument
download and run with deno run --allow-all https://raw.githubusercontent.com/RubenArtmann/sudoku-solver/master/mod.ts with Deno installed
solver.ts
const parse = (string: string)=>{
let sudoku = new Uint16Array(9*9);
for(let i=0; i<string.length; i++) {
let n = string.charCodeAt(i)-"1".charCodeAt(0);
sudoku[i] = n>=0&&n<9?1<<n:511;
}
return sudoku;
};
const bitCount = (n:number)=>{
n = n - ((n >> 1) & 0x55555555)
n = (n & 0x33333333) + ((n >> 2) & 0x33333333)
return ((n + (n >> 4) & 0xF0F0F0F) * 0x1010101) >> 24
};
const forced = (sudoku:Uint16Array)=>{
for(let i=0; i<sudoku.length; i++) {
let row = i%9;
let column = Math.floor(i/9);
// skip if already spread
if(sudoku[i]&(1<<9)) continue;
if(bitCount(sudoku[i]&511)===1) {
for(let j=0; j<9; j++) {
if(j!==column) sudoku[row+j*9] &= ~(sudoku[i]&511);
if(j!==row) sudoku[j+column*9] &= ~(sudoku[i]&511);
let xoff = row-row%3;
let yoff = column-column%3;
let x = xoff+j%3;
let y = yoff+Math.floor(j/3);
if(x!==row && y!==column) sudoku[x+y*9] &= ~(sudoku[i]&511);
// set already spread bit
sudoku[i] |= 1<<9;
}
}
}
};
const dump = (sudoku:Uint16Array)=>{
return [...sudoku].map(e=>{
let char = "";
for(let j=0; j<9; j++) {
if(e&(1<<j)) char += j+1;
}
if(char.length!==1) {
log(sudoku)
throw new Error("");
}
return char;
}).join("");
};
const log = (sudoku:Uint16Array)=>{
for(let column=0; column<9; column++) {
console.log([...sudoku.slice(column*9,column*9+9)].map((n,row)=>{
let string = "";
for(let j=0; j<9; j++) {
string += sudoku[column*9+row]&(1<<j)?j+1:".";
}
return string;
}).join(" "));
}
console.log("")
};
let count = 0;
const solve = (sudoku:Uint16Array)=>{
let sudokuBuf = new Uint16Array(9*9);
let stack: Uint16Array[] = [sudoku];
let stack_length = 1;
while(stack_length>0) {
count++;
// console.log(stack.length)
stack_length--;
let sudoku: Uint16Array = stack[stack_length] as Uint16Array;
stack[stack_length] = sudokuBuf;
sudokuBuf = sudoku;
// do forced moves
forced(sudoku);
let solved = true;
let invalid = false;
for(let i=0; i<sudoku.length; i++) {
let c = bitCount(sudoku[i]&511);
if(c!==1) solved = false;
if(c===0) {
invalid = true;
break;
}
}
if(invalid) continue;
if(solved) {
// need to apply forced one more time and check for changes / empty cells
forced(sudoku);
let solved = true;
for(let i=0; i<sudoku.length; i++) {
let c = bitCount(sudoku[i]&511);
if(c===0) solved = false;
}
if(solved) return sudoku;
continue;
}
// choose cell to split on
let splitIndex = -1;
for(let i=2; i<=9; i++) {
for(let j=0; j<sudoku.length; j++) {
if(bitCount(sudoku[j]&511)===i) {
splitIndex = j;
i = 1000;
break;
}
}
}
if(splitIndex<0) throw new Error("");
// branch/split
for(let i=0; i<9; i++) {
if(sudoku[splitIndex]&(1<<i)) {
if(stack[stack_length]===undefined) stack[stack_length] = new Uint16Array(9*9);
let newsudoku = stack[stack_length];
newsudoku.set(sudoku,0);
newsudoku[splitIndex] = 1<<i;
stack_length++;
}
}
// log(sudoku);
}
throw new Error("Out of Options!")
return new Uint16Array(9*9);
};
const context: Worker = self as any;
context.onmessage = (event)=>{
let result = dump(solve(parse(event.data.data)));
// console.log(count)
context.postMessage({
data: result,
callback: event.data.callback
})
};
mod.ts
let callbacks = new Map();
let workers: any[] = [];
let tasks: {data:any,callback:number}[] = [];
for(let i=0; i<4; i++) {
let worker = new Worker(new URL("solver.ts", import.meta.url).toString(),{ type: "module" }) as any;
worker.tasks = 0;
worker.onmessage = (event:{data:any})=>{
worker.tasks--;
callbacks.get(event.data.callback)(event.data.data);
callbacks.delete(event.data.callback);
};
workers.push(worker);
}
const sleep = (time:number)=>{
return new Promise((resolve)=>{
setTimeout(resolve,time);
});
};
const loadBalancer = async()=>{
while(true) {
// console.log(workers.map(w=>w.tasks),tasks.length);
let dispatches = 0;
for(let i=0; i<workers.length; i++) {
let worker = workers[i];
if(worker.tasks>=5||tasks.length<=0) continue;
if(worker.taks<=0) console.log("running low!")
let task = tasks.shift();
if(task === undefined) continue;
worker.postMessage(task);
worker.tasks++;
dispatches++;
}
if(dispatches<=0) await sleep(100);
}
};
loadBalancer();
const dispatch = async(data:any)=>{
let random = Math.random();
tasks.push({
data: data,
callback: random
});
return new Promise((resolve)=>{
callbacks.set(random,resolve);
});
};
// dispatch("030000000000001008700580000000024050040873900003600000900000002005002091200000704").then(console.log)
// console.log(workers)
let count = 0;
let file = Deno.readTextFileSync(Deno.args[0]||"./all_17_clue_sudokus.txt");
// let file = Deno.readTextFileSync("./hard_sudokus.txt");
let sudokus = file.split("\n").slice(1);
let promises = sudokus.map(async(e,i)=>{
let result = await dispatch(e);
count++;
console.log(count,"/",sudokus.length);
console.log(e);
console.log(result);
return result;
});
let solutions = await Promise.all(promises);
console.log("done!");
workers.map(w=>w.terminate());
let output = "";
output += `${sudokus.length}\n`;
for(let i=0; i<sudokus.length; i++) {
output += `${sudokus[i]},${solutions[i]}\n`;
}
Deno.writeTextFileSync(Deno.args[1]||"./output.txt",output);
Deno.exit(0)
-
\$\begingroup\$ Welcome to Code Golf! Nice first answer! \$\endgroup\$2021年04月06日 12:29:38 +00:00Commented Apr 6, 2021 at 12:29
-
\$\begingroup\$ Welcome to Code Golf! I'm not super familiar with typescript, but I'll try to boot up my old laptop and see if I can get it running! \$\endgroup\$maxb– maxb2021年04月09日 08:00:32 +00:00Commented Apr 9, 2021 at 8:00
-
1\$\begingroup\$ @maxb it should be as simple as
curl -fsSL https://deno.land/x/install/install.sh | shanddeno run --allow-all https://raw.githubusercontent.com/RubenArtmann/sudoku-solver/master/mod.ts\$\endgroup\$Ruben Artmann– Ruben Artmann2021年04月10日 11:58:28 +00:00Commented Apr 10, 2021 at 11:58
Rust - 0.150s unofficial score
https://github.com/mstdokumaci/sudokumaci/tree/main/rust
Here is my attempt in Rust. For the list of sudoku puzzles above (all_17_clue), I get ~150ms on a MacBook Pro (16-inch, 2019, 2.3 GHz i9-9880H).
I build the binary by this command:
cargo rustc --release -- -C target-cpu=native
I am not an expert on writing optimized code and I am coding in Rust for the first time. I believe the algorithm is at the sweet spot of human solutions and brute force, and it can be further optimized. I would appreciate it if anybody would like to suggest any improvements.
-
\$\begingroup\$ Welcome to Code Golf and Coding Challenges Stack Exchange!, But you are only allowed to post a language per answer, You can split by answers. \$\endgroup\$Fmbalbuena– Fmbalbuena2022年02月28日 13:12:47 +00:00Commented Feb 28, 2022 at 13:12
-
\$\begingroup\$ You can include the score, This will help to compare answers. \$\endgroup\$Fmbalbuena– Fmbalbuena2022年02月28日 13:23:35 +00:00Commented Feb 28, 2022 at 13:23
Explore related questions
See similar questions with these tags.