The fastest Sudoku solver

Question 1

Winner found

It seems as if we have a winner! Unless anyone plans on contesting the world's current fastest Sudoku solver, user 53x15 wins with the staggeringly fast solver Tdoku. For anyone still working on their solvers, I'll still benchmark new submissions when I have time.

The challenge

The goal of a game of Sudoku is to fill the board with the numbers 1-9, one in each cell, in such a way that each row, column and box only contains each number once. A very important aspect of a Sudoku puzzle is that there should only be one valid solution.

The goal of this challenge is simple, you should solve a set of Sudoku puzzles as fast as possible. However, you won't just be solving any old Sudoku, you'll be solving the very hardest Sudoku puzzles in existence, the 17-clue Sudokus. Here's an example:

Hard Sudoku

Rules

Language

You're free to use any language. If I don't have a compiler installed for your language, you should be able to provide a set of command line instructions needed to install an environment where your script can be run on Linux.

Benchmark machine

The benchmark will be run on a Dell XPS 9560, 2.8GHz Intel Core i7-7700HQ (3.8GHz boost) 4 cores, 8 threads, 16GB RAM. GTX 1050 4GB. The machine runs Ubuntu 19.04. Here's the uname output, for anyone interested.

Linux 5.0.0-25-generic #26-Ubuntu SMP Thu Aug 1 12:04:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Input

The input will be given as a file. It can be found here . The file contains 49151 Sudoku puzzles. The first line of the file is the number of puzzles, and every line after that is 81 characters long and represents a puzzle. The unknown cells are 0, and the known cells are 1-9.

Your program should be able to take the filename as an argument, or have the file input from STDIN, to facilitate manual checking of your solution. Please include an instruction for how your program takes input.

Timing / scoring

From discussions in the comments, and some reflection, the scoring criteria has been changed to be the time of your entire program. Your program should produce the output file with the correct hash even during official scoring. This doesn't interfere with any existing solution, and doesn't change the rankings as they stand now. Any thoughts on the scoring system are appreciated.

If two solutions have similar scores for individual runs, I will run multiple benchmarks, and the average time will be the final score. If the average scores differ by less than 2%, I will consider it a draw.

If your solution takes longer than an hour to run, it will not be officially scored. In those cases, you are responsible for reporting the machine on which it ran, and your score. For an optimized solver, this should not be an issue.

EDIT: It was brought to my attention that while difficult, the problem set at hand is not the most difficult there is. If time is available, I'll try to benchmark the solutions presented here against the harder puzzle set, and add the score to each submission. However, this will not be an official scoring, and is just for fun.

Verification

Your solution will be verified by a MD5/SHA256 checksum. Your script should be able to generate a file containing all puzzles and their solutions. However, the file will also be manually inspected, so don't try to get a hash collision. Your output file should match:

MD5: 41704fd7d8fd0723a45ffbb2dbbfa488
SHA256: 0bc8dda364db7b99f389b42383e37b411d9fa022204d124cb3c8959eba252f05

The file will be on the format:

<num_puzzles>
<unsolved_puzzle#1>,<solved_puzzle#1>
<unsolved_puzzle#2>,<solved_puzzle#2>
...
<unsolved_puzzle#n>,<solved_puzzle#n>

with a single trailing newline.

What's not allowed

You are in no way allowed to hard-code solutions. Your algorithm should be applicable on any set of Sudoku puzzles, both easy and harder Sudokus. However, it is entirely fine if your solution is slow for easier puzzles.

You are not allowed to have a non-deterministic program. You are allowed to use a random number generator, but the seed of the generator should be fixed. This rule is to ensure that measurements are more precise, and have less variance. (Thanks to Peter Taylor for the tip)

You are not allowed to use any external resources or web requests during the runtime of your program. Everything should be self-contained. This does not apply to installed libraries and packages, which are allowed.

Other info

If you want another test set to check your solution, here are 10000 easier Sudokus. Here are their solutions.

MD5: 3cb465ef6077c4fcab5bd6ae3bc50d62
SHA256: 0bc8dda364db7b99f389b42383e37b411d9fa022204d124cb3c8959eba252f05

If you have any questions, feel free to ask, and I'll try to clarify any misunderstandings.

Question 2

I have an APL+WIN solver but unless you have a copy of the interpreter on your machine you will have to count me out. For info your hard example took 30ms and the first easy example 16ms.

Question 3

@Graham it took 30ms for all 49151 sudokus, or 30ms on average?

Question 4

Should the entries also be code golfed? Or... Can they be golfed, for the fun it? ;-)

Question 5

@TheMatt I'd prefer non-golfed, just so I can verify that nothing fishy is going on

Question 6

In case you need the world's hardest sudoku: abcnews.go.com/blogs/headlines/2012/06/…

Question 7

C++ - 0.201s official score

Using Tdoku (code; design; benchmarks) gives these results:

~/tdoku$ lscpu | grep Model.name
Model name: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
~/tdoku$ # build:
~/tdoku$ CC=clang-8 CXX=clang++-8 ./BUILD.sh
~/tdoku$ clang -o solve example/solve.c build/libtdoku.a 
~/tdoku$ # adjust input format:
~/tdoku$ sed -e "s/0/./g" all_17_clue_sudokus.txt > all_17_clue_sudokus.txt.in
~/tdoku$ # solve:
~/tdoku$ time ./solve 1 < all_17_clue_sudokus.txt.in > out.txt
real 0m0.241s
user 0m0.229s
sys 0m0.012s
~/tdoku$ # adjust output format and sha256sum:
~/tdoku$ grep -v "^:0:$" out.txt | sed -e "s/:1:/,/" | tr . 0 | sha256sum
0bc8dda364db7b99f389b42383e37b411d9fa022204d124cb3c8959eba252f05 -

Tdoku has been optimized for hard Sudoku instances. But note, contrary to the problem statement, that 17 clue puzzles are far from the hardest Sudoku. Actually they're among the easiest, with the majority requiring no backtracking at all. See some of the other benchmark datasets in the Tdoku project for puzzles that are actually hard.

Also note that while Tdoku is the fastest solver I'm aware of for hard puzzles, it's not the fastest for 17 clue puzzles. For these I think the fastest is this rust project, a derivative of JCZSolve, which was optimized for 17 clue puzzles during development. Depending on the platform it might be 5-25% faster than Tdoku for these puzzles.

Question 8

Wow, that was an interesting read about the implementation and theory behind it. Before I started this challenge, I wanted to find state of the art solvers and datasets. I guess I didn't look hard enough. From popular "scientific" articles, the 17 clue puzzles were all that's ever talked about, so it was my assumption that those were the hardest. I'll try to run all submissions against the data sets presented in your article, and I'll benchmark your submission later today. Fantastic job!

Question 9

Thanks! You see from the article that finding state-of-the-art solving took me on a long journey. :-) I get why people focus on 17 clue puzzles: the dataset is well known, well defined, complete or nearly so, moderately large, and hard for naive solvers. While it's interesting to study harder puzzles, hardness is tricky to formalize. e.g., do we mean subjectively or empirically hard for humans based on the techniques required? do we mean slow on average for a given solver under random permutations? Do we mean minimum backdoor size under a formula with chosen pigeonhole inferences? etc.

Question 10

0.201s - is that the time taken to solve all ~49k puzzles, roughly 4usec per puzzle?

Question 11

Yes, that's right.

Question 12

C - 0.045s unofficial score

I got this time on my i7-9750H with 6 cores 12 threads @ 4Ghz. I'm aware that my cpu is faster than the i7-7700HQ so I think (hope) it would be closer to 0.080s if it were officially scored.

Compile with: gcc file.c -O3 -march=native -fopenmp

The program takes the input file name as argument. It generates an output.txt file containing all sudokus with their solution. It can be significantly slower when the output file already exists.

This won't run on CPUs that don't support the SSE4.1 and AVX2 instructions set extensions.

#include <stdio.h>
#include <omp.h>
#include <stdbool.h>
#include <intrin.h>
// lookup tables that may or may not speed things up by avoiding division
static const unsigned char box_index[81] = {
 0, 0, 0, 1, 1, 1, 2, 2, 2,
 0, 0, 0, 1, 1, 1, 2, 2, 2,
 0, 0, 0, 1, 1, 1, 2, 2, 2,
 3, 3, 3, 4, 4, 4, 5, 5, 5,
 3, 3, 3, 4, 4, 4, 5, 5, 5,
 3, 3, 3, 4, 4, 4, 5, 5, 5,
 6, 6, 6, 7, 7, 7, 8, 8, 8,
 6, 6, 6, 7, 7, 7, 8, 8, 8,
 6, 6, 6, 7, 7, 7, 8, 8, 8
};
static const unsigned char column_index[81] = {
 0, 1, 2, 3, 4, 5, 6, 7, 8,
 0, 1, 2, 3, 4, 5, 6, 7, 8,
 0, 1, 2, 3, 4, 5, 6, 7, 8,
 0, 1, 2, 3, 4, 5, 6, 7, 8,
 0, 1, 2, 3, 4, 5, 6, 7, 8,
 0, 1, 2, 3, 4, 5, 6, 7, 8,
 0, 1, 2, 3, 4, 5, 6, 7, 8,
 0, 1, 2, 3, 4, 5, 6, 7, 8,
 0, 1, 2, 3, 4, 5, 6, 7, 8,
};
static const unsigned char row_index[81] = {
 0, 0, 0, 0, 0, 0, 0, 0, 0,
 1, 1, 1, 1, 1, 1, 1, 1, 1,
 2, 2, 2, 2, 2, 2, 2, 2, 2,
 3, 3, 3, 3, 3, 3, 3, 3, 3,
 4, 4, 4, 4, 4, 4, 4, 4, 4,
 5, 5, 5, 5, 5, 5, 5, 5, 5,
 6, 6, 6, 6, 6, 6, 6, 6, 6,
 7, 7, 7, 7, 7, 7, 7, 7, 7,
 8, 8, 8, 8, 8, 8, 8, 8, 8,
};
static const unsigned char box_start[81] = {
 0, 0, 0, 3, 3, 3, 6, 6, 6,
 0, 0, 0, 3, 3, 3, 6, 6, 6,
 0, 0, 0, 3, 3, 3, 6, 6, 6,
 27, 27, 27, 30, 30, 30, 33, 33, 33,
 27, 27, 27, 30, 30, 30, 33, 33, 33,
 27, 27, 27, 30, 30, 30, 33, 33, 33,
 54, 54, 54, 57, 57, 57, 60, 60, 60,
 54, 54, 54, 57, 57, 57, 60, 60, 60,
 54, 54, 54, 57, 57, 57, 60, 60, 60
};
static void add_column_indices(unsigned long long indices[2], unsigned char i) {
 indices[0] |= 0x8040201008040201ULL << column_index[i];
 indices[1] |= 0x8040201008040201ULL >> (10-column_index[i]);
}
static void add_row_indices(unsigned long long indices[2], unsigned char i) {
 switch (row_index[i]) {
 case 7:
 indices[0] |= 0x8000000000000000ULL;
 indices[1] |= 0xffULL;
 break;
 case 8:
 indices[1] |= 0x01ff00ULL;
 break;
 default:
 indices[0] |= 0x01ffULL << 9*row_index[i];
 }
}
static void add_box_indices(unsigned long long indices[2], unsigned char i) {
 indices[0] |= 0x1c0e07ULL << box_start[i];
 indices[1] |= 0x0381c0e0ULL >> (60-box_start[i]);
}
struct GridState {
 struct GridState* prev; // last grid state before a guess was made, used for backtracking
 unsigned long long unlocked[2]; // for keeping track of which cells don't need to be looked at anymore. Set bits correspond to cells that still have multiple possibilities
 unsigned long long updated[2]; // for keeping track of which cell's candidates may have been changed since last time we looked for naked sets. Set bits correspond to changed candidates in these cells
 unsigned short candidates[81]; // which digits can go in this cell? Set bits correspond to possible digits
};
static void enter_digit(struct GridState* grid_state, signed char digit, unsigned char i) {
 // lock this cell and and remove this digit from the candidates in this row, column and box
 
 unsigned short* candidates = grid_state->candidates;
 unsigned long long* unlocked = grid_state->unlocked;
 unsigned long long to_update[2] = {0};
 
 if (i < 64) {
 unlocked[0] &= ~(1ULL << i);
 } else {
 unlocked[1] &= ~(1ULL << (i-64));
 }
 
 candidates[i] = 1 << digit;
 
 add_box_indices(to_update, i);
 add_column_indices(to_update, i);
 add_row_indices(to_update, i);
 
 to_update[0] &= unlocked[0];
 to_update[1] &= unlocked[1];
 
 grid_state->updated[0] |= to_update[0];
 grid_state->updated[1] |= to_update[1];
 
 const __m256i bit_mask = _mm256_setr_epi16(1<<0, 1<<1, 1<<2, 1<<3, 1<<4, 1<<5, 1<<6, 1<<7, 1<<8, 1<<9, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14, 1<<15);
 const __m256i mask = _mm256_set1_epi16(~candidates[i]);
 for (unsigned char j = 0; j < 80; j += 16) {
 unsigned short m;
 if (j < 64) {
 m = (unsigned short) ((to_update[0] >> j) & 0xffff);
 } else {
 m = (unsigned short) (to_update[1] & 0xffff);
 } 
 __m256i c = _mm256_loadu_si256((__m256i*) &candidates[j]);
 __m256i u = _mm256_cmpeq_epi16(_mm256_and_si256(bit_mask, _mm256_set1_epi16(m)), _mm256_setzero_si256());
 c = _mm256_and_si256(c, _mm256_or_si256(mask, u));
 _mm256_storeu_si256((__m256i*) &candidates[j], c);
 }
 if ((to_update[1] & (1ULL << (80-64))) != 0) {
 candidates[80] &= ~candidates[i];
 }
}
static long long guesses = 0;
static struct GridState* make_guess(struct GridState* grid_state) {
 // Make a guess for the cell with the least candidates. The guess will be the lowest
 // possible digit for that cell. If multiple cells have the same number of candidates, the 
 // cell with lowest index will be chosen. Also save the current grid state for tracking back
 // in case the guess is wrong. No cell has less than two candidates.
 
 unsigned long long* unlocked = grid_state->unlocked;
 unsigned short* candidates = grid_state->candidates;
 
 // Find the cell with fewest possible candidates
 unsigned long long to_visit;
 unsigned long guess_index = 0;
 unsigned long i_rel;
 unsigned short cnt;
 unsigned short best_cnt = 16;
 
 to_visit = unlocked[0];
 while (_BitScanForward64(&i_rel, to_visit) != 0) {
 to_visit &= ~(1ULL << i_rel);
 cnt = __popcnt16(candidates[i_rel]);
 if (cnt < best_cnt) {
 best_cnt = cnt;
 guess_index = i_rel;
 }
 }
 to_visit = unlocked[1];
 while (_BitScanForward64(&i_rel, to_visit) != 0) {
 to_visit &= ~(1ULL << i_rel);
 cnt = __popcnt16(candidates[i_rel + 64]);
 if (cnt < best_cnt) {
 best_cnt = cnt;
 guess_index = i_rel + 64;
 }
 }
 
 // Find the first candidate in this cell
 unsigned long digit;
 _BitScanReverse(&digit, candidates[guess_index]);
 
 // Create a copy of the state of the grid to make back tracking possible
 struct GridState* new_grid_state = (struct GridState*) malloc(sizeof(struct GridState));
 memcpy(new_grid_state, grid_state, sizeof(struct GridState));
 new_grid_state->prev = grid_state;
 
 // Remove the guessed candidate from the old grid because if we need to get back to the old grid
 // we know the guess was wrong
 grid_state->candidates[guess_index] &= ~(1 << digit);
 if (guess_index < 64) {
 grid_state->updated[0] |= 1ULL << guess_index;
 } else {
 grid_state->updated[1] |= 1ULL << (guess_index-64);
 }
 
 // Update candidates
 enter_digit(new_grid_state, (signed char) digit, (unsigned char) guess_index);
 
 guesses++;
 
 return new_grid_state;
}
static struct GridState* track_back(struct GridState* grid_state) {
 // Go back to the state when the last guess was made
 // This state had the guess removed as candidate from it's cell
 
 struct GridState* old_grid_state = grid_state->prev;
 free(grid_state);
 
 return old_grid_state;
}
static bool solve(signed char grid[81]) {
 struct GridState* grid_state = (struct GridState*) malloc(sizeof(struct GridState));
 grid_state->prev = 0;
 unsigned long long* unlocked = grid_state->unlocked;
 unsigned long long* updated = grid_state->updated;
 unsigned short* candidates = grid_state->candidates;
 unlocked[0] = 0xffffffffffffffffULL;
 unlocked[1] = 0x1ffffULL;
 updated[0] = unlocked[0];
 updated[1] = unlocked[1];
 
 {
 signed char digit;
 unsigned short columns[9] = {0};
 unsigned short rows[9] = {0};
 unsigned short boxes[9] = {0};
 
 for (unsigned char i = 0; i < 64; ++i) {
 digit = grid[i];
 if (digit >= 49) {
 columns[column_index[i]] |= 1 << (digit-49);
 rows[row_index[i]] |= 1 << (digit-49);
 boxes[box_index[i]] |= 1 << (digit-49);
 unlocked[0] &= ~(1ULL << i);
 }
 }
 for (unsigned char i = 64; i < 81; ++i) {
 digit = grid[i];
 if (digit >= 49) {
 columns[column_index[i]] |= 1 << (digit-49);
 rows[row_index[i]] |= 1 << (digit-49);
 boxes[box_index[i]] |= 1 << (digit-49);
 unlocked[1] &= ~(1ULL << (i-64));
 }
 }
 
 for (unsigned char i = 0; i < 81; ++i) {
 if (grid[i] < 49) {
 candidates[i] = 0x01ff ^ (rows[row_index[i]] | columns[column_index[i]] | boxes[box_index[i]]);
 } else {
 candidates[i] = 1 << (grid[i]-49);
 }
 }
 }
 
 start:
 
 unlocked = grid_state->unlocked;
 candidates = grid_state->candidates;
 
 // Find naked singles
 { 
 bool found;
 const __m256i ones = _mm256_set1_epi16(1);
 const __m256i bit_mask = _mm256_setr_epi16(1<<0, 1<<1, 1<<2, 1<<3, 1<<4, 1<<5, 1<<6, 1<<7, 1<<8, 1<<9, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14, 1<<15);
 do {
 found = false;
 for (unsigned char i = 0; i < 80; i += 16) {
 __m256i c = _mm256_loadu_si256((__m256i*) &candidates[i]);
 // Check if any cell has zero candidates
 if (_mm256_movemask_epi8(_mm256_cmpeq_epi16(c, _mm256_setzero_si256()))) {
 // Back track, no solutions along this path
 grid_state = track_back(grid_state);
 goto start;
 } else {
 unsigned short m;
 if (i < 64) {
 m = (unsigned short) ((unlocked[0] >> i) & 0xffff);
 } else {
 m = (unsigned short) (unlocked[1] & 0xffff);
 } 
 
 __m256i a = _mm256_cmpeq_epi16(_mm256_and_si256(c, _mm256_sub_epi16(c, ones)), _mm256_setzero_si256());
 __m256i u = _mm256_cmpeq_epi16(_mm256_and_si256(bit_mask, _mm256_set1_epi16(m)), bit_mask);
 int mask = _mm256_movemask_epi8(_mm256_and_si256(a, u));
 
 if (mask) {
 unsigned long index, digit;
 _BitScanForward(&index, mask);
 index = (index >> 1) + i; 
 _BitScanReverse(&digit, candidates[index]);
 enter_digit(grid_state, (signed char) digit, index);
 found = true;
 }
 }
 }
 if (unlocked[1] & (1ULL << (80-64))) {
 if (candidates[80] == 0) {
 // no solutions go back
 grid_state = track_back(grid_state);
 goto start;
 } else if (__popcnt16(candidates[80]) == 1) {
 // Enter the digit and update candidates
 unsigned long digit;
 _BitScanReverse(&digit, candidates[80]);
 enter_digit(grid_state, (signed char) digit, 80);
 found = true;
 }
 }
 } while (found);
 }
 
 // Check if it's solved, if it ever gets solved it will be solved after looking for naked singles
 if ((unlocked[0] | unlocked[1]) == 0) {
 // Solved it
 // Free memory
 while (grid_state) {
 struct GridState* prev_grid_state = grid_state->prev;
 free(grid_state);
 grid_state = prev_grid_state;
 }
 
 // Enter found digits into grid
 for (unsigned char j = 0; j < 81; ++j) {
 unsigned long index;
 _BitScanReverse(&index, candidates[j]);
 grid[j] = (signed char) index + 49;
 }
 
 return true;
 }
 
 // Find hidden singles
 // Don't check the last column because it doesn't fit in the SSE register so it's not really worth checking
 {
 const __m128i ones = _mm_set1_epi16(1);
 const __m128i bit_mask = _mm_setr_epi16(1<<0, 1<<1, 1<<2, 1<<3, 1<<4, 1<<5, 1<<6, 1<<7);
 const __m128i shuffle_mask = _mm_setr_epi8(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1);
 for (unsigned char i = 0; i < 81; i += 9) {
 
 // rows
 __m128i row_mask = _mm_set1_epi16(0x01ff ^ candidates[i+8]);
 __m128i c = _mm_loadu_si128((__m128i*) &candidates[i]);
 for (unsigned char j = 0; j < 7; ++j) {
 // rotate shift (1 2 3 4) -> (4 1 2 3)
 c = _mm_shuffle_epi8(c, shuffle_mask);
 row_mask = _mm_andnot_si128(c, row_mask);
 }
 
 // columns
 __m128i column_mask = _mm_set1_epi16(0x01ff);
 for (unsigned char j = 0; j < 81; j += 9) {
 if (j != i) {
 column_mask = _mm_andnot_si128(_mm_loadu_si128((__m128i*) &candidates[j]), column_mask);
 }
 }
 
 // boxes aren't worth it
 
 __m128i or_mask = _mm_or_si128(row_mask, column_mask); 
 
 if (_mm_test_all_zeros(or_mask, _mm_sub_epi16(or_mask, ones))) {
 
 unsigned short m;
 if (i < 64) {
 m = (unsigned short) ((unlocked[0] >> i) & 0xff);
 } else {
 m = (unsigned short) ((unlocked[1] >> (i-64)) & 0xff);
 }
 
 c = _mm_loadu_si128((__m128i*) &candidates[i]);
 
 __m128i a = _mm_cmpgt_epi16(_mm_and_si128(c, or_mask), _mm_setzero_si128());
 __m128i u = _mm_cmpeq_epi16(_mm_and_si128(bit_mask, _mm_set1_epi16(m)), bit_mask);
 int mask = _mm_movemask_epi8(_mm_and_si128(a, u));
 
 if (mask) {
 unsigned long index, digit;
 _BitScanForward(&index, mask);
 
 index = index/2;
 
 int can = ((unsigned short*) &or_mask)[index];
 _BitScanForward(&digit, can);
 
 index = index + i;
 
 enter_digit(grid_state, (signed char) digit, index);
 goto start;
 }
 
 } else {
 // no solutions go back
 grid_state = track_back(grid_state);
 goto start;
 }
 }
 }
 
 // Find naked sets, up to 5
 {
 bool found = false;
 // because this is kind of an expensive task, we are not going to visit all cells but only those that were changed
 unsigned long long *to_visit_n = grid_state->updated;
 to_visit_n[0] &= unlocked[0];
 to_visit_n[1] &= unlocked[1];
 
 for (unsigned char n = 0; n < 2; ++n) {
 while (to_visit_n[n]) {
 unsigned long i_rel;
 _BitScanForward64(&i_rel, to_visit_n[n]);
 
 to_visit_n[n] ^= 1ULL << i_rel;
 unsigned char i = (unsigned char) i_rel + 64*n;
 
 unsigned short cnt = __popcnt16(candidates[i]);
 
 if (cnt <= 5) {
 // check column
 unsigned long long to_change[2] = {0};
 unsigned char s;
 __m128i a_i = _mm_set1_epi16(candidates[i]);
 __m128i a_j = _mm_set_epi16(candidates[column_index[i]+63], candidates[column_index[i]+54], candidates[column_index[i]+45], candidates[column_index[i]+36], candidates[column_index[i]+27], candidates[column_index[i]+18], candidates[column_index[i]+9], candidates[column_index[i]]);
 __m128i res = _mm_cmpeq_epi16(a_i, _mm_or_si128(a_i, a_j));
 s = __popcnt16(_mm_movemask_epi8(res)) >> 1;
 s += candidates[i] == (candidates[i] | candidates[column_index[i]+72]);
 if (s > cnt) {
 grid_state = track_back(grid_state);
 goto start;
 } else if (s == cnt) {
 add_column_indices(to_change, i);
 }
 // check row
 a_j = _mm_load_si128((__m128i*) &candidates[9*row_index[i]]);
 res = _mm_cmpeq_epi16(a_i, _mm_or_si128(a_i, a_j));
 s = __popcnt16(_mm_movemask_epi8(res)) >> 1;
 s += candidates[i] == (candidates[i] | candidates[9*row_index[i]+8]);
 if (s > cnt) {
 grid_state = track_back(grid_state);
 goto start;
 } else if (s == cnt) {
 add_row_indices(to_change, i);
 }
 
 // check box
 unsigned short b = box_start[i];
 a_j = _mm_set_epi16(candidates[b], candidates[b+1], candidates[b+2], candidates[b+9], candidates[b+10], candidates[b+11], candidates[b+18], candidates[b+19]);
 res = _mm_cmpeq_epi16(a_i, _mm_or_si128(a_i, a_j));
 s = __popcnt16(_mm_movemask_epi8(res)) >> 1;
 s += candidates[i] == (candidates[i] | candidates[b+20]);
 if (s > cnt) {
 grid_state = track_back(grid_state);
 goto start;
 } else if (s == cnt) {
 add_box_indices(to_change, i);
 }
 
 to_change[0] &= unlocked[0];
 to_change[1] &= unlocked[1];
 
 // update candidates
 for (unsigned char n = 0; n < 2; ++n) {
 while (to_change[n]) {
 unsigned long j_rel;
 _BitScanForward64(&j_rel, to_change[n]);
 to_change[n] &= ~(1ULL << j_rel);
 unsigned char j = (unsigned char) j_rel + 64*n;
 
 if ((candidates[j] | candidates[i]) != candidates[i]) {
 if (candidates[j] & candidates[i]) {
 candidates[j] &= ~candidates[i];
 to_visit_n[n] |= 1ULL << j_rel;
 found = true;
 }
 }
 }
 }
 
 // If any cell's candidates got updated, go back and try all that other stuff again
 if (found) {
 goto start;
 }
 
 }
 }
 }
 }
 
 // More techniques could be added here but they're not really worth checking for on the 17 clue sudoku set
 
 // Make a guess if all that didn't work
 grid_state = make_guess(grid_state);
 goto start;
 
}
int main(int argc, char *argv[]) {
 
 if (argc != 2) {
 printf("Error! Pass the file name as argument\n");
 return -1;
 }
 
 FILE *f = fopen(argv[1], "rb");
 
 // test for files not existing
 if (f == 0) {
 printf("Error! Could not open file %s\n", argv[1]);
 return -1;
 }
 
 // find length of file
 fseek(f, 0, SEEK_END);
 long fsize = ftell(f);
 fseek(f, 0, SEEK_SET);
 
 // deal with the part that says how many sudokus there are
 signed char *string = malloc(fsize + 1);
 fread(string, 1, fsize, f);
 fclose(f);
 size_t p = 0;
 while (string[p] != 10) {
 ++p;
 }
 ++p;
 
 signed char *output = malloc(fsize*2 - p + 2);
 memcpy(output, string, p);
 
 // solve all sudokus and prepare output file
 int i;
 #pragma omp parallel for shared(string, output, fsize, p) schedule(dynamic)
 for (i = fsize - p - 81; i >= 0; i-=82) {
 // copy unsolved grid
 memcpy(&output[p+i*2], &string[p+i], 81);
 memcpy(&output[p+i*2+82], &string[p+i], 81);
 // add comma and newline in right place
 output[p+i*2 + 81] = ',';
 output[p+i*2 + 163] = 10;
 // solve the grid in place
 solve(&output[p+i*2+82]);
 }
 
 // create output file
 f = fopen("output.txt", "wb");
 fwrite(output, 1, fsize*2 - p + 2, f);
 fclose(f); 
 free(string);
 free(output);
 return 0;
}

The algorithm uses normal techniques humans use. Specifically looking for naked singles, hidden singles and naked subsets. If that doesn't work it uses the infamous Nishio technique (just backtracking). I added more techniques initially like looking for locked candidates, but they turned out to not really be worth it for these 17-clue sudokus.

Question 13

very cool!! I recently did some benchmarks and it loooks like std::copy usually is a fraction faster than memcpy

Question 14

@JohanduToit The memcpy part is not really critical since the algorithm makes on average only 1 guess per sudoku (it doesn't get called often). When looking for locked candidates this goes down to only 0.3 or 0.2 guesses but my method for finding them was kinda bad so it didn't make much of a difference in time. I think if you want to make it significantly faster that would be something to look into though.

Question 15

Wow, this is really impressive! I noticed that you're using omp.h to solve all of the puzzles in parallel. It was a long time since I hosted this challenge, so I can't remember the exact verdict on parallelization, but your solution would be either the fastest or second fastest running single-threaded. I'll try to get this benchmarked to get you an official score!

Question 16

Very nice Mirage! I ran this through tdoku's benchmarks, and it is actually the fastest solver I've seen whose primary representation is cell candidate sets. It's right in there with other top solvers in single-threaded performance, and it does the least guessing of any solver except tdoku. Here is the comparison on various data sets: pastebin.com/ysXKWXJQ I haven't benchmarked with openmp since all of these solvers will benefit in roughly the same way. Is this on github somewhere? It is an interesting specimen ... may I add it to tdoku's "other solver" benchmarks?

Question 17

Sorry, looks like I didn't set up the comparison correctly in the benchmarks above. Your solver returns when it finds the first solution and doesn't confirm that the solution is unique (something all the other solvers were doing). When you run them all in "first solution" mode then the results are here: pastebin.com/JTmkp8Y7 Still a solid performer, and maybe with headroom, but not the outlier it seemed to be in either speed and guesses. Instead it's comparable to other fast solvers with a cell-based representation. A little faster on some datasets, a little slower on others.

Question 18

Node.js, (削除) 8.231s (削除ここまで) 6.735s official score

Takes the file name as argument. The input file may already contain the solutions in the format described in the challenge, in which case the program will compare them with its own solutions.

The results are saved in 'sudoku.log'.

Code

'use strict';
const fs = require('fs');
const BLOCK = [];
const BLOCK_NDX = [];
const N_BIT = [];
const ZERO = [];
const BIT = [];
console.time('Processing time');
init();
let filename = process.argv[2],
 puzzle = fs.readFileSync(filename).toString().split('\n'),
 len = puzzle.shift(),
 output = len + '\n';
console.log("File '" + filename + "': " + len + " puzzles");
// solve all puzzles
puzzle.forEach((p, i) => {
 let sol, res;
 [ p, sol ] = p.split(',');
 if(p.length == 81) {
 if(!(++i % 2000)) {
 console.log((i * 100 / len).toFixed(1) + '%');
 }
 if(!(res = solve(p))) {
 throw "Failed on puzzle " + i;
 }
 if(sol && res != sol) {
 throw "Invalid solution for puzzle " + i;
 }
 output += p + ',' + res + '\n';
 }
});
// results
console.timeEnd('Processing time');
fs.writeFileSync('sudoku.log', output);
console.log("MD5 = " + require('crypto').createHash('md5').update(output).digest("hex"));
// initialization of lookup tables
function init() {
 let ptr, x, y;
 for(x = 0; x < 0x200; x++) {
 N_BIT[x] = [0, 1, 2, 3, 4, 5, 6, 7, 8].reduce((s, n) => s + (x >> n & 1), 0);
 ZERO[x] = ~x & -~x;
 }
 for(x = 0; x < 9; x++) {
 BIT[1 << x] = x;
 }
 for(ptr = y = 0; y < 9; y++) {
 for(x = 0; x < 9; x++, ptr++) {
 BLOCK[ptr] = (y / 3 | 0) * 3 + (x / 3 | 0);
 BLOCK_NDX[ptr] = (y % 3) * 3 + x % 3;
 }
 }
}
// solver
function solve(p) {
 let ptr, x, y, v,
 count = 81,
 m = Array(81).fill(-1),
 row = Array(9).fill(0),
 col = Array(9).fill(0),
 blk = Array(9).fill(0);
 // helper function to check and play a move
 function play(stack, x, y, n) {
 let p = y * 9 + x;
 if(~m[p]) {
 if(m[p] == n) {
 return true;
 }
 undo(stack);
 return false;
 }
 let msk, b;
 msk = 1 << n;
 b = BLOCK[p];
 if((col[x] | row[y] | blk[b]) & msk) {
 undo(stack);
 return false;
 }
 count--;
 col[x] ^= msk;
 row[y] ^= msk;
 blk[b] ^= msk;
 m[p] = n;
 stack.push(x << 8 | y << 4 | n);
 return true;
 }
 // helper function to undo all moves on the stack
 function undo(stack) {
 stack.forEach(v => {
 let x = v >> 8,
 y = v >> 4 & 15,
 p = y * 9 + x,
 b = BLOCK[p];
 v = 1 << (v & 15);
 count++;
 col[x] ^= v;
 row[y] ^= v;
 blk[b] ^= v;
 m[p] = -1;
 });
 }
 // convert the puzzle into our own format
 for(ptr = y = 0; y < 9; y++) {
 for(x = 0; x < 9; x++, ptr++) {
 if(~(v = p[ptr] - 1)) {
 col[x] |= 1 << v;
 row[y] |= 1 << v;
 blk[BLOCK[ptr]] |= 1 << v;
 count--;
 m[ptr] = v;
 }
 }
 }
 // main recursive search function
 let res = (function search() {
 // success?
 if(!count) {
 return true;
 }
 let ptr, x, y, v, n, max, best,
 k, i, stack = [],
 dCol = Array(81).fill(0),
 dRow = Array(81).fill(0),
 dBlk = Array(81).fill(0),
 b, v0;
 // scan the grid:
 // - keeping track of where each digit can go on a given column, row or block
 // - looking for a cell with the fewest number of legal moves
 for(max = ptr = y = 0; y < 9; y++) {
 for(x = 0; x < 9; x++, ptr++) {
 if(m[ptr] == -1) {
 v = col[x] | row[y] | blk[BLOCK[ptr]];
 n = N_BIT[v];
 // abort if there's no legal move on this cell
 if(n == 9) {
 return false;
 }
 // update dCol[], dRow[] and dBlk[]
 for(v0 = v ^ 0x1FF; v0;) {
 b = v0 & -v0;
 dCol[x * 9 + BIT[b]] |= 1 << y;
 dRow[y * 9 + BIT[b]] |= 1 << x;
 dBlk[BLOCK[ptr] * 9 + BIT[b]] |= 1 << BLOCK_NDX[ptr];
 v0 ^= b;
 }
 // update the cell with the fewest number of moves
 if(n > max) {
 best = {
 x : x,
 y : y,
 ptr: ptr,
 msk: v
 };
 max = n;
 }
 }
 }
 }
 // play all forced moves (unique candidates on a given column, row or block)
 // and make sure that it doesn't lead to any inconsistency
 for(k = 0; k < 9; k++) {
 for(n = 0; n < 9; n++) {
 if(N_BIT[dCol[k * 9 + n]] == 1) {
 i = BIT[dCol[k * 9 + n]];
 if(!play(stack, k, i, n)) {
 return false;
 }
 }
 if(N_BIT[dRow[k * 9 + n]] == 1) {
 i = BIT[dRow[k * 9 + n]];
 if(!play(stack, i, k, n)) {
 return false;
 }
 }
 if(N_BIT[dBlk[k * 9 + n]] == 1) {
 i = BIT[dBlk[k * 9 + n]];
 if(!play(stack, (k % 3) * 3 + i % 3, (k / 3 | 0) * 3 + (i / 3 | 0), n)) {
 return false;
 }
 }
 }
 }
 // if we've played at least one forced move, do a recursive call right away
 if(stack.length) {
 if(search()) {
 return true;
 }
 undo(stack);
 return false;
 }
 // otherwise, try all moves on the cell with the fewest number of moves
 while((v = ZERO[best.msk]) < 0x200) {
 col[best.x] ^= v;
 row[best.y] ^= v;
 blk[BLOCK[best.ptr]] ^= v;
 m[best.ptr] = BIT[v];
 count--;
 if(search()) {
 return true;
 }
 count++;
 m[best.ptr] = -1;
 col[best.x] ^= v;
 row[best.y] ^= v;
 blk[BLOCK[best.ptr]] ^= v;
 best.msk ^= v;
 }
 return false;
 })();
 return res ? m.map(n => n + 1).join('') : false;
}
// debugging
function dump(m) {
 let x, y, c = 81, s = '';
 for(y = 0; y < 9; y++) {
 for(x = 0; x < 9; x++) {
 s += (~m[y * 9 + x] ? (c--, m[y * 9 + x] + 1) : '-') + (x % 3 < 2 || x == 8 ? ' ' : ' | ');
 }
 s += y % 3 < 2 || y == 8 ? '\n' : '\n------+-------+------\n';
 }
 console.log(c);
 console.log(s);
}

Example output

Tested on an Intel Core i7 7500U @ 2.70 GHz.

output

Question 19

@maxb I see. I did not understand that your concern was about multi-threading.

Question 20

it should be faster if you delete the line if(!stack.some(... and its corresponding } (but preserve what's between them), and at the beginning of play() insert if(m[y][x]>=0)return v==1<<m[y][x]

Question 21

Why is your solution so much faster than the other ones?

Question 22

@Anush What I've called 'forced moves' in the code is what makes it significantly faster and is better known as hidden singles. (We could also look for hidden twins, triples, quads, etc. but I'm not sure it's really worth it, at least in Node.)

Question 23

"I started looking at naked singles" careful with the phrasing :)

Question 24

Python 3 (with dlx) 4min 46.870s official score

(single core i7-3610QM here)

Obviously beatable with a compiled language like C, and making use of threading, but it's a start...

sudoku is a module I've placed on github (copied at the footer of this post) which uses dlx under the hood.

#!/usr/bin/python
import argparse
import gc
import sys
from timeit import timeit
from sudoku import Solver
def getSolvers(filePath):
 solvers = []
 with open(filePath, 'r') as inFile:
 for line in inFile:
 content = line.rstrip()
 if len(content) == 81 and content.isdigit():
 solvers.append(Solver(content))
 return solvers
def solve(solvers):
 for solver in solvers:
 yield next(solver.genSolutions())
if __name__ == '__main__':
 parser = argparse.ArgumentParser(description='Time or print solving of some sudoku.')
 parser.add_argument('filePath',
 help='Path to the file containing proper sudoku on their own lines as 81 digits in row-major order with 0s as blanks')
 parser.add_argument('-p', '--print', dest='printEm', action='store_true',
 default=False,
 help='print solutions in the same fashion as the input')
 parser.add_argument('-P', '--pretty', dest='prettyPrintEm', action='store_true',
 default=False,
 help='print inputs and solutions formatted for human consumption')
 args = parser.parse_args()
 if args.printEm or args.prettyPrintEm:
 solvers = getSolvers(args.filePath)
 print(len(solvers))
 for solver, solution in zip(solvers, solve(solvers)):
 if args.prettyPrintEm:
 print(solver)
 print(solution)
 else:
 print('{},{}'.format(solver.representation(noneCharacter='0'), solution.representation()))
 else:
 setup = '''\
from __main__ import getSolvers, solve, args, gc
gc.disable()
solvers = getSolvers(args.filePath)'''
 print(timeit("for solution in solve(solvers): pass", setup=setup, number=1))

Usage

Install Python 3
Save sudoku.py somewhere on your path (from the git hub link or copy it from below)
Save the above code as testSolver.py somewhere on your path
Install dlx:

python -m pip install dlx

Run it (by the way it consumes memory like it's going out of fashion)

usage: testSolver.py [-h] [-p] [-P] filePath
Time or print solving of some sudoku.
positional arguments:
 filePath Path to the file containing proper sudoku on their own lines
 as 81 digits in row-major order with 0s as blanks
optional arguments:
 -h, --help show this help message and exit
 -p, --print print solutions in the same fashion as the input
 -P, --pretty print inputs and solutions formatted for human consumption

Pipe output as required in the challenge spec to a file if need be:

python testSolver.py -p input_file_path> output_file_path

sudoku.py (yes there are extra features here other than solving)

import dlx
from itertools import permutations, takewhile
from random import choice, shuffle
'''
A 9 by 9 sudoku solver.
'''
_N = 3
_NSQ = _N**2
_NQU = _N**4
_VALID_VALUE_INTS = list(range(1, _NSQ + 1))
_VALID_VALUE_STRS = [str(v) for v in _VALID_VALUE_INTS]
_EMPTY_CELL_CHAR = '·'
# The following are mutually related by their ordering, and define ordering throughout the rest of the code. Here be dragons.
#
_CANDIDATES = [(r, c, v) for r in range(_NSQ) for c in range(_NSQ) for v in range(1, _NSQ + 1)]
_CONSTRAINT_INDEXES_FROM_CANDIDATE = lambda r, c, v: [ _NSQ * r + c, _NQU + _NSQ * r + v - 1, _NQU * 2 + _NSQ * c + v - 1, _NQU * 3 + _NSQ * (_N * (r // _N) + c // _N) + v - 1]
_CONSTRAINT_FORMATTERS = [ "R{0}C{1}" , "R{0}#{1}" , "C{0}#{1}" , "B{0}#{1}"]
_CONSTRAINT_NAMES = [(s.format(a, b + (e and 1)), dlx.DLX.PRIMARY) for e, s in enumerate(_CONSTRAINT_FORMATTERS) for a in range(_NSQ) for b in range(_NSQ)]
_EMPTY_GRID_CONSTRAINT_INDEXES = [_CONSTRAINT_INDEXES_FROM_CANDIDATE(r, c, v) for (r, c, v) in _CANDIDATES]
#
# The above are mutually related by their ordering, and define ordering throughout the rest of the code. Here be dragons.
class Solver:
 def __init__(self, representation=''):
 if not representation or len(representation) != _NQU:
 self._complete = False
 self._NClues = 0
 self._repr = [None]*_NQU # blank grid, no clues - maybe to extend to a generator by overriding the DLX column selection to be stochastic.
 else:
 nClues = 0
 repr = []
 for value in representation:
 if not value:
 repr.append(None)
 elif isinstance(value, int) and 1 <= value <= _NSQ:
 nClues += 1
 repr.append(value)
 elif value in _VALID_VALUE_STRS:
 nClues += 1
 repr.append(int(value))
 else:
 repr.append(None)
 self._complete = nClues == _NQU
 self._NClues = nClues
 self._repr = repr
 def genSolutions(self, genSudoku=True, genNone=False, dlxColumnSelctor=None):
 '''
 if genSudoku=False, generates each solution as a list of cell values (left-right, top-bottom)
 '''
 if self._complete:
 yield self
 else:
 self._initDlx()
 dlxColumnSelctor = dlxColumnSelctor or dlx.DLX.smallestColumnSelector
 if genSudoku:
 for solution in self._dlx.solve(dlxColumnSelctor):
 yield Solver([v for (r, c, v) in sorted([self._dlx.N[i] for i in solution])])
 elif genNone:
 for solution in self._dlx.solve(dlxColumnSelctor):
 yield
 else:
 for solution in self._dlx.solve(dlxColumnSelctor):
 yield [v for (r, c, v) in sorted([self._dlx.N[i] for i in solution])]
 def uniqueness(self, returnSolutionIfProper=False):
 '''
 Returns: 0 if unsolvable;
 1 (or the unique solution if returnSolutionIfProper=True) if uniquely solvable; or
 2 if multiple possible solutions exist
 - a 'proper' sudoku is uniquely solvable.
 '''
 slns = list(takewhile(lambda t: t[0] < 2, ((i, sln) for i, sln in enumerate(self.genSolutions(genSudoku=returnSolutionIfProper, genNone=not returnSolutionIfProper)))))
 uniqueness = len(slns)
 if returnSolutionIfProper and uniqueness == 1:
 return slns[0][1]
 else:
 return uniqueness
 def representation(self, asString=True, noneCharacter='.'):
 if asString:
 return ''.join([v and str(_VALID_VALUE_STRS[v - 1]) or noneCharacter for v in self._repr])
 return self._repr[:]
 def __repr__(self):
 return display(self._repr)
 def _initDlx(self):
 self._dlx = dlx.DLX(_CONSTRAINT_NAMES)
 rowIndexes = self._dlx.appendRows(_EMPTY_GRID_CONSTRAINT_INDEXES, _CANDIDATES)
 for r in range(_NSQ):
 for c in range(_NSQ):
 v = self._repr[_NSQ * r + c]
 if v is not None:
 self._dlx.useRow(rowIndexes[_NQU * r + _NSQ * c + v - 1])
_ROW_SEPARATOR_COMPACT = '+'.join(['-' * (2 * _N + 1) for b in range(_N)])[1:-1] + '\n'
_ROW_SEPARATOR = ' ·-' + _ROW_SEPARATOR_COMPACT[:-1] + '-·\n'
_TOP_AND_BOTTOM = _ROW_SEPARATOR.replace('+', '·')
_ROW_LABELS = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J']
_COL_LABELS = ['1', '2', '3', '4', '5', '6', '7', '8', '9']
_COLS_LABEL = ' ' + ' '.join([i % _N == 0 and ' ' + l or l for i, l in enumerate(_COL_LABELS)]) + '\n'
def display(representation, conversion=None, labelled=True):
 result = ''
 raw = [conversion[n or 0] for n in representation] if conversion else representation
 if labelled:
 result += _COLS_LABEL + _TOP_AND_BOTTOM
 rSep = _ROW_SEPARATOR
 else:
 rSep = _ROW_SEPARATOR_COMPACT
 for r in range(_NSQ):
 if r > 0 and r % _N == 0:
 result += rSep
 for c in range(_NSQ):
 if c % _N == 0:
 if c == 0:
 if labelled:
 result += _ROW_LABELS[r] + '| '
 else:
 result += '| '
 result += str(raw[_NSQ * r + c] or _EMPTY_CELL_CHAR) + ' '
 if labelled:
 result += '|'
 result += '\n'
 if labelled:
 result += _TOP_AND_BOTTOM
 else:
 result = result[:-1]
 return result
def permute(representation):
 '''
 returns a random representation from the given representation's equivalence class
 '''
 rows = [list(representation[i:i+_NSQ]) for i in range(0, _NQU, _NSQ)]
 rows = permuteRowsAndBands(rows)
 rows = [[r[i] for r in rows] for i in range(_NSQ)]
 rows = permuteRowsAndBands(rows)
 pNumbers = [str(i) for i in range(1, _NSQ + 1)]
 shuffle(pNumbers)
 return ''.join(''.join([pNumbers[int(v) - 1] if v.isdigit() and v != '0' else v for v in r]) for r in rows)
def permuteRowsAndBands(rows):
 bandP = choice([x for x in permutations(range(_N))])
 rows = [rows[_N * b + r] for b in bandP for r in range(_N)]
 for band in range(0, _NSQ, _N):
 rowP = choice([x for x in permutations([band + i for i in range(_N)])])
 rows = [rows[rowP[i % _N]] if i // _N == band else rows[i] for i in range(_NSQ)]
 return rows
def getRandomSolvedStateRepresentation():
 return permute('126459783453786129789123456897231564231564897564897231312645978645978312978312645')
def getRandomSudoku():
 r = getRandomSolvedStateRepresentation()
 s = Solver(r)
 indices = list(range(len(r)))
 shuffle(indices)
 for i in indices:
 ns = Solver(s._repr[:i] + [None] + s._repr[i+1:])
 if ns.uniqueness() == 1:
 s = ns
 return s
if __name__ == '__main__':
 print('Some example useage:')
 inputRepresentation = '..3......4......2..8.12...6.........2...6...7...8.7.31.1.64.9..6.5..8...9.83...4.'
 print('>>> s = Solver({})'.format(inputRepresentation))
 s = Solver(inputRepresentation)
 print('>>> s')
 print(s)
 print('>>> print(s.representation())')
 print(s.representation())
 print('>>> print(display(s.representation(), labelled=False))')
 print(display(s.representation(), labelled=False))
 print('>>> for solution in s.genSolutions(): solution')
 for solution in s.genSolutions(): print(solution)
 inputRepresentation2 = inputRepresentation[:2] + '.' + inputRepresentation[3:]
 print('>>> s.uniqueness()')
 print(s.uniqueness())
 print('>>> s2 = Solver({}) # removed a clue; this has six solutions rather than one'.format(inputRepresentation2))
 s2 = Solver(inputRepresentation2)
 print('>>> s2.uniqueness()')
 print(s2.uniqueness())
 print('>>> for solution in s2.genSolutions(): solution')
 for solution in s2.genSolutions(): print(solution)
 print('>>> s3 = getRandomSudoku()')
 s3 = getRandomSudoku()
 print('>>> s3')
 print(s3)
 print('>>> for solution in s3.genSolutions(): solution')
 for solution in s3.genSolutions(): print(solution)

Question 25

Impressive for a Python solution, I'll try to benchmark it later today.

Question 26

Thanks; and wow, so much faster there maxb!

Question 27

+1 for using dancing links

Question 28

Python 3 + Z3 - 10min 45.657s official score

about 1000s on my laptop.

import time
start = time.time()
import z3.z3 as z3
import itertools
import datetime
import sys
solver = z3.Solver()
ceils = [[None] * 9 for i in range(9)]
for row in range(9):
 for col in range(9):
 name = 'c' + str(row * 9 + col)
 ceil = z3.BitVec(name, 9)
 solver.add(z3.Or(
 ceil == 0b000000001,
 ceil == 0b000000010,
 ceil == 0b000000100,
 ceil == 0b000001000,
 ceil == 0b000010000,
 ceil == 0b000100000,
 ceil == 0b001000000,
 ceil == 0b010000000,
 ceil == 0b100000000
 ))
 solver.add(ceil != 0)
 ceils[row][col] = ceil
for i in range(9):
 for j in range(9):
 for k in range(9):
 if j == k: continue
 solver.add(ceils[i][j] & ceils[i][k] == 0)
 solver.add(ceils[j][i] & ceils[k][i] == 0)
 row, col = i // 3 * 3, i % 3 * 3
 solver.add(ceils[row + j // 3][col + j % 3] & ceils[row + k // 3][col + k % 3] == 0)
row_col = list(itertools.product(range(9), range(9)))
lookup = { 1 << i: str(i + 1) for i in range(9) }
def solve(line):
 global solver, output, row_col, ceils, lookup
 solver.push()
 for value, (row, col) in zip(line, row_col):
 val = ord(value) - 48
 if val == 0: continue
 solver.add(ceils[row][col] == 1 << (val - 1))
 output = []
 if solver.check() == z3.sat:
 model = solver.model()
 for row in range(9):
 for col in range(9):
 val = model[ceils[row][col]].as_long()
 output.append(lookup[val])
 solver.pop()
 return ''.join(output)
count = int(input())
print(count)
for i in range(count):
 if i % 1000 == 0:
 sys.stderr.write(str(i) + '\n')
 line = input()
 print(line + "," + solve(line))
end = time.time()
sys.stderr.write(str(end - start))

Install dependency

pip install z3-solver

Run

python3 solve.py < in.txt> out.txt

I'm not sure how to improve its performance, since it just solved magically...

Question 29

Quite impressive for a general constraint solver. My first implementation was way slower than this. Running a benchmark right now, I'll update the post once it's done.

Question 30

@maxb just done some general clean up, and i believe there is no need to update the benchmark...

Question 31

C - (削除) 2.228s (削除ここまで) 1.690s official score

based on @Arnauld's

#include<fcntl.h>
#define O const
#define R return
#define S static
#define $(x,y...)if(x){y;}
#define W(x,y...)while(x){y;}
#define fi(x,y...)for(I i=0,_n=(x);i<_n;i++){y;}
#define fj(x,y...)for(I j=0,_n=(x);j<_n;j++){y;}
#define fp81(x...)for(I p=0;p<81;p++){x;}
#define fq3(x...)for(I q=0;q<3;q++){x;}
#define fij9(x...){fi(9,fj(9,x))}
#define m0(x)m0_((V*)(x),sizeof(x));
#define popc(x)__builtin_popcount(x)
#define ctz(x)__builtin_ctz(x)
#include<sys/syscall.h>
#define sc(f,x...)({L u;asm volatile("syscall":"=a"(u):"0"(SYS_##f)x:"cc","rcx","r11","memory");u;})
#define sc1(f,x) sc(f,,"D"(x))
#define sc2(f,x,y) sc(f,,"D"(x),"S"(y))
#define sc3(f,x,y,z)sc(f,,"D"(x),"S"(y),"d"(z))
#define wr(a...)sc3(write,a)
#define op(a...)sc3( open,a)
#define cl(a...)sc1(close,a)
#define fs(a...)sc2(fstat,a)
#define ex(a...)sc1( exit,a)
#define mm(x,y,z,t,u,v)({register L r10 asm("r10")=t,r8 asm("r8")=u,r9 asm("r9")=v;\
 (V*)sc(mmap,,"D"(x),"S"(y),"d"(z),"r"(r10),"r"(r8),"r"(r9));})
typedef void V;typedef char C;typedef short H;typedef int I;typedef long long L;
S C BL[81],KL[81],IJK[81][3],m[81],t_[81-17],*t;S H rcb[3][9],cnt;
S V*mc(V*x,O V*y,L n){C*p=x;O C*q=y;fi(n,*p++=*q++)R x;}S V m0_(C*p,L n){fi(n,*p++=0);}
S I undo(C*t0){cnt+=t-t0;W(t>t0,C p=*--t;H v=1<<m[p];fq3(rcb[q][IJK[p][q]]^=v)m[p]=-1)R 0;}
S I play(C p,H v){$(m[p]>=0,R 1<<m[p]==v)I w=0;fq3(w|=rcb[q][IJK[p][q]])$(w&v,R 0)cnt--;
 fq3(rcb[q][IJK[p][q]]^=v);m[p]=ctz(v);*t++=p;R 1;}
S I f(){$(!cnt,R 1)C*t0=t;H max=0,bp,bv,d[9][9][4];m0(d);
 fij9(I p=i*9+j;$(m[p]<0,
 I v=0;fq3(v|=rcb[q][IJK[p][q]])I w=v^511;$(!w,R 0)H g[]={1<<j,1<<i,1<<BL[p]};
 do{I z=ctz(w);w&=w-1;fq3(d[IJK[p][q]][z][q]|=g[q]);}while(w);
 I n=popc(v);$(max<n,max=n;bp=p;bv=v)))
 fij9(I u=d[i][j][0];$(popc(u)==1,I l=ctz(u);$(!play( i*9+l ,1<<j),R undo(t0)))
 u=d[i][j][1];$(popc(u)==1,I l=ctz(u);$(!play( l*9+i ,1<<j),R undo(t0)))
 u=d[i][j][2];$(popc(u)==1,I l=ctz(u);$(!play(KL[i*9+l],1<<j),R undo(t0))))
 $(t-t0,R f()||undo(t0))
 W(1,I v=1<<ctz(~bv);$(v>511,R 0)fq3(rcb[q][IJK[bp][q]]^=v)m[bp]=ctz(v);cnt--;$(f(),R 1)
 cnt++;m[bp]=-1;fq3(rcb[q][IJK[bp][q]]^=v)bv^=v)
 R 0;}
asm(".globl _start\n_start:pop %rdi\nmov %rsp,%rsi\njmp main");
V main(I ac,C**av){$(ac!=2,ex(2))
 fij9(I p=i*9+j;BL[p]=i%3*3+j%3;KL[p]=(i/3*3+j/3)*9+BL[p];IJK[p][0]=i;IJK[p][1]=j;IJK[p][2]=i/3*3+j/3)
 I d=op(av[1],0,0);struct stat h;fs(d,&h);C*s0=mm(0,h.st_size,1,0x8002,d,0),*s=s0;cl(d); //in
 C*r0=mm(0,2*h.st_size,3,0x22,-1,0),*r=r0; //out
 I n=0;W(*s!='\n',n*=10;n+=*s++-'0')s++;mc(r,s0,s-s0);r+=s-s0;
 fi(n,m0(rcb);cnt=81;t=t_;$(s[81]&&s[81]!='\n',ex(3))mc(r,s,81);r+=81;*r++=',';
 fp81(I v=m[p]=*s++-'1';$(v>=0,v=1<<v;fq3(rcb[q][IJK[p][q]]|=v)cnt--))
 s++;$(!f(),ex(4))fp81(r[p]=m[p]+'1')r+=81;*r++='\n')
 wr(1,r0,r-r0);ex(0);}

compile and run:

gcc -O3 -march=native -nostdlib -ffreestanding
time ./a.out all_17_clue_sudokus.txt | md5sum

Question 32

Congratulations, you (and Arnauld) are in the lead by a lot right now.

Question 33

@maxb i tried using more efficient i/o (direct syscalls without libc) but the effect wasn't as great as i hoped. i also tidied up the rest of the code. this should take away ~0.2s. do you mind re-scoring?

Question 34

Of course, I'll try to get it done sometime today

Question 35

I was also thinking about trying a RAMdisk for all I/O, just to see if that makes a difference. I doubt it'll make a huge difference, since reads and writes are sequential, and my SSD has a large enough cache to fit everything.

Question 36

@maxb there probably won't be any difference at all. the second time you run the program, the input file will already be in ram anyway - in linux's filesystem cache.

Question 37

C++ with Minisat(2.2.1-5) - 11.735s official score

This is nowhere near as fast as a specialized algorithm, but it's a different approach, an interesting point of reference, and easy to understand.

$ clang++ -o solve -lminisat solver_minisat.cc

#include <minisat/core/Solver.h>
namespace {
using Minisat::Lit;
using Minisat::mkLit;
using namespace std;
struct SolverMiniSat {
 Minisat::Solver solver;
 SolverMiniSat() {
 InitializeVariables();
 InitializeTriadDefinitions();
 InitializeTriadOnnes();
 InitializeCellOnnes();
 }
 // normal cell literals, of which we have 9*9*9
 static Lit Literal(int row, int column, int value) {
 return mkLit(value + 9 * (column + 9 * row), true);
 }
 // horizontal triad literals, of which we have 9*3*9, starting after the cell literals
 static Lit HTriadLiteral(int row, int column, int value) {
 int base = 81 * 9;
 return mkLit(base + value + 9 * (column + 3 * row));
 }
 // vertical triad literals, of which we have 3*9*9, starting after the h_triad literals
 static Lit VTriadLiteral(int row, int column, int value) {
 int base = (81 + 27) * 9;
 return mkLit(base + value + 9 * (row + 3 * column));
 }
 void InitializeVariables() {
 for (int i = 0; i < 15 * 9 * 9; i++) {
 solver.newVar();
 }
 }
 // create an exactly-one constraint over a set of literals
 void CreateOnne(const Minisat::vec<Minisat::Lit> &literals) {
 solver.addClause(literals);
 for (int i = 0; i < literals.size() - 1; i++) {
 for (int j = i + 1; j < literals.size(); j++) {
 solver.addClause(~literals[i], ~literals[j]);
 }
 }
 }
 void InitializeTriadDefinitions() {
 for (int i = 0; i < 9; i++) {
 for (int j = 0; j < 3; j++) {
 for (int value = 0; value < 9; value++) {
 Lit h_triad = HTriadLiteral(i, j, value);
 Lit v_triad = VTriadLiteral(j, i, value);
 int j0 = j * 3 + 0, j1 = j * 3 + 1, j2 = j * 3 + 2;
 Minisat::vec<Minisat::Lit> h_triad_def;
 h_triad_def.push(Literal(i, j0, value));
 h_triad_def.push(Literal(i, j1, value));
 h_triad_def.push(Literal(i, j2, value));
 h_triad_def.push(~h_triad);
 CreateOnne(h_triad_def);
 Minisat::vec<Minisat::Lit> v_triad_def;
 v_triad_def.push(Literal(j0, i, value));
 v_triad_def.push(Literal(j1, i, value));
 v_triad_def.push(Literal(j2, i, value));
 v_triad_def.push(~v_triad);
 CreateOnne(v_triad_def);
 }
 }
 }
 }
 void InitializeTriadOnnes() {
 for (int i = 0; i < 9; i++) {
 for (int value = 0; value < 9; value++) {
 Minisat::vec<Minisat::Lit> row;
 row.push(HTriadLiteral(i, 0, value));
 row.push(HTriadLiteral(i, 1, value));
 row.push(HTriadLiteral(i, 2, value));
 CreateOnne(row);
 Minisat::vec<Minisat::Lit> column;
 column.push(VTriadLiteral(0, i, value));
 column.push(VTriadLiteral(1, i, value));
 column.push(VTriadLiteral(2, i, value));
 CreateOnne(column);
 Minisat::vec<Minisat::Lit> hbox;
 hbox.push(HTriadLiteral(3 * (i / 3) + 0, i % 3, value));
 hbox.push(HTriadLiteral(3 * (i / 3) + 1, i % 3, value));
 hbox.push(HTriadLiteral(3 * (i / 3) + 2, i % 3, value));
 CreateOnne(hbox);
 Minisat::vec<Minisat::Lit> vbox;
 vbox.push(VTriadLiteral(i % 3, 3 * (i / 3) + 0, value));
 vbox.push(VTriadLiteral(i % 3, 3 * (i / 3) + 1, value));
 vbox.push(VTriadLiteral(i % 3, 3 * (i / 3) + 2, value));
 CreateOnne(vbox);
 }
 }
 }
 void InitializeCellOnnes() {
 for (int row = 0; row < 9; row++) {
 for (int column = 0; column < 9; column++) {
 Minisat::vec<Minisat::Lit> literals;
 for (int value = 0; value < 9; value++) {
 literals.push(Literal(row, column, value));
 }
 CreateOnne(literals);
 }
 }
 }
 bool SolveSudoku(const char *input, char *solution, size_t *num_guesses) {
 Minisat::vec<Minisat::Lit> assumptions;
 for (int row = 0; row < 9; row++) {
 for (int column = 0; column < 9; column++) {
 char digit = input[row * 9 + column];
 if (digit != '.') {
 assumptions.push(Literal(row, column, digit - '1'));
 }
 }
 }
 solver.decisions = 0;
 bool satisfied = solver.solve(assumptions);
 if (satisfied) {
 for (int row = 0; row < 9; row++) {
 for (int column = 0; column < 9; column++) {
 for (int value = 0; value < 9; value++) {
 if (solver.model[value + 9 * (column + 9 * row)] ==
 Minisat::lbool((uint8_t) 1)) {
 solution[row * 9 + column] = value + '1';
 }
 }
 }
 }
 }
 *num_guesses = solver.decisions - 1;
 return satisfied;
 }
};
} //end anonymous namespace
int main(int argc, const char **argv) {
 char *puzzle = NULL;
 char solution[81];
 size_t size, guesses;
 SolverMiniSat solver;
 while (getline(&puzzle, &size, stdin) != -1) {
 int count = solver.SolveSudoku(puzzle, solution, &guesses);
 printf("%.81s:%d:%.81s\n", puzzle, count, solution);
 }
}

Question 38

Python3 - 2min 1.007s official score

It takes around 100 seconds on my i5-9400F

import copy
SUDOKU_VALUES = [1, 2, 4, 8, 16, 32, 64, 128, 256]
SUDOKU_MAX = 511
OPTION_COUNT_CACHE = [
 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2,
 3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3,
 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3,
 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4,
 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5,
 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4,
 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5,
 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5,
 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3,
 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6,
 6, 7, 6, 7, 7, 8, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3,
 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5,
 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3,
 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6,
 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6,
 7, 5, 6, 6, 7, 6, 7, 7, 8, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4,
 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8, 3, 4,
 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6,
 7, 6, 7, 7, 8, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8, 5, 6, 6, 7,
 6, 7, 7, 8, 6, 7, 7, 8, 7, 8, 8, 9
 ] # Basically just .count_ones()
class SudokuEmpty:
 def __init__(self):
 self.data = list(range(81))
 self.pos = 81
 def remove(self, index):
 self.pos -= 1
 data = self.data
 data[index], data[self.pos] = data[self.pos], data[index]
class Solver:
 def __init__(self, sudoku):
 self.to_explore = SudokuEmpty()
 self.options = [SUDOKU_MAX for _ in range(81)]
 for (i, item) in enumerate(sudoku):
 if item != 0:
 self.options[i] = SUDOKU_VALUES[item - 1]
 self.apply_number(i)
 def hidden_singles(self, square):
 options = self.options
 value = options[square]
 options[square] = 0
 row_start = square // 9 * 9
 column_start = square % 9
 box_start = square // 3 % 3 * 3 + square // 27 * 27
 needed = (SUDOKU_MAX
 - ((options[row_start + 8]
 | options[row_start + 7]
 | options[row_start + 6]
 | options[row_start + 5]
 | options[row_start + 4]
 | options[row_start + 3]
 | options[row_start + 2]
 | options[row_start + 1]
 | options[row_start])
 & (options[column_start + 72]
 | options[column_start + 63]
 | options[column_start + 54]
 | options[column_start + 45]
 | options[column_start + 36]
 | options[column_start + 27]
 | options[column_start + 18]
 | options[column_start + 9]
 | options[column_start])
 & (options[box_start + 20]
 | options[box_start + 19]
 | options[box_start + 18]
 | options[box_start + 11]
 | options[box_start + 10]
 | options[box_start + 9]
 | options[box_start + 2]
 | options[box_start + 1]
 | options[box_start])))
 option_count = OPTION_COUNT_CACHE[needed]
 if option_count == 0:
 self.options[square] = value
 return True
 elif option_count == 1:
 if value & needed != 0:
 self.options[square] = value & needed
 return True
 else:
 return False
 else:
 return False
 def apply_number(self, square):
 options = self.options
 value = options[square]
 not_value = SUDOKU_MAX - value
 column_start = square % 9
 row_start = square - column_start
 box_start = square // 3 % 3 * 3 + square // 27 * 27
 options[row_start + 8] &= not_value
 options[row_start + 7] &= not_value
 options[row_start + 6] &= not_value
 options[row_start + 5] &= not_value
 options[row_start + 4] &= not_value
 options[row_start + 3] &= not_value
 options[row_start + 2] &= not_value
 options[row_start + 1] &= not_value
 options[row_start] &= not_value
 options[column_start + 72] &= not_value
 options[column_start + 63] &= not_value
 options[column_start + 54] &= not_value
 options[column_start + 45] &= not_value
 options[column_start + 36] &= not_value
 options[column_start + 27] &= not_value
 options[column_start + 18] &= not_value
 options[column_start + 9] &= not_value
 options[column_start] &= not_value
 options[box_start + 20] &= not_value
 options[box_start + 19] &= not_value
 options[box_start + 18] &= not_value
 options[box_start + 11] &= not_value
 options[box_start + 10] &= not_value
 options[box_start + 9] &= not_value
 options[box_start + 2] &= not_value
 options[box_start + 1] &= not_value
 options[box_start] &= not_value
 options[square] = value
 def process(self, routes):
 values = []
 while 1:
 min_length = 20
 min_pos = 0
 min_pos_x = 0
 x = 0
 while x < self.to_explore.pos:
 pos = self.to_explore.data[x]
 if not self.hidden_singles(pos):
 return False
 option = self.options[pos]
 length = OPTION_COUNT_CACHE[option]
 if length < min_length:
 if length == 0:
 return False
 elif length == 1:
 for (i, item) in enumerate(SUDOKU_VALUES):
 if option == item:
 self.apply_number(pos)
 self.to_explore.remove(x)
 break
 else:
 min_length = length
 min_pos = pos
 min_pos_x = x
 x += 1
 else:
 x += 1
 if min_length != 20:
 values.clear()
 options = self.options[min_pos]
 for (i, item) in enumerate(SUDOKU_VALUES):
 if options & item != 0:
 values.append(i + 1)
 if not values:
 return False
 self.to_explore.remove(min_pos_x)
 item = values.pop()
 for value in values:
 clone = copy.deepcopy(self)
 clone.options[min_pos] = SUDOKU_VALUES[value - 1]
 clone.apply_number(min_pos)
 routes.append(clone)
 self.options[min_pos] = SUDOKU_VALUES[item - 1]
 self.apply_number(min_pos)
 else:
 return True
 def get_result(self):
 solution = [0 for _ in range(81)]
 for (i, option) in enumerate(self.options):
 for (x, value) in enumerate(SUDOKU_VALUES):
 if option == value:
 solution[i] = x + 1
 break
 return solution
def solve(sudoku):
 routes = [Solver(sudoku)]
 while routes:
 route = routes.pop()
 result = route.process(routes)
 if result:
 return route.get_result()
 raise Exception("Empty routes, but still unsolved")
if __name__ == '__main__':
 with open('all_17_clue_sudokus.txt') as file:
 sudokus = file.read().splitlines()
 print(sudokus[0])
 for sudoku in sudokus[1:]:
 solution = ''.join(map(str, solve(list(map(int, sudoku)))))
 print('%s,%s' % (sudoku, solution))

The path to the sudokus is hardcoded, it has to be all_17_clue_sudokus.txt

To run

time python3 lib.py > output
sha256sum output

Question 39

It was a while since I ran these benchmarks, but if I have some time this weekend I'll try to run yours and give it a score

Question 40

Java - 4.056s official score

The main idea of this is to never allocate memory when it is not needed. The only exception are primitives, which should be optimized by the compiler anyway. Everything else is stored as masks and arrays of operations done in each step, which can be undone when the recursion step is completed.

About half of all sudokus are solved completely without backtracking, but if I push that number higher the overall time seems to be slower. I'm planning om rewriting this in C++ and optimize even further, but this solver is becoming a behemoth.

I wanted to implement as much caching as possible, which lead to some issues. For example, if there are two cells on the same row which can only have the number 6, then we have reached an impossible case, and should return to the backtracking. But since I calculated all options in one sweep, and then placed numbers in cells with only one possibility, I didn't double check that I had placed a number in the same row just before. This lead to impossible solutions.

With everything being contained in the arrays defined at the top, the memory usage of the actual solver is about 216kB. The main part of the memory usage comes from the array containing all the puzzles, and the I/O handlers in Java.

EDIT: I have a version which is translated to C++ now, but it isn't vastly faster. The official time is around 3.5 seconds, which isn't a huge improvement. I think the main issue with my implementation is that I keep my masks as arrays rather than bitmasks. I'll try to analyze Arnauld's solution to see what can be done to improve it.

import java.util.HashMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.File;
import java.io.PrintWriter;
public class Sudoku { 
 final private int[] unsolvedBoard;
 final private int[] solvedBoard; 
 final private int[][] neighbors;
 final private int[][] cells;
 private static int[] clues;
 final private int[][] mask;
 final private int[] formattedMask;
 final private int[][] placedMask;
 final private boolean[][][] lineMask;
 final private int[] lineCounters;
 final private int[][] sectionCounters;
 final private int[][] sectionMask;
 private int easySolved;
 private boolean isEasy;
 private int totEasy;
 private int placedNumbers;
 public long totTime = 0;
 private boolean solutionFound;
 public long lastPrint;
 private boolean shouldPrint;
 private boolean isImpossible = false;
 public Sudoku() {
 mask = new int[81][9];
 formattedMask = new int[81];
 placedMask = new int[64][64];
 lineMask = new boolean[64][81][9];
 sectionCounters = new int[9][27];
 sectionMask = new int[9][27];
 lineCounters = new int[64];
 neighbors = new int[81][20];
 unsolvedBoard = new int[81];
 solvedBoard = new int[81];
 cells = new int[][] {{0 ,1 ,2 ,9 ,10,11,18,19,20},
 {3 ,4 ,5 ,12,13,14,21,22,23},
 {6 ,7 ,8 ,15,16,17,24,25,26},
 {27,28,29,36,37,38,45,46,47},
 {30,31,32,39,40,41,48,49,50},
 {33,34,35,42,43,44,51,52,53},
 {54,55,56,63,64,65,72,73,74},
 {57,58,59,66,67,68,75,76,77},
 {60,61,62,69,70,71,78,79,80}};
 }
 final public long solveSudoku(int[] board, int clue) {
 long t1 = 0,t2 = 0;
 t1 = System.nanoTime();
 System.arraycopy(board, 0, unsolvedBoard, 0, 81);
 System.arraycopy(board, 0, solvedBoard, 0, 81);
 placedNumbers = 0;
 solutionFound = false;
 isEasy = true;
 isImpossible = false;
 for (int[] i : mask) {
 Arrays.fill(i, 0);
 }
 for (boolean[][] i : lineMask) {
 for (boolean[] j : i) {
 Arrays.fill(j, false);
 }
 }
 for (int i = 0; i < 81; i++) {
 if (solvedBoard[i] != -1) {
 put(i, solvedBoard[i]);
 placedNumbers++;
 }
 }
 solve(0, 0);
 t2 = System.nanoTime();
 easySolved += isEasy ? 1 : 0;
 if (solutionFound && placedNumbers == 81) {
 totTime += t2-t1;
 if (shouldPrint || t2-t1 > 5*1_000_000_000L) {
 System.out.print(String.format(
 "Solution from %2d clues found in %7s", 
 clue, 
 printTime(t1, t2)
 ));
 shouldPrint = false;
 if (t2-t1 > 1*1000_000_000L) {
 System.out.println();
 display2(board, solvedBoard);
 }
 }
 } else {
 System.out.println("No solution");
 display2(unsolvedBoard, solvedBoard);
 return -1;
 }
 return t2 - t1;
 }
 final private void solve(int v, int vIndex) {
 lineCounters[vIndex] = 0;
 int easyIndex = placeEasy(vIndex);
 if (isImpossible) {
 resetEasy(vIndex, easyIndex);
 resetLineMask(vIndex);
 return;
 }
 if (placedNumbers == 81) {
 solutionFound = true;
 return;
 }
 // if (true) {
 // return;
 // }
 // either get the next empty cell
 // while (v < 81 && solvedBoard[v] >= 0) {
 // v++;
 // }
 // or get the cell with the fewest options
 generateFormattedMasks();
 int minOptions = 9;
 for (int i = 0; i < 81; i++) {
 int options = formattedMask[i] & 0xffff;
 if (options > 0 && options < minOptions) {
 minOptions = options;
 v = i;
 }
 if (options == 0 && solvedBoard[i] == -1) {
 isImpossible = true;
 }
 }
 if (!isImpossible) {
 for (int c = 0; c < 9; c++) {
 if (isPossible(v, c)) {
 isEasy = false;
 put(v, c);
 placedNumbers++;
 solve(v + 1, vIndex + 1); 
 if (solutionFound) {
 return;
 }
 unput(v, c);
 placedNumbers--;
 }
 }
 }
 resetEasy(vIndex, easyIndex);
 resetLineMask(vIndex);
 }
 final private void resetEasy(int vIndex, int easyIndex) {
 for (int i = 0; i < easyIndex; i++) {
 int tempv2 = placedMask[vIndex][i];
 int c2 = solvedBoard[tempv2];
 unput(tempv2, c2);
 placedNumbers--;
 }
 }
 final private void resetLineMask(int vIndex) {
 if (lineCounters[vIndex] > 0) {
 for (int i = 0; i < 81; i++) {
 for (int c = 0; c < 9; c++) {
 if (lineMask[vIndex][i][c]) {
 enable(i, c);
 lineMask[vIndex][i][c] = false;
 }
 }
 }
 } 
 isImpossible = false;
 }
 final private int placeEasy(int vIndex) {
 int easyIndex = 0;
 int lastPlaced = 0, tempPlaced = 0, easyplaced = 0;
 int iter = 0;
 while (placedNumbers > lastPlaced+1) {
 lastPlaced = placedNumbers;
 tempPlaced = 0;
 while (placedNumbers > tempPlaced + 5) {
 tempPlaced = placedNumbers;
 easyIndex = placeNakedSingles(vIndex, easyIndex);
 if (isImpossible) {
 return easyIndex;
 }
 }
 tempPlaced = 0;
 while (placedNumbers < 55*1 && placedNumbers > tempPlaced + 2) {
 tempPlaced = placedNumbers;
 easyIndex = placeHiddenSingles(vIndex, easyIndex);
 if (isImpossible) {
 return easyIndex;
 }
 }
 tempPlaced = 0;
 while (placedNumbers < 65*1 && placedNumbers > tempPlaced + 1) {
 tempPlaced = placedNumbers;
 easyIndex = placeNakedSingles(vIndex, easyIndex);
 if (isImpossible) {
 return easyIndex;
 }
 }
 if (iter < 2 && placedNumbers < 55*1) {
 checkNakedTriples(vIndex);
 }
 if (placedNumbers < 45*1) {
 checkNakedDoubles(vIndex);
 identifyLines(vIndex);
 }
 iter++;
 }
 return easyIndex;
 }
 final private int placeNakedSingles(int vIndex, int easyIndex) {
 generateFormattedMasks();
 for (int tempv = 0; tempv < 81; tempv++) {
 int possibilities = formattedMask[tempv];
 if ((possibilities & 0xffff) == 1) {
 possibilities >>= 16;
 int c = 0;
 while ((possibilities & 1) == 0) {
 possibilities >>= 1;
 c++;
 }
 if (isPossible(tempv, c)) {
 put(tempv, c);
 placedMask[vIndex][easyIndex++] = tempv;
 placedNumbers++;
 } else {
 isImpossible = true;
 return easyIndex;
 }
 } else if (possibilities == 0 && solvedBoard[tempv] == -1) {
 isImpossible = true;
 return easyIndex;
 }
 }
 return easyIndex;
 }
 final private int placeHiddenSingles(int vIndex, int easyIndex) {
 for (int[] i : sectionCounters) {
 Arrays.fill(i, 0);
 }
 for (int c = 0; c < 9; c++) {
 for (int v = 0; v < 81; v++) {
 if (isPossible(v, c)) {
 int cell = 3 * (v / 27) + ((v / 3) % 3);
 sectionCounters[c][v / 9]++;
 sectionCounters[c][9 + (v % 9)]++;
 sectionCounters[c][18 + cell]++;
 sectionMask[c][v / 9] = v;
 sectionMask[c][9 + (v % 9)] = v;
 sectionMask[c][18 + cell] = v;
 }
 }
 int v;
 for (int i = 0; i < 9; i++) {
 if (sectionCounters[c][i] == 1) {
 v = sectionMask[c][i];
 if (isPossible(v, c)) {
 put(v, c);
 placedMask[vIndex][easyIndex++] = v;
 placedNumbers++;
 int cell = 3 * (v / 27) + ((v / 3) % 3);
 sectionCounters[c][9 + (v%9)] = 9;
 sectionCounters[c][18 + cell] = 9;
 } else {
 isImpossible = true;
 return easyIndex;
 }
 }
 }
 for (int i = 9; i < 18; i++) {
 if (sectionCounters[c][i] == 1) {
 v = sectionMask[c][i];
 if (isPossible(v, c)) {
 put(v, c);
 placedMask[vIndex][easyIndex++] = v;
 int cell = 3 * (v / 27) + ((v / 3) % 3);
 placedNumbers++;
 sectionCounters[c][18 + cell]++;
 } else {
 isImpossible = true;
 return easyIndex;
 }
 }
 }
 for (int i = 18; i < 27; i++) {
 if (sectionCounters[c][i] == 1) {
 v = sectionMask[c][i];
 if (isPossible(v, c)) {
 put(v, c);
 placedMask[vIndex][easyIndex++] = v;
 placedNumbers++;
 } else {
 isImpossible = true;
 return easyIndex;
 }
 }
 }
 }
 return easyIndex;
 }
 final private int getFormattedMask(int v) {
 if (solvedBoard[v] >= 0) {
 return 0;
 }
 int x = 0;
 int y = 0;
 for (int c = 8; c >= 0; c--) {
 x <<= 1;
 x += mask[v][c] == 0 ? 1 : 0;
 y += mask[v][c] == 0 ? 1 : 0;
 }
 x <<= 16;
 return x + y;
 }
 final private int getCachedMask(int v) {
 return formattedMask[v];
 }
 final private void generateFormattedMasks() {
 for (int i = 0; i < 81; i++) {
 formattedMask[i] = getFormattedMask(i);
 }
 }
 final private void generateFormattedMasks(int[] idxs) {
 for (int i : idxs) {
 formattedMask[i] = getFormattedMask(i);
 }
 }
 final private void checkNakedDoubles(int vIndex) {
 generateFormattedMasks();
 for (int i = 0; i < 81; i++) {
 int bitmask = formattedMask[i];
 if ((bitmask & 0xffff) == 2) {
 for (int j = i+1; j < (i/9+1)*9; j++) {
 int bitmask_j = formattedMask[j];
 if (bitmask == bitmask_j) {
 bitmask >>= 16;
 int c0, c1, k = 0;
 while ((bitmask & 1) == 0) {
 k++;
 bitmask >>= 1;
 }
 c0 = k;
 bitmask >>= 1;
 k++;
 while ((bitmask & 1) == 0) {
 k++;
 bitmask >>= 1;
 }
 c1 = k;
 for (int cell = (i/9)*9; cell < (i/9+1)*9; cell++) {
 if (cell != i && cell != j) {
 if (!lineMask[vIndex][cell][c0]) {
 disable(cell, c0);
 lineMask[vIndex][cell][c0] = true;
 lineCounters[vIndex]++;
 }
 if (!lineMask[vIndex][cell][c1]) {
 disable(cell, c1);
 lineMask[vIndex][cell][c1] = true;
 lineCounters[vIndex]++;
 }
 }
 }
 }
 }
 }
 }
 for (int idx = 0; idx < 81; idx++) {
 int i = (idx%9)*9 + idx/9;
 int bitmask = formattedMask[i];
 if ((bitmask & 0xffff) == 2) {
 for (int j = i+9; j < 81; j += 9) {
 int bitmask_j = formattedMask[j];
 if (bitmask == bitmask_j) {
 bitmask >>= 16;
 int c0, c1, k = 0;
 while ((bitmask & 1) == 0) {
 k++;
 bitmask >>= 1;
 }
 c0 = k;
 bitmask >>= 1;
 k++;
 while ((bitmask & 1) == 0) {
 k++;
 bitmask >>= 1;
 }
 c1 = k;
 for (int cell = i % 9; cell < 81; cell += 9) {
 if (cell != i && cell != j) {
 if (!lineMask[vIndex][cell][c0]) {
 disable(cell, c0);
 lineMask[vIndex][cell][c0] = true;
 lineCounters[vIndex]++;
 }
 if (!lineMask[vIndex][cell][c1]) {
 disable(cell, c1);
 lineMask[vIndex][cell][c1] = true;
 lineCounters[vIndex]++;
 }
 }
 }
 }
 }
 }
 }
 for (int idx = 0; idx < 9; idx++) {
 for (int i = 0; i < 9; i++) {
 int bitmask = formattedMask[cells[idx][i]];
 if ((bitmask & 0xffff) == 2) {
 for (int j = i+1; j < 9; j++) {
 int bitmask_j = formattedMask[cells[idx][j]];
 if (bitmask == bitmask_j) {
 bitmask >>= 16;
 int c0, c1, k = 0;
 while ((bitmask & 1) == 0) {
 k++;
 bitmask >>= 1;
 }
 c0 = k;
 bitmask >>= 1;
 k++;
 while ((bitmask & 1) == 0) {
 k++;
 bitmask >>= 1;
 }
 c1 = k;
 for (int cellIdx = 0; cellIdx < 9; cellIdx++) {
 if (cellIdx != i && cellIdx != j) {
 int cell = cells[idx][cellIdx];
 if (!lineMask[vIndex][cell][c0]) {
 disable(cell, c0);
 lineMask[vIndex][cell][c0] = true;
 lineCounters[vIndex]++;
 }
 if (!lineMask[vIndex][cell][c1]) {
 disable(cell, c1);
 lineMask[vIndex][cell][c1] = true;
 lineCounters[vIndex]++;
 }
 }
 }
 }
 }
 }
 }
 }
 }
 final private void checkNakedTriples(int vIndex) {
 generateFormattedMasks();
 for (int i = 0; i < 81; i++) {
 int bitmask = formattedMask[i];
 if ((bitmask & 0xffff) == 3) {
 for (int j = i+1; j < (i/9+1)*9; j++) {
 int bitmask_j = formattedMask[j];
 if (bitmask_j > 0 && bitmask == (bitmask | bitmask_j)) {
 for (int k = j+1; k < (i/9+1)*9; k++) {
 int bitmask_k = formattedMask[k];
 if (bitmask_k > 0 && bitmask == (bitmask | bitmask_k)) {
 int bitmask_shifted = bitmask >> 16;
 int c0, c1, c2, l = 0;
 while ((bitmask_shifted & 1) == 0) {
 l++;
 bitmask_shifted >>= 1;
 }
 c0 = l;
 bitmask_shifted >>= 1;
 l++;
 while ((bitmask_shifted & 1) == 0) {
 l++;
 bitmask_shifted >>= 1;
 }
 c1 = l;
 bitmask_shifted >>= 1;
 l++;
 while ((bitmask_shifted & 1) == 0) {
 l++;
 bitmask_shifted >>= 1;
 }
 c2 = l;
 for (int cell = (i/9)*9; cell < (i/9+1)*9; cell++) {
 if (cell != i && cell != j && cell != k) {
 if (!lineMask[vIndex][cell][c0]) {
 disable(cell, c0);
 lineMask[vIndex][cell][c0] = true;
 lineCounters[vIndex]++;
 }
 if (!lineMask[vIndex][cell][c1]) {
 disable(cell, c1);
 lineMask[vIndex][cell][c1] = true;
 lineCounters[vIndex]++;
 }
 if (!lineMask[vIndex][cell][c2]) {
 disable(cell, c2);
 lineMask[vIndex][cell][c2] = true;
 lineCounters[vIndex]++;
 }
 }
 }
 }
 }
 }
 }
 }
 }
 for (int idx = 0; idx < 81; idx++) {
 int i = (idx%9)*9 + idx/9;
 int bitmask = formattedMask[i];
 if ((bitmask & 0xffff) == 3) {
 for (int j = i+9; j < 81; j += 9) {
 int bitmask_j = formattedMask[j];
 if (bitmask_j > 0 && bitmask == (bitmask | bitmask_j)) {
 for (int k = j+9; k < 81; k += 9) {
 int bitmask_k = formattedMask[k];
 if (bitmask_k > 0 && bitmask == (bitmask | bitmask_k)) {
 int bitmask_shifted = bitmask >> 16;
 int c0, c1, c2, l = 0;
 while ((bitmask_shifted & 1) == 0) {
 l++;
 bitmask_shifted >>= 1;
 }
 c0 = l;
 bitmask_shifted >>= 1;
 l++;
 while ((bitmask_shifted & 1) == 0) {
 l++;
 bitmask_shifted >>= 1;
 }
 c1 = l;
 bitmask_shifted >>= 1;
 l++;
 while ((bitmask_shifted & 1) == 0) {
 l++;
 bitmask_shifted >>= 1;
 }
 c2 = l;
 for (int cell = i%9; cell < 81; cell += 9) {
 if (cell != i && cell != j && cell != k) {
 if (!lineMask[vIndex][cell][c0]) {
 disable(cell, c0);
 lineMask[vIndex][cell][c0] = true;
 lineCounters[vIndex]++;
 }
 if (!lineMask[vIndex][cell][c1]) {
 disable(cell, c1);
 lineMask[vIndex][cell][c1] = true;
 lineCounters[vIndex]++;
 }
 if (!lineMask[vIndex][cell][c2]) {
 disable(cell, c2);
 lineMask[vIndex][cell][c2] = true;
 lineCounters[vIndex]++;
 }
 }
 }
 }
 }
 }
 }
 }
 }
 for (int idx = 0; idx < 9; idx++) {
 for (int i = 0; i < 9; i++) {
 int bitmask = formattedMask[cells[idx][i]];
 if ((bitmask & 0xffff) == 3) {
 for (int j = i+1; j < 9; j++) {
 int bitmask_j = formattedMask[cells[idx][j]];
 if (bitmask_j > 0 && bitmask == (bitmask | bitmask_j)) {
 for (int k = j+1; k < 9; k++) {
 int bitmask_k = formattedMask[cells[idx][k]];
 if (bitmask_k > 0 && bitmask == (bitmask | bitmask_k)) {
 int bitmask_shifted = bitmask >> 16;
 int c0, c1, c2, l = 0;
 while ((bitmask_shifted & 1) == 0) {
 l++;
 bitmask_shifted >>= 1;
 }
 c0 = l;
 bitmask_shifted >>= 1;
 l++;
 while ((bitmask_shifted & 1) == 0) {
 l++;
 bitmask_shifted >>= 1;
 }
 c1 = l;
 bitmask_shifted >>= 1;
 l++;
 while ((bitmask_shifted & 1) == 0) {
 l++;
 bitmask_shifted >>= 1;
 }
 c2 = l;
 for (int cellIdx = 0; cellIdx < 9; cellIdx++) {
 if (cellIdx != i && cellIdx != j && cellIdx != k) {
 int cell = cells[idx][cellIdx];
 if (!lineMask[vIndex][cell][c0]) {
 disable(cell, c0);
 lineMask[vIndex][cell][c0] = true;
 lineCounters[vIndex]++;
 }
 if (!lineMask[vIndex][cell][c1]) {
 disable(cell, c1);
 lineMask[vIndex][cell][c1] = true;
 lineCounters[vIndex]++;
 }
 if (!lineMask[vIndex][cell][c2]) {
 disable(cell, c2);
 lineMask[vIndex][cell][c2] = true;
 lineCounters[vIndex]++;
 }
 }
 }
 }
 }
 }
 }
 }
 }
 }
 }
 final private void identifyLines(int vIndex) {
 int disabledLines = 0;
 int[][] tempRowMask = new int[3][9];
 int[][] tempColMask = new int[3][9];
 for (int i = 0; i < 9; i++) {
 for (int c = 0; c < 9; c++) {
 for (int j = 0; j < 3; j++) {
 tempRowMask[j][c] = 0;
 tempColMask[j][c] = 0;
 }
 for (int j = 0; j < 9; j++) {
 if (mask[cells[i][j]][c] == 0) {
 tempRowMask[j/3][c]++;
 tempColMask[j%3][c]++;
 }
 }
 int rowCount = 0;
 int colCount = 0;
 int rowIdx = -1, colIdx = -1;
 for (int j = 0; j < 3; j++) {
 if (tempRowMask[j][c] > 0) {
 rowCount++;
 rowIdx = j;
 }
 if (tempColMask[j][c] > 0) {
 colCount++;
 colIdx = j;
 }
 }
 if (rowCount == 1) {
 for (int j = (i/3)*3; j < (i/3 + 1)*3; j++) {
 if (j != i) {
 for (int k = rowIdx*3; k < (rowIdx+1)*3; k++) {
 int cell = cells[j][k];
 if (!lineMask[vIndex][cell][c]) {
 disable(cell, c);
 lineMask[vIndex][cell][c] = true;
 lineCounters[vIndex]++;
 }
 }
 }
 }
 }
 if (colCount == 1) {
 for (int j = i % 3; j < 9; j += 3) {
 if (j != i) {
 for (int k = colIdx; k < 9; k += 3) {
 int cell = cells[j][k];
 if (!lineMask[vIndex][cell][c]) {
 disable(cell, c);
 lineMask[vIndex][cell][c] = true;
 lineCounters[vIndex]++;
 }
 }
 }
 }
 }
 }
 }
 }
 final private boolean isPossible(int v, int c) {
 return mask[v][c] == 0;
 }
 final private int checkMask(int[][] neighbors, int v, int c) {
 int tempValue = 0;
 for (int n : neighbors[v]) {
 if (mask[n][c] > 0) {
 tempValue++;
 }
 }
 return tempValue;
 }
 final private void put(int v, int c) {
 solvedBoard[v] = c;
 for (int i : neighbors[v]) {
 mask[i][c]++;
 }
 for (int i = 0; i < 9; i++) {
 mask[v][i]++;
 }
 }
 final private void disable(int v, int c) {
 mask[v][c]++;
 }
 final private void unput(int v, int c) {
 solvedBoard[v] = -1;
 for (int i : neighbors[v]) {
 mask[i][c]--;
 }
 for (int i = 0; i < 9; i++) {
 mask[v][i]--;
 } 
 }
 final private void enable(int v, int c) {
 // enables++;
 mask[v][c]--;
 }
 public String getString(int[] board) {
 StringBuilder s = new StringBuilder();
 for (int i : board) {
 s.append(i+1);
 }
 return s.toString();
 }
 public long getTime() {
 return totTime;
 }
 public static String printTime(long t1, long t2) {
 String unit = " ns";
 if (t2-t1 > 10000) {
 unit = " us";
 t1 /= 1000; t2 /= 1000;
 }
 if (t2-t1 > 10000) {
 unit = " ms";
 t1 /= 1000; t2 /= 1000;
 }
 if (t2-t1 > 10000) {
 unit = " seconds";
 t1 /= 1000; t2 /= 1000;
 }
 return (t2-t1) + unit;
 }
 public void display(int[] board) {
 for (int i = 0; i < 9; i++) {
 if (i % 3 == 0) {
 System.out.println("+-----+-----+-----+");
 }
 for (int j = 0; j < 9; j++) {
 if (j % 3 == 0) {
 System.out.print("|");
 } else {
 System.out.print(" ");
 }
 if (board[i*9+j] != -1) {
 System.out.print(board[i*9+j]+1);
 } else {
 System.out.print(" ");
 }
 }
 System.out.println("|");
 }
 System.out.println("+-----+-----+-----+");
 }
 public void display2(int[] board, int[] solved) {
 for (int i = 0; i < 9; i++) {
 if (i % 3 == 0) {
 System.out.println("+-----+-----+-----+ +-----+-----+-----+");
 }
 for (int j = 0; j < 9; j++) {
 if (j % 3 == 0) {
 System.out.print("|");
 } else {
 System.out.print(" ");
 }
 if (board[i*9+j] != -1) {
 System.out.print(board[i*9+j]+1);
 } else {
 System.out.print(" ");
 }
 }
 System.out.print("| ");
 for (int j = 0; j < 9; j++) {
 if (j % 3 == 0) {
 System.out.print("|");
 } else {
 System.out.print(" ");
 }
 if (solved[i*9+j] != -1) {
 System.out.print(solved[i*9+j]+1);
 } else {
 System.out.print(" ");
 }
 }
 System.out.println("|");
 }
 System.out.println("+-----+-----+-----+ +-----+-----+-----+");
 }
 private boolean contains(int[] a, int v) {
 for (int i : a) {
 if (i == v) {
 return true;
 }
 }
 return false;
 }
 public void connect() {
 for (int i = 0; i < 81; i++) {
 for (int j = 0; j < 20; j++) {
 neighbors[i][j] = -1;
 }
 }
 int[] n_count = new int[81];
 HashMap<Integer,ArrayList<Integer>> map 
 = new HashMap<Integer,ArrayList<Integer>>();
 for (int[] c: cells) {
 ArrayList<Integer> temp = new ArrayList<Integer>();
 for (int v : c) {
 temp.add(v);
 }
 for (int v : c) {
 map.put(v,temp);
 }
 }
 for (int i = 0; i < 81; i++) {
 for (int j = (i/9)*9; j < (i/9)*9 + 9; j++) {
 if (i != j) {
 neighbors[i][n_count[i]++] = j;
 }
 }
 for (int j = i%9; j < 81; j += 9) {
 if (i != j) {
 neighbors[i][n_count[i]++] = j;
 }
 }
 for (int j : map.get(i)) {
 if (i != j) {
 if (!contains(neighbors[i], j)) {
 neighbors[i][n_count[i]++] = j;
 }
 }
 }
 }
 }
 public static int[][] getInput(String filename) {
 int[][] boards;
 try (BufferedInputStream in = new BufferedInputStream(
 new FileInputStream(filename))) {
 BufferedReader r = new BufferedReader(
 new InputStreamReader(in, StandardCharsets.UTF_8));
 int n = Integer.valueOf(r.readLine());
 boards = new int[n][81];
 clues = new int[n];
 for (int i = 0; i < n; i++) {
 for (int j = 0; j < 81; j++) {
 int x = r.read();
 boards[i][j] = x - 49;
 clues[i] += x > 48 ? 1 : 0;
 }
 r.read();
 }
 r.close();
 } catch (IOException ex) {
 throw new RuntimeException(ex);
 }
 return boards;
 }
 private int getTotEasy() {
 return totEasy;
 }
 public String getSolution() {
 StringBuilder s = new StringBuilder(256);
 for (int i : unsolvedBoard) {
 s.append(i+1);
 }
 s.append(",");
 for (int i : solvedBoard) {
 s.append(i+1);
 }
 return s.toString();
 }
 public static void main (String[] args) {
 long t0 = System.nanoTime();
 Sudoku gc = new Sudoku();
 File f;
 PrintWriter p;
 try {
 f = new File("sudoku_output.txt");
 p = new PrintWriter(f);
 } catch (Exception e) {
 return;
 }
 if (args.length != 1) {
 System.out.println("Usage: java Sudoku <input_file>");
 return;
 }
 int[][] boards = gc.getInput(args[0]);
 long tinp = System.nanoTime();
 gc.connect();
 long t1 = System.nanoTime();
 p.println(boards.length);
 long maxSolveTime = 0;
 int maxSolveIndex = 0;
 long[] solveTimes = new long[boards.length];
 for (int i = 0; i < boards.length; i++) {
 long tempTime = System.nanoTime();
 if (tempTime - gc.lastPrint > 200_000_000 
 || i == boards.length - 1) {
 gc.shouldPrint = true;
 gc.lastPrint = tempTime;
 System.out.print(String.format(
 "\r(%7d/%7d) ", i+1, boards.length));
 }
 long elapsed = gc.solveSudoku(boards[i], gc.clues[i]);
 if (elapsed == -1) {
 System.out.println("Impossible: " + i);
 }
 if (elapsed > maxSolveTime) {
 maxSolveTime = elapsed;
 maxSolveIndex = i;
 }
 solveTimes[i] = elapsed;
 p.println(gc.getSolution());
 // break;
 }
 p.close();
 long t2 = System.nanoTime();
 Arrays.sort(solveTimes);
 System.out.println();
 System.out.println("Median solve time: " 
 + gc.printTime(0, solveTimes[boards.length/2]));
 System.out.println("Longest solve time: " 
 + gc.printTime(0, maxSolveTime) + " for board " + maxSolveIndex);
 gc.display(boards[maxSolveIndex]);
 System.out.println();
 System.out.println("Total time (including prints): " 
 + gc.printTime(t0,t2));
 System.out.println("Sudoku solving time: " 
 + gc.printTime(0,gc.getTime()));
 System.out.println("Average time per board: " 
 + gc.printTime(0,gc.getTime()/boards.length));
 System.out.println("Number of one-choice digits per board: " 
 + String.format("%.2f", gc.getTotEasy()/(double)boards.length)); 
 System.out.println("Easily solvable boards: " + gc.easySolved);
 System.out.println("\nInput time: " + gc.printTime(t0,tinp));
 System.out.println("Connect time: " + gc.printTime(tinp,t1));
 try {
 Thread.sleep(10000);
 } catch (InterruptedException e) {
 }
 }
}

Question 41

I am am not mistaken you should save some time translating those jagged arrays into 2D arrays.

Question 42

Zig - 0.070s unofficial score

https://github.com/mstdokumaci/sudokumaci/tree/main/zig

Build: zig build-exe -O ReleaseFast main.zig

Run: ./main ../test-data/all_17_clue.sudokus > ../test-data/all_17_clue.solved

Question 43

Edited 2025年09月19日

C++: Schoku (unofficial) 0.021s (multi-threaded), 0.113s (single-threaded)

Schoku source code

Update:

According to Schoku's internal time measurements, it is faster than all other entries on this page. Given the test set (17 clue Sudokus) and single-threaded execution.

Obviously, multi-threaded it trounces the competition.

In the tdoku benchmark, it comes out first or close second in all test sets, reaching from trivial to extremely hard. See the updated link below.

So unless there's a new solver out there that is way better but not shown here, this is the fastest Sudoku solver - period.

I realize that some time has passed since this challenge was created and a winner called out.
The results were obtained on a Ryzen 7 4700U (8 cores) clocked at (up to) 4.1GHz. When multi-threading, the Core speed is around 3.3GHz and 4.1GHz single-threaded. Since this CPU has comparable or lower performance then some other CPUs mentioned here, I assume that these timings might equal or beat the other contenders, either multi-threaded or single-threaded.

I based my submission on Mirage's original code posted here. I found the principled economy of lines of code and solving approaches impressive and it provided an excellent basis for further exploration.

I think tdoku largely deserves the winner's crown - | do hope though that my solver can be useful in one way or another.

What to measure

I understand that multi-threading was not forbidden by the rules, so I provide both multi-threaded and single threaded timings. It was said here that all programs could benefit in comparable ways from multi-threading. I found that Mirage's code provides a perfect counter example, as it does not benefit from multi-threading as much as it could.

When I started using the time command for measuring the execution time I realized that for the relatively small set of 49151 puzzles the pure overhead of loading and starting the program makes up a significant and constant portion of the measured time (around 20ms or nearly 40% in my case).
While this penalty is shared between all submissions, it distorts nonetheless the relative performance and different platforms may carry different penalties. The ratio between multi-threaded and single-threaded execution speed for example is around 5.3 in reality, while with the startup overhead it is under 3.

The best way would be to benchmark using the tdoku benchmark suite, see my results here: Schoku benchmark using tdoku. The second best way is to concatenate the 49151 puzzle lines 10 time to 491510 lines, which should reduce the distortion by 90%.

Schoku also provides an option (-y) to time the program internally and -x to obtain full statistics.

Abstract

To jump to the summary line: By direct comparison on my test computer, I found that my version with changes and enhancements runs around 4.5 times as fast as the original submission in multithreaded mode, based on the internal time measurement and about 2.0 times faster in single-threaded mode.

When using the external clock timing the ratio is 3.1 (multi-threaded).
All this on my test computer, a Ryzen 7 Mobile 4700U processor laptop.

The other statistics look good too: The solving rate of 76.98% vs. 63.79% without guessing and 30798 against 57149 guesses.

Schoku Performance

The picture below shows Schoku performance evolving over several versions relative to Mirage's initial submission (0.1).

(https://drive.google.com/file/d/186va5J8T4HcqT_tldw734ZUXoYM-_cNx/view?usp=sharing "Schoku performance")

You can find the full story and complete commented code on Github: Schoku.

Sample output

The source code is unfortunately not available here as StackExchange limits the overall size. Instead some sample output (updated):

 schoku version: 1.0 speed
 command options: -x
 compile options:
 49151 puzzles entered
 49151 2343280/s puzzles solved
 21.0ms 426ns/puzzle solving time
 38596 78.53% puzzles solved without guessing
 21584 0.44/puzzle guesses
 13497 0.27/puzzle back tracks
 170602 3.47/puzzle digits entered and retracted
 22431 0.46/puzzle 'rounds'
 114463 2.33/puzzle triads resolved
 209017 4.25/puzzle triad updates
 134 0.00/puzzle board states without bivalues
 709 bi-value universal graves detected

You can find the complete and commented source code I tested here: Schoku source (speed branch).

To compile the program, save it as schoku.cpp, then compile with g++ (11.4.0 or 13.2.1) as follows:

 g++ -std=gnu++17 -O3 -fopenmp -pthread -Wall -Wextra -DNDEBUG -mavx2 -mbmi -mbmi2 -mlzcnt -c -MMD -MP -MF ./schoku.d schoku.cpp -o schoku.o` 
 g++ -fopenmp -pthread schoku.o -o schoku

To run the program, make sure you have puzzles.txt (49151 17-clue puzzles) in the same folder:

 rm solutions.txt; time schoku -y

Note the important deletion of solutions.txt. Try using a solid state disk, turning off your Antivirus program and an uncluttered directory.

Usage

At the same time, I added several crucial options to the program:

 -c check for back tracking even when no guess was made (e.g. if puzzles might have no solution)
 -d# provide some detailed information on the progress of the puzzle solving.
 add a 2 for even more detail.
 -h help information (this text)
 -l# solve a single line from the puzzle.
 -mT execution modes (triads resolution)
 -r[ROM] puzzle rules:
 R for regular puzzles (unique solution exists)
 not suitable for puzzles that have multiple solutions
 O find just one solution
 this will find the first of multiple solutions and stop looking
 M determine whether multiple solutions exist beyond the first found
 -t# set the number of threads
 -v verify the solution
 -w display warning (mostly unexpected solving details for regular puzzles)
 -x provide some statistics
 -y provide speed statistics only
 -#1 change base for row and column reporting from 0 to 1

Code Changes

I turned the program into C++, using g++ and as many of the g++/gcc features I found useful, in particular: inlining, templates, vector extensions and builtins.
While I was at it, I turned to using intrinsics and other builtins for direct access to the fastest available instructions for my purpose.
Looking at speed improvements in terms of low hanging fruit, I found that the repeated allocation of the GridState structures did impact performance signifcantly and replaced it by fixed stacks allocated by each thread only once, which turned out to improve the multi-threaded timing significantly.
The hidden single search allowed for removal of repeating calculations, which offered a welcome boost. I also made the search cover all hidden singles.
In the program shown here, I removed completely the hidden single search and added a fast triad resolution pass (also known as locked candidates).
The guessing code was much improved. Throughout, I also added lookup tables where they were useful.
Then I integrated the entering loop (for solved cells) with the search for the next naked single. This unfortunately involved some additional spaghetti coding techniques in form of gotos. I also rewrote the puzzle input and output parts using AVX2.

Question 44

C - 12min 28.374s official score

runs for about ~~(削除) 30m (削除ここまで)~~ 15m on my i5-7200U and produces the correct md5 hash

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<sys/time.h>
#define B break
#define O const
#define P printf
#define R return
#define S static
#define $(x,y...) if(x){y;}
#define E(x...) else{x;}
#define W(x,y...) while(x){y;}
#define fi(x,y...) for(I i=0,_n=(x);i<_n;i++){y;}
#define fj(x,y...) for(I j=0,_n=(x);j<_n;j++){y;}
typedef void V;typedef char C;typedef short H;typedef int I;typedef long long L;
S C h[81][20]; //h[i][0],h[i][1],..,h[i][19] are the squares that clash with square i
S H a[81] //a[i]: bitmask of possible choices; initially one of 1<<0, 1<<1 .. 1<<8, or 511 (i.e. nine bits set)
 ,b[81]; //b[i]: negated bitmask of impossible chioces; once we know square i has value v, b[i] becomes ~(1<<v)
S I f(){ //f:recursive solver
 I p=-1; //keep track of the popcount (number of 1 bits) in a
 W(1,I q=0; //simple non-recursive deductions:
 fi(81,fj(20,a[i]&=b[h[i][j]]) // a[i] must not share bits with its clashing squares
 $(!(a[i]&a[i]-1),$(!a[i],R 0)b[i]=~a[i]) // if a[i] has one bit left, update b[i]. if a[i]=0, we have a contradiction
 q+=__builtin_popcount(a[i])) // compute new popcount
 $(p==q,B)p=q;) // if the popcount of a[] changed, try to do more deductions
 I k=-1,mc=10;fi(81,$(b[i]==-1,I c=__builtin_popcount(a[i]);$(c<mc,k=i;mc=c;$(c==2,B)))) //find square with fewest options left
 $(k==-1,R 1) //if there isn't any such, we're done - success! otherwise k is that square
 fi(9,$(a[k]&1<<i,H a0[81],b0[81]; //try different values for square k
 memcpy(a0,a,81*sizeof(*a));memcpy(b0,b,81*sizeof(*b)); // save a and b
 a[k]=1<<i;b[k]=~a[k];$(f(),R 1) // set square k and make a recursive call
 memcpy(a,a0,81*sizeof(*a));memcpy(b,b0,81*sizeof(*b)))) // restore a and b
 R 0;}
S L tm(){struct timeval t;gettimeofday(&t,0);R t.tv_sec*1000000+t.tv_usec;} //current time in microseconds
I main(){L t=0;I n;scanf("%d",&n);P("%d\n",n);
 fi(81,L l=0;fj(81,$(i!=j&&(i%9==j%9||i/9==j/9||(i/27==j/27&&i%9/3==j%9/3)),h[i][l++]=j))) //precompute h
 fi(n,S C s[82];scanf("%s",s);printf("%s,",s); //i/o and loop over puzzles
 fj(81,a[j]=s[j]=='0'?511:1<<(s[j]-'1');b[j]=s[j]=='0'?-1:~a[j]) //represent '1' .. '9' as 1<<0 .. 1<<8, and 0 as 511
 t-=tm();I r=f();t+=tm(); //measure time only for the solving function
 $(!r,P("can't solve\n");exit(1)) //shouldn't happen
 fj(81,s[j]=a[j]&a[j]-1?'0':'1'+__builtin_ctz(a[j])) //1<<0 .. 1<<8 to '1' .. '9'
 P("%s\n",s)) //output
 fflush(stdout);dprintf(2,"time:%lld microseconds\n",t);R 0;} //print self-measured time to stderr so it doesn't affect stdout's md5

compile (preferably with clang v6) and run:

clang -O3 -march=native a.c
time ./a.out <all_17_clue_sudokus.txt | tee o.txt | nl
md5sum o.txt

Question 45

Why so ugly? This isn't code-golf!

Question 46

@JonathanAllan that's how i usually code (unless i'm in a team who prefer to do otherwise). it's beautiful :)

Question 47

Haha, "beautiful", and easy to come back to in 6 months :p

Question 48

yes, actually. i've been doing this for a couple of years and i find it more efficient. in the apl world it's known as incunabulum style. with bloated code you move your eyes mostly vertically (unnatural and unfit for our landscape monitors) and scroll a lot. with tight code you can see all of it at once, so it's easier to find your way around it and to judge its complexity at a glance.

Question 49

Is it a backtracking solution? I see two memcpy in there and some recursion going on. I'll try to verify it today.

Question 50

Typescript(Deno) - 22min 12s unofficial score

on an Lenovo G50-45 Laptop with 4cores and 4threads @1.5GHz

reads from all_17_clue_sudokus.txt or first argument

outputs to output.txt or second argument

download and run with deno run --allow-all https://raw.githubusercontent.com/RubenArtmann/sudoku-solver/master/mod.ts with Deno installed

github page

solver.ts

const parse = (string: string)=>{
 let sudoku = new Uint16Array(9*9);
 for(let i=0; i<string.length; i++) {
 let n = string.charCodeAt(i)-"1".charCodeAt(0);
 sudoku[i] = n>=0&&n<9?1<<n:511;
 }
 return sudoku;
};
const bitCount = (n:number)=>{
 n = n - ((n >> 1) & 0x55555555)
 n = (n & 0x33333333) + ((n >> 2) & 0x33333333)
 return ((n + (n >> 4) & 0xF0F0F0F) * 0x1010101) >> 24
};
const forced = (sudoku:Uint16Array)=>{
 for(let i=0; i<sudoku.length; i++) {
 let row = i%9;
 let column = Math.floor(i/9);
 // skip if already spread
 if(sudoku[i]&(1<<9)) continue;
 if(bitCount(sudoku[i]&511)===1) {
 for(let j=0; j<9; j++) {
 if(j!==column) sudoku[row+j*9] &= ~(sudoku[i]&511);
 if(j!==row) sudoku[j+column*9] &= ~(sudoku[i]&511);
 let xoff = row-row%3;
 let yoff = column-column%3;
 let x = xoff+j%3;
 let y = yoff+Math.floor(j/3);
 if(x!==row && y!==column) sudoku[x+y*9] &= ~(sudoku[i]&511);
 // set already spread bit
 sudoku[i] |= 1<<9;
 }
 }
 }
};
const dump = (sudoku:Uint16Array)=>{
 return [...sudoku].map(e=>{
 let char = "";
 for(let j=0; j<9; j++) {
 if(e&(1<<j)) char += j+1;
 }
 if(char.length!==1) {
 log(sudoku)
 throw new Error("");
 }
 return char;
 }).join("");
};
const log = (sudoku:Uint16Array)=>{
 for(let column=0; column<9; column++) {
 console.log([...sudoku.slice(column*9,column*9+9)].map((n,row)=>{
 let string = "";
 for(let j=0; j<9; j++) {
 string += sudoku[column*9+row]&(1<<j)?j+1:".";
 }
 return string;
 }).join(" "));
 }
 console.log("")
};
let count = 0;
const solve = (sudoku:Uint16Array)=>{
 let sudokuBuf = new Uint16Array(9*9);
 let stack: Uint16Array[] = [sudoku];
 let stack_length = 1;
 while(stack_length>0) {
 count++;
 // console.log(stack.length)
 stack_length--;
 let sudoku: Uint16Array = stack[stack_length] as Uint16Array;
 stack[stack_length] = sudokuBuf;
 sudokuBuf = sudoku;
 // do forced moves
 forced(sudoku);
 let solved = true;
 let invalid = false;
 for(let i=0; i<sudoku.length; i++) {
 let c = bitCount(sudoku[i]&511);
 if(c!==1) solved = false;
 if(c===0) {
 invalid = true;
 break;
 }
 }
 if(invalid) continue;
 if(solved) {
 // need to apply forced one more time and check for changes / empty cells
 forced(sudoku);
 let solved = true;
 for(let i=0; i<sudoku.length; i++) {
 let c = bitCount(sudoku[i]&511);
 if(c===0) solved = false;
 }
 if(solved) return sudoku;
 continue;
 }
 // choose cell to split on
 let splitIndex = -1;
 for(let i=2; i<=9; i++) {
 for(let j=0; j<sudoku.length; j++) {
 if(bitCount(sudoku[j]&511)===i) {
 splitIndex = j;
 i = 1000;
 break;
 }
 }
 }
 if(splitIndex<0) throw new Error("");
 // branch/split
 for(let i=0; i<9; i++) {
 if(sudoku[splitIndex]&(1<<i)) {
 if(stack[stack_length]===undefined) stack[stack_length] = new Uint16Array(9*9);
 let newsudoku = stack[stack_length];
 newsudoku.set(sudoku,0);
 newsudoku[splitIndex] = 1<<i;
 stack_length++;
 }
 }
 // log(sudoku);
 }
 throw new Error("Out of Options!")
 return new Uint16Array(9*9);
};
const context: Worker = self as any;
context.onmessage = (event)=>{
 let result = dump(solve(parse(event.data.data)));
 // console.log(count)
 context.postMessage({
 data: result,
 callback: event.data.callback
 })
};

mod.ts

let callbacks = new Map();
let workers: any[] = [];
let tasks: {data:any,callback:number}[] = [];
for(let i=0; i<4; i++) {
 let worker = new Worker(new URL("solver.ts", import.meta.url).toString(),{ type: "module" }) as any;
 worker.tasks = 0;
 worker.onmessage = (event:{data:any})=>{
 worker.tasks--;
 callbacks.get(event.data.callback)(event.data.data);
 callbacks.delete(event.data.callback);
 };
 workers.push(worker);
}
const sleep = (time:number)=>{
 return new Promise((resolve)=>{
 setTimeout(resolve,time);
 });
};
const loadBalancer = async()=>{
 while(true) {
 // console.log(workers.map(w=>w.tasks),tasks.length);
 let dispatches = 0;
 for(let i=0; i<workers.length; i++) {
 let worker = workers[i];
 if(worker.tasks>=5||tasks.length<=0) continue;
 if(worker.taks<=0) console.log("running low!")
 let task = tasks.shift();
 if(task === undefined) continue;
 worker.postMessage(task);
 worker.tasks++;
 dispatches++;
 }
 if(dispatches<=0) await sleep(100);
 }
};
loadBalancer();
const dispatch = async(data:any)=>{
 let random = Math.random();
 tasks.push({
 data: data,
 callback: random
 });
 return new Promise((resolve)=>{
 callbacks.set(random,resolve);
 });
};
// dispatch("030000000000001008700580000000024050040873900003600000900000002005002091200000704").then(console.log)
// console.log(workers)
let count = 0;
let file = Deno.readTextFileSync(Deno.args[0]||"./all_17_clue_sudokus.txt");
// let file = Deno.readTextFileSync("./hard_sudokus.txt");
let sudokus = file.split("\n").slice(1);
let promises = sudokus.map(async(e,i)=>{
 let result = await dispatch(e);
 count++;
 console.log(count,"/",sudokus.length);
 console.log(e);
 console.log(result);
 return result;
});
let solutions = await Promise.all(promises);
console.log("done!");
workers.map(w=>w.terminate());
let output = "";
output += `${sudokus.length}\n`;
for(let i=0; i<sudokus.length; i++) {
 output += `${sudokus[i]},${solutions[i]}\n`;
}
Deno.writeTextFileSync(Deno.args[1]||"./output.txt",output);
Deno.exit(0)

Question 51

Welcome to Code Golf! Nice first answer!

Question 52

Welcome to Code Golf! I'm not super familiar with typescript, but I'll try to boot up my old laptop and see if I can get it running!

Question 53

@maxb it should be as simple as curl -fsSL https://deno.land/x/install/install.sh | sh and deno run --allow-all https://raw.githubusercontent.com/RubenArtmann/sudoku-solver/master/mod.ts

Question 54

Rust - 0.150s unofficial score

https://github.com/mstdokumaci/sudokumaci/tree/main/rust

Here is my attempt in Rust. For the list of sudoku puzzles above (all_17_clue), I get ~150ms on a MacBook Pro (16-inch, 2019, 2.3 GHz i9-9880H).

I build the binary by this command: cargo rustc --release -- -C target-cpu=native

I am not an expert on writing optimized code and I am coding in Rust for the first time. I believe the algorithm is at the sweet spot of human solutions and brute force, and it can be further optimized. I would appreciate it if anybody would like to suggest any improvements.

Question 55

Welcome to Code Golf and Coding Challenges Stack Exchange!, But you are only allowed to post a language per answer, You can split by answers.

Question 56

You can include the score, This will help to compare answers.

53x15 3691 silver badge5 bronze badges · Accepted Answer · 2019-08-29 19:12:43Z

C++ - 0.201s official score

Using Tdoku (code; design; benchmarks) gives these results:

~/tdoku$ lscpu | grep Model.name
Model name: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
~/tdoku$ # build:
~/tdoku$ CC=clang-8 CXX=clang++-8 ./BUILD.sh
~/tdoku$ clang -o solve example/solve.c build/libtdoku.a 
~/tdoku$ # adjust input format:
~/tdoku$ sed -e "s/0/./g" all_17_clue_sudokus.txt > all_17_clue_sudokus.txt.in
~/tdoku$ # solve:
~/tdoku$ time ./solve 1 < all_17_clue_sudokus.txt.in > out.txt
real 0m0.241s
user 0m0.229s
sys 0m0.012s
~/tdoku$ # adjust output format and sha256sum:
~/tdoku$ grep -v "^:0:$" out.txt | sed -e "s/:1:/,/" | tr . 0 | sha256sum
0bc8dda364db7b99f389b42383e37b411d9fa022204d124cb3c8959eba252f05 -

Tdoku has been optimized for hard Sudoku instances. But note, contrary to the problem statement, that 17 clue puzzles are far from the hardest Sudoku. Actually they're among the easiest, with the majority requiring no backtracking at all. See some of the other benchmark datasets in the Tdoku project for puzzles that are actually hard.

Also note that while Tdoku is the fastest solver I'm aware of for hard puzzles, it's not the fastest for 17 clue puzzles. For these I think the fastest is this rust project, a derivative of JCZSolve, which was optimized for 17 clue puzzles during development. Depending on the platform it might be 5-25% faster than Tdoku for these puzzles.

Wow, that was an interesting read about the implementation and theory behind it. Before I started this challenge, I wanted to find state of the art solvers and datasets. I guess I didn't look hard enough. From popular "scientific" articles, the 17 clue puzzles were all that's ever talked about, so it was my assumption that those were the hardest. I'll try to run all submissions against the data sets presented in your article, and I'll benchmark your submission later today. Fantastic job!
Thanks! You see from the article that finding state-of-the-art solving took me on a long journey. :-) I get why people focus on 17 clue puzzles: the dataset is well known, well defined, complete or nearly so, moderately large, and hard for naive solvers. While it's interesting to study harder puzzles, hardness is tricky to formalize. e.g., do we mean subjectively or empirically hard for humans based on the techniques required? do we mean slow on average for a given solver under random permutations? Do we mean minimum backdoor size under a formula with chosen pigeonhole inferences? etc.
0.201s - is that the time taken to solve all ~49k puzzles, roughly 4usec per puzzle?

The fastest Sudoku solver

Winner found

The challenge

Rules

Language

Benchmark machine

Input

Timing / scoring

Verification

What's not allowed

Other info

14 Answers 14

C++ - 0.201s official score

C - 0.045s unofficial score

Node.js, (削除) 8.231s (削除ここまで) 6.735s official score

Code

Example output

Python 3 (with dlx) 4min 46.870s official score

Usage

Python 3 + Z3 - 10min 45.657s official score

C - (削除) 2.228s (削除ここまで) 1.690s official score

C++ with Minisat(2.2.1-5) - 11.735s official score

Python3 - 2min 1.007s official score

Java - 4.056s official score

Zig - 0.070s unofficial score

C++: Schoku (unofficial) 0.021s (multi-threaded), 0.113s (single-threaded)

What to measure

Abstract

Schoku Performance

Sample output

Usage

Code Changes

C - 12min 28.374s official score

Typescript(Deno) - 22min 12s unofficial score

Rust - 0.150s unofficial score

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions