Loading...
3/26/2023
Tech Blog

Algorithm for Solving Sudoku Puzzles and Implementation

 Tags: Python, algorithm, pulp, Tesorflow, Sudoku

 By Chinbat Chindegsuren

 Picture created by : DALL-E image-generating deep-learning model

How to solve Sudoku puzzles

 

           There are various ways to solve Sudoku puzzles. You can solve them using your eyes, brain, and hands, use online tools, or ask someone who knows the solution. However, this time, I would like to introduce a simple way to solve them using programming.

 

1. Backtracking

 

            In essence, it is a typical depth-first search. That is, we repeat and check all possible combinations. Of course, we can speed up the process by using techniques such as pruning. There are about 9^81 (approximately 2×10^77) possible combinations to place numbers 1 to 9 on a 9x9 field, and out of these, about 6.67×10^21 (exactly 6,670,903,752,021,072,936,960) combinations could fill the Sudoku correctly. It is still an enormous number, so this method of checking combinations can be slower compared to other methods. However, the advantage is that if a solution exists, it will always be found.

Visionary :

 

2. Stochastic search

 

         Another way to solve Sudoku puzzles is probabilistic solving, which converges quickly but does not guarantee finding a solution. For example, you can:

  • Fill in the empty cells with random numbers.
  • Calculate the number of errors (e.g., the number of conflicts with the Sudoku rules).
  • Shuffle the numbers to reduce the number of errors, and repeat this process.

Genetic algorithms are sometimes used as the shuffling algorithm.

          Recently, solving Sudoku puzzles using deep learning, which can be classified as "stochastic search," has become popular. Ultimately, it is looking for parameters that minimize the error using gradient descent.

 

3. Constraint programming

 

           Another way to solve Sudoku puzzles is to define them as constraint satisfaction problems, which can be solved using a generic solver for optimization. With this method, if a solution exists, it will always be found.

 

4. Exact cover

 

           This method can be classified as constraint programming, but the algorithm is quite interesting, so it is separated as a separate topic. In simple terms, it is an NP-complete problem to find a set of subsets that can cover a given set without overlapping or missing elements. Algorithms such as Algorithm X, proposed by Donald Knuth, can be used to find all solutions to the exact cover problem. Therefore, if the Sudoku problem can be defined as an exact cover problem, this method can be used to solve it.

 

Implementation

 

             I'll show implementation examples in Python for each solving method. Regarding the implementation, of course, I did not write everything from scratch, and I also referred to code written by others.

Regardless of the implementation method, the following sample problem will be used as a common sample problem:

     By the way, according to my Sudoku-loving wife, this problem doesn't seem to be that difficult. She solved it in 11 minutes.

 

1. Backtracking of short coding

 

def r(a):
    n = 9
    b = 3
    i = a.find("0")
    ~i or exit(a)
    [
        m
        in [
            (i - j)
            % n
            * (i / n ^ j / n)
            * (i / (n * b) ^ j / (n * b) | i % n / b ^ j % n / b)
            or a[j]
            for j in range(n**2)
        ]
        or r(a[:i] + m + a[i + 1 :])
        for m in map(str, range(1, n + 1))
    ]


from sys import *

r(argv[1])

 It's done. Let's run it.

$> time python short_solver.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000
438172695617954382295836471173649258569728134824513967982465713351297846746381529
python short_solver.py   9.66s user 0.03s system 98% cpu 9.807 total

           The problem is represented by writing the numbers as digits from the upper left to the lower right, and filling in the empty spaces with 0. By the way, the code is intentionally made short, but if you expand it, you can see that the worst-case time complexity is O(n^4) for this dull Backtracking algorithm.

 

2. Stochastic search: Challeging to solve Sudoku problems with machine learning

           As a web engineer, I'm not particularly knowledgeable about machine learning, so I asked for help from my colleague who is a machine learning engineer. It's great to have people around who you can ask for help. Here's the process we followed for implementation:

            Data preparation: There were 1 million Sudoku problems and solutions on Kaggle, so we extracted about 10,000 of them. Since we weren't particularly focused on accuracy for this implementation, we worked with a smaller dataset. Model creation: We created a very simple CNN model with only one layer, which is called shallow learning. Execution: We fed the model with the sample problem and obtained the result. One characteristic of Stochastic search is that there is no guarantee that the result will always be a solution. Will the solution be found?

import csv
import numpy as np

from keras.models import Model, load_model
from keras.layers import Conv2D, Input, Concatenate, Flatten, Reshape, Dense, Activation
from keras.utils import plot_model
from keras.preprocessing.image import Iterator


def build_model():
    inp = Input((9, 9, 10))
    x1 = Conv2D(filters=10, kernel_size=(3, 3), strides=(3, 3), padding='valid', activation='relu')(inp)
    x2 = Conv2D(filters=10, kernel_size=(9, 1), padding='valid', activation='relu')(inp)
    x3 = Conv2D(filters=10, kernel_size=(1, 9), padding='valid', activation='relu')(inp)

    x1 = Flatten()(x1)
    x2 = Flatten()(x2)
    x3 = Flatten()(x3)

    x = Concatenate()([x1, x2, x3])
    x = Dense(9 * 9 * 9)(x)
    x = Reshape((9, 9, 9))(x)
    x = Activation('softmax')(x)

    model = Model(inp, x)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model.summary()

    return model


def read_csv(path):
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)  # first row is header

        quizes = []
        solutions = []
        for quiz, solution in reader:
            quizes.append(quiz)
            solutions.append(solution)
    return quizes, solutions


def construct_board(s, is_solution=False):
    if is_solution:
        board = np.zeros((9, 9, 9), dtype=np.float32)
    else:
        board = np.zeros((9, 9, 10), dtype=np.float32)  # the 10th dim is for empty grid
    for i, ss in enumerate(s):
        board[i // 9, i % 9, int(ss) - 1] = 1.0   # map 1 to 0th, 2 to 1th, ..., 9 to 8th, and 0 to -1th
    return board


class SudokuIterator(Iterator):
    def __init__(self, csv_path, batch_size, shuffle=False, seed=0):
        quizes, solutions = read_csv(csv_path)
        self.x = np.array([construct_board(q) for q in quizes])
        self.y = np.array([construct_board(s, is_solution=True) for s in solutions])

        n = len(self.x)
        self.steps_per_epoch = n // batch_size
        super(SudokuIterator, self).__init__(n, batch_size, shuffle, seed)

    def _get_batches_of_samples(self, index_array):
        batch_x = self.x[index_array]
        batch_y = self.y[index_array]
        return batch_x, batch_y

    def _get_batches_of_transformed_samples(self, index_array):
        return self._get_batches_of_samples(index_array)  # Keras Iterator requirement

    def next(self):
        with self.lock:
            index_array = next(self.index_generator)
        return self._get_batches_of_samples(index_array)


print('building model...')
sudoku_model = build_model()
print('done.')

print('building generator...')
train_gen = SudokuIterator(csv_path='sudoku10000.csv', batch_size=64, shuffle=True)
print('done.')

sudoku_model.fit_generator(train_gen, steps_per_epoch=train_gen.steps_per_epoch, epochs=50)
sudoku_model.save('sudoku_model.h5')

 I was running this on a Jupyter notebook, and here are the training results:

building model...
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 9, 9, 10)     0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 3, 3, 10)     910         input_1[0][0]                    
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 1, 9, 10)     910         input_1[0][0]                    
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 9, 1, 10)     910         input_1[0][0]                    
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 90)           0           conv2d_1[0][0]                   
__________________________________________________________________________________________________
flatten_2 (Flatten)             (None, 90)           0           conv2d_2[0][0]                   
__________________________________________________________________________________________________
flatten_3 (Flatten)             (None, 90)           0           conv2d_3[0][0]                   
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 270)          0           flatten_1[0][0]                  
                                                                 flatten_2[0][0]                  
                                                                 flatten_3[0][0]                  
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 729)          197559      concatenate_1[0][0]              
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 9, 9, 9)      0           dense_1[0][0]                    
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 9, 9, 9)      0           reshape_1[0][0]                  
==================================================================================================
Total params: 200,289
Trainable params: 200,289
Non-trainable params: 0
__________________________________________________________________________________________________
done.
building generator...
done.
Epoch 1/50
156/156 [==============================] - 6s 39ms/step - loss: 2.1754 - acc: 0.1548
Epoch 2/50
156/156 [==============================] - 5s 33ms/step - loss: 1.9703 - acc: 0.3388
Epoch 3/50
156/156 [==============================] - 5s 34ms/step - loss: 1.6940 - acc: 0.4593
Epoch 4/50
156/156 [==============================] - 6s 37ms/step - loss: 1.5140 - acc: 0.5098
...
Epoch 50/50
156/156 [==============================] - 6s 38ms/step - loss: 0.9684 - acc: 0.6565

 We executed this on a Jupyter notebook, and the result of the training is as follows:

acc: 0.6565

           This means that the model has an accuracy of about 65% on the training data. Of course, since we kept it very simple this time, there is a high possibility of overfitting. We wrote a script that uses this model to solve Sudoku puzzles.

import sys
import numpy as np
from keras.models import load_model


def construct_board(s, is_solution=False):
    if is_solution:
        board = np.zeros((9, 9, 9), dtype=np.float32)
    else:
        board = np.zeros((9, 9, 10), dtype=np.float32)  # the 10th dim is for empty grid
    for i, ss in enumerate(s):
        board[i // 9, i % 9, int(ss) - 1] = 1.0   # map 1 to 0th, 2 to 1th, ..., 9 to 8th, and 0 to -1th
    return board

sudoku_model = load_model('sudoku_model.h5')
answer = sudoku_model.predict(np.array([construct_board(sys.argv[1])]))

for r in range(9):
    row = ''
    for c in range(9):
        row += str(answer[0][r,c].argmax() + 1)
    print(row)

 I will actually run the code now:

$>time python ml_solver.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000
Using TensorFlow backend.
139878645
667344182
468356779
673868255
565827814
514827968
181485753
352266796
736283542
python ml_solver.py   3.24s user 0.69s system 64% cpu 6.044 total

 

            As you can see from the results, the Sudoku constraints are clearly not satisfied... In other words, the problem could not be solved. Well, this time the training data is small and we are doing it simply, so I think it's okay if you get a rough idea. By the way, the execution time is 3.24 seconds, but most of it is spent loading the model. If you just ask the loaded model for the solution and display it, it only takes 0.004 seconds.

3. Constraint programming using Pulp

We will use a package called Pulp to define the problem as an optimization problem and solve it with the default CBC solver.

# -*- coding: utf-8 -*-
import argparse
from pulp import LpVariable, LpInteger, LpProblem, LpMinimize, LpStatus, lpSum, value


def main(inp):
    n = 9
    b = 3
    digits = [str(d + 1) for d in range(n)]
    values = rows = columns = digits
    answers = []

    choices = LpVariable.dicts("Choice", (values, rows, columns), 0, 1, LpInteger)
    boxes = [[(rows[b * i + k], columns[b * j + l]) for k in range(b) for l in range(b)] for j in range(b) for i in range(b)]

    # 問題提議
    problem = LpProblem("Solving Sudoku", LpMinimize)  # MinimizeでもMaximizeでもOK
    problem += 0, "Arbitrary Objective Function"

    # 制約追加
    for r in rows:
        for c in columns:
            problem += lpSum([choices[v][r][c] for v in values]) == 1, ""

    for v in values:
        for r in rows:
            problem += lpSum([choices[v][r][c] for c in columns]) == 1, ""

        for c in columns:
            problem += lpSum([choices[v][r][c] for r in rows]) == 1, ""

        for b in boxes:
            problem += lpSum([choices[v][r][c] for (r, c) in b]) == 1, ""

    for i in range(n**2):
        val = inp[i]
        if val != '0':
            problem += choices[str(val)][str(i/n + 1)][str(i % n + 1)] == 1, ""

    while True:
        # cbcソルバー利用
        problem.solve()
        if LpStatus[problem.status] == "Optimal":
            answers.append(''.join([v for r in rows for c in columns for v in values if value(choices[v][r][c]) == 1]))
            # 見つけた解を制約として追加
            problem += lpSum(
                [choices[v][r][c] for v in values for r in rows for c in columns if value(choices[v][r][c]) == 1]
            ) <= 80
        else:
            break

    if answers:
        # 最初の解だけ表示
        print(answers[0])


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('inp', type=str)
    args = parser.parse_args()
    main(args.inp)

Defining the problem can be a bit tedious, right? Let's try running it.

$>time python pulp_solver.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000
438172695617954382295836471173649258569728134824513967982465713351297846746381529
python solver.py   0.23s user 0.08s system 57% cpu 0.538 total

That's quite fast.
 

4. Implementation of Dancing Links to solve Exact Cover Problem.

            Dancing links is an implementation of Algorithm X proposed by Knuth, who came up with the idea of solving the Exact cover problem. In this implementation, the board is represented using a doubly linked circular list, and calculations are skipped for places where there are no numbers, making it fast. In the Python implementation below, a dictionary is used instead of a doubly linked circular list, but the access pattern is the same, and the code is just simpler and shorter.

import sys
from itertools import product


def solve_sudoku(grid):
    b = 3
    n = b * b
    x = ([("rc", rc) for rc in product(range(n), range(n))] +
         [("rn", rn) for rn in product(range(n), range(1, n + 1))] +
         [("cn", cn) for cn in product(range(n), range(1, n + 1))] +
         [("bn", bn) for bn in product(range(n), range(1, n + 1))])
    y = dict()
    for r, c, n in product(range(n), range(n), range(1, n + 1)):
        box_number = (r // b) * b + (c // b)
        y[(r, c, n)] = [
            ("rc", (r, c)),
            ("rn", (r, n)),
            ("cn", (c, n)),
            ("bn", (box_number, n))]
    x, y = exact_cover(x, y)
    for i, row in enumerate(grid):
        for j, n in enumerate(row):
            if n:
                select(x, y, (i, j, n))

    for solution in solve(x, y, []):
        for (r, c, n) in solution:
            grid[r][c] = n
        yield grid


def exact_cover(x, y):
    x = {j: set() for j in x}
    for i, row in y.items():
        for j in row:
            x[j].add(i)
    return x, y


def solve(x, y, solution):
    if not x:
        yield list(solution)
    else:
        c = min(x, key=lambda c: len(x[c]))
        for r in list(x[c]):
            solution.append(r)
            cols = select(x, y, r)
            for s in solve(x, y, solution):
                yield s
            deselect(x, y, r, cols)
            solution.pop()


def select(x, y, r):
    cols = []
    for j in y[r]:
        for i in x[j]:
            for k in y[i]:
                if k != j:
                    x[k].remove(i)
        cols.append(x.pop(j))
    return cols


def deselect(x, y, r, cols):
    for j in reversed(y[r]):
        x[j] = cols.pop()
        for i in x[j]:
            for k in y[i]:
                if k != j:
                    x[k].add(i)


if __name__ == "__main__":
    n = 9
    q = sys.argv[1]
    p = [map(int, list(q[i*n:(i+1)*n])) for i in range(n)]
    for solution in solve_sudoku(p):
        print(''.join([''.join(map(str, r)) for r in solution]))

It's one of the strengths of Python that you can write an implementation of such a complex algorithm in about 80 lines of code like this. Let's run it.

$>time python dancing_links_sudoku.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000
438172695617954382295836471173649258569728134824513967982465713351297846746381529
python dancing_links_sudoku.py   0.04s user 0.02s system 66% cpu 0.084 total

It's faster than using Pulp.

Summary

 

          This article introduced various methods to solve Sudoku puzzles, and provided Python implementation examples for each. While non-stochastic methods can always find a solution, stochastic methods cannot guarantee a solution but offer the advantage of shorter execution times if a solution is found. Therefore, combining multiple algorithms such as using a learned model and solving as an Exact cover problem can reduce the total time required to solve many problems. This approach can be applied not only to Sudoku but also to various other problems.

Sudoku is a game originally intended for mental exercise, and some people may question the purpose of solving it with machines. However, I assure you that I also exercised my brain to allow machines to solve it.

 

References