Loading...
3/26/2023
Tech Blog

Algorithm for Solving Sudoku Puzzles and Implementation

 By Chinbat Chindegsuren

 Picture created by : DALL-E image-generating deep-learning model

How to solve Sudoku puzzles

 

           There are various ways to solve Sudoku puzzles. You can solve them using your eyes, brain, and hands, use online tools, or ask someone who knows the solution. However, this time, I would like to introduce a simple way to solve them using programming.

 

1. Backtracking

 

            In essence, it is a typical depth-first search. That is, we repeat and check all possible combinations. Of course, we can speed up the process by using techniques such as pruning. There are about 9^81 (approximately 2×10^77) possible combinations to place numbers 1 to 9 on a 9x9 field, and out of these, about 6.67×10^21 (exactly 6,670,903,752,021,072,936,960) combinations could fill the Sudoku correctly. It is still an enormous number, so this method of checking combinations can be slower compared to other methods. However, the advantage is that if a solution exists, it will always be found.

Visionary :

 

2. Stochastic search

 

         Another way to solve Sudoku puzzles is probabilistic solving, which converges quickly but does not guarantee finding a solution. For example, you can:

  • Fill in the empty cells with random numbers.
  • Calculate the number of errors (e.g., the number of conflicts with the Sudoku rules).
  • Shuffle the numbers to reduce the number of errors, and repeat this process.

Genetic algorithms are sometimes used as the shuffling algorithm.

          Recently, solving Sudoku puzzles using deep learning, which can be classified as "stochastic search," has become popular. Ultimately, it is looking for parameters that minimize the error using gradient descent.

 

3. Constraint programming

 

           Another way to solve Sudoku puzzles is to define them as constraint satisfaction problems, which can be solved using a generic solver for optimization. With this method, if a solution exists, it will always be found.

 

4. Exact cover

 

           This method can be classified as constraint programming, but the algorithm is quite interesting, so it is separated as a separate topic. In simple terms, it is an NP-complete problem to find a set of subsets that can cover a given set without overlapping or missing elements. Algorithms such as Algorithm X, proposed by Donald Knuth, can be used to find all solutions to the exact cover problem. Therefore, if the Sudoku problem can be defined as an exact cover problem, this method can be used to solve it.

 

Implementation

 

             I'll show implementation examples in Python for each solving method. Regarding the implementation, of course, I did not write everything from scratch, and I also referred to code written by others.

Regardless of the implementation method, the following sample problem will be used as a common sample problem:

     By the way, according to my Sudoku-loving wife, this problem doesn't seem to be that difficult. She solved it in 11 minutes.

 

1. Backtracking of short coding

 

def r(a):    n = 9    b = 3    i = a.find("0")    ~i or exit(a)    [        m        in [            (i - j)            % n            * (i / n ^ j / n)            * (i / (n * b) ^ j / (n * b) | i % n / b ^ j % n / b)            or a[j]            for j in range(n**2)        ]        or r(a[:i] + m + a[i + 1 :])        for m in map(str, range(1, n + 1))    ]from sys import *r(argv[1])

 It's done. Let's run it.

$> time python short_solver.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000438172695617954382295836471173649258569728134824513967982465713351297846746381529python short_solver.py   9.66s user 0.03s system 98% cpu 9.807 total

           The problem is represented by writing the numbers as digits from the upper left to the lower right, and filling in the empty spaces with 0. By the way, the code is intentionally made short, but if you expand it, you can see that the worst-case time complexity is O(n^4) for this dull Backtracking algorithm.

 

2. Stochastic search: Challeging to solve Sudoku problems with machine learning

           As a web engineer, I'm not particularly knowledgeable about machine learning, so I asked for help from my colleague who is a machine learning engineer. It's great to have people around who you can ask for help. Here's the process we followed for implementation:

            Data preparation: There were 1 million Sudoku problems and solutions on Kaggle, so we extracted about 10,000 of them. Since we weren't particularly focused on accuracy for this implementation, we worked with a smaller dataset. Model creation: We created a very simple CNN model with only one layer, which is called shallow learning. Execution: We fed the model with the sample problem and obtained the result. One characteristic of Stochastic search is that there is no guarantee that the result will always be a solution. Will the solution be found?

import csvimport numpy as npfrom keras.models import Model, load_modelfrom keras.layers import Conv2D, Input, Concatenate, Flatten, Reshape, Dense, Activationfrom keras.utils import plot_modelfrom keras.preprocessing.image import Iteratordef build_model():    inp = Input((9, 9, 10))    x1 = Conv2D(filters=10, kernel_size=(3, 3), strides=(3, 3), padding='valid', activation='relu')(inp)    x2 = Conv2D(filters=10, kernel_size=(9, 1), padding='valid', activation='relu')(inp)    x3 = Conv2D(filters=10, kernel_size=(1, 9), padding='valid', activation='relu')(inp)    x1 = Flatten()(x1)    x2 = Flatten()(x2)    x3 = Flatten()(x3)    x = Concatenate()([x1, x2, x3])    x = Dense(9 * 9 * 9)(x)    x = Reshape((9, 9, 9))(x)    x = Activation('softmax')(x)    model = Model(inp, x)    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])    model.summary()    return modeldef read_csv(path):    with open(path) as f:        reader = csv.reader(f)        next(reader)  # first row is header        quizes = []        solutions = []        for quiz, solution in reader:            quizes.append(quiz)            solutions.append(solution)    return quizes, solutionsdef construct_board(s, is_solution=False):    if is_solution:        board = np.zeros((9, 9, 9), dtype=np.float32)    else:        board = np.zeros((9, 9, 10), dtype=np.float32)  # the 10th dim is for empty grid    for i, ss in enumerate(s):        board[i // 9, i % 9, int(ss) - 1] = 1.0   # map 1 to 0th, 2 to 1th, ..., 9 to 8th, and 0 to -1th    return boardclass SudokuIterator(Iterator):    def __init__(self, csv_path, batch_size, shuffle=False, seed=0):        quizes, solutions = read_csv(csv_path)        self.x = np.array([construct_board(q) for q in quizes])        self.y = np.array([construct_board(s, is_solution=True) for s in solutions])        n = len(self.x)        self.steps_per_epoch = n // batch_size        super(SudokuIterator, self).__init__(n, batch_size, shuffle, seed)    def _get_batches_of_samples(self, index_array):        batch_x = self.x[index_array]        batch_y = self.y[index_array]        return batch_x, batch_y    def _get_batches_of_transformed_samples(self, index_array):        return self._get_batches_of_samples(index_array)  # Keras Iterator requirement    def next(self):        with self.lock:            index_array = next(self.index_generator)        return self._get_batches_of_samples(index_array)print('building model...')sudoku_model = build_model()print('done.')print('building generator...')train_gen = SudokuIterator(csv_path='sudoku10000.csv', batch_size=64, shuffle=True)print('done.')sudoku_model.fit_generator(train_gen, steps_per_epoch=train_gen.steps_per_epoch, epochs=50)sudoku_model.save('sudoku_model.h5')

 I was running this on a Jupyter notebook, and here are the training results:

building model...__________________________________________________________________________________________________Layer (type)                    Output Shape         Param #     Connected to                     ==================================================================================================input_1 (InputLayer)            (None, 9, 9, 10)     0                                            __________________________________________________________________________________________________conv2d_1 (Conv2D)               (None, 3, 3, 10)     910         input_1[0][0]                    __________________________________________________________________________________________________conv2d_2 (Conv2D)               (None, 1, 9, 10)     910         input_1[0][0]                    __________________________________________________________________________________________________conv2d_3 (Conv2D)               (None, 9, 1, 10)     910         input_1[0][0]                    __________________________________________________________________________________________________flatten_1 (Flatten)             (None, 90)           0           conv2d_1[0][0]                   __________________________________________________________________________________________________flatten_2 (Flatten)             (None, 90)           0           conv2d_2[0][0]                   __________________________________________________________________________________________________flatten_3 (Flatten)             (None, 90)           0           conv2d_3[0][0]                   __________________________________________________________________________________________________concatenate_1 (Concatenate)     (None, 270)          0           flatten_1[0][0]                                                                                   flatten_2[0][0]                                                                                   flatten_3[0][0]                  __________________________________________________________________________________________________dense_1 (Dense)                 (None, 729)          197559      concatenate_1[0][0]              __________________________________________________________________________________________________reshape_1 (Reshape)             (None, 9, 9, 9)      0           dense_1[0][0]                    __________________________________________________________________________________________________activation_1 (Activation)       (None, 9, 9, 9)      0           reshape_1[0][0]                  ==================================================================================================Total params: 200,289Trainable params: 200,289Non-trainable params: 0__________________________________________________________________________________________________done.building generator...done.Epoch 1/50156/156 [==============================] - 6s 39ms/step - loss: 2.1754 - acc: 0.1548Epoch 2/50156/156 [==============================] - 5s 33ms/step - loss: 1.9703 - acc: 0.3388Epoch 3/50156/156 [==============================] - 5s 34ms/step - loss: 1.6940 - acc: 0.4593Epoch 4/50156/156 [==============================] - 6s 37ms/step - loss: 1.5140 - acc: 0.5098...Epoch 50/50156/156 [==============================] - 6s 38ms/step - loss: 0.9684 - acc: 0.6565

 We executed this on a Jupyter notebook, and the result of the training is as follows:

acc: 0.6565

           This means that the model has an accuracy of about 65% on the training data. Of course, since we kept it very simple this time, there is a high possibility of overfitting. We wrote a script that uses this model to solve Sudoku puzzles.

import sysimport numpy as npfrom keras.models import load_modeldef construct_board(s, is_solution=False):    if is_solution:        board = np.zeros((9, 9, 9), dtype=np.float32)    else:        board = np.zeros((9, 9, 10), dtype=np.float32)  # the 10th dim is for empty grid    for i, ss in enumerate(s):        board[i // 9, i % 9, int(ss) - 1] = 1.0   # map 1 to 0th, 2 to 1th, ..., 9 to 8th, and 0 to -1th    return boardsudoku_model = load_model('sudoku_model.h5')answer = sudoku_model.predict(np.array([construct_board(sys.argv[1])]))for r in range(9):    row = ''    for c in range(9):        row += str(answer[0][r,c].argmax() + 1)    print(row)

 I will actually run the code now:

$>time python ml_solver.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000Using TensorFlow backend.139878645667344182468356779673868255565827814514827968181485753352266796736283542python ml_solver.py   3.24s user 0.69s system 64% cpu 6.044 total

 

            As you can see from the results, the Sudoku constraints are clearly not satisfied... In other words, the problem could not be solved. Well, this time the training data is small and we are doing it simply, so I think it's okay if you get a rough idea. By the way, the execution time is 3.24 seconds, but most of it is spent loading the model. If you just ask the loaded model for the solution and display it, it only takes 0.004 seconds.

3. Constraint programming using Pulp

We will use a package called Pulp to define the problem as an optimization problem and solve it with the default CBC solver.

# -*- coding: utf-8 -*-import argparsefrom pulp import LpVariable, LpInteger, LpProblem, LpMinimize, LpStatus, lpSum, valuedef main(inp):    n = 9    b = 3    digits = [str(d + 1) for d in range(n)]    values = rows = columns = digits    answers = []    choices = LpVariable.dicts("Choice", (values, rows, columns), 0, 1, LpInteger)    boxes = [[(rows[b * i + k], columns[b * j + l]) for k in range(b) for l in range(b)] for j in range(b) for i in range(b)]    # 問題提議    problem = LpProblem("Solving Sudoku", LpMinimize)  # MinimizeでもMaximizeでもOK    problem += 0, "Arbitrary Objective Function"    # 制約追加    for r in rows:        for c in columns:            problem += lpSum([choices[v][r][c] for v in values]) == 1, ""    for v in values:        for r in rows:            problem += lpSum([choices[v][r][c] for c in columns]) == 1, ""        for c in columns:            problem += lpSum([choices[v][r][c] for r in rows]) == 1, ""        for b in boxes:            problem += lpSum([choices[v][r][c] for (r, c) in b]) == 1, ""    for i in range(n**2):        val = inp[i]        if val != '0':            problem += choices[str(val)][str(i/n + 1)][str(i % n + 1)] == 1, ""    while True:        # cbcソルバー利用        problem.solve()        if LpStatus[problem.status] == "Optimal":            answers.append(''.join([v for r in rows for c in columns for v in values if value(choices[v][r][c]) == 1]))            # 見つけた解を制約として追加            problem += lpSum(                [choices[v][r][c] for v in values for r in rows for c in columns if value(choices[v][r][c]) == 1]            ) <= 80        else:            break    if answers:        # 最初の解だけ表示        print(answers[0])if __name__ == '__main__':    parser = argparse.ArgumentParser()    parser.add_argument('inp', type=str)    args = parser.parse_args()    main(args.inp)

Defining the problem can be a bit tedious, right? Let's try running it.

$>time python pulp_solver.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000438172695617954382295836471173649258569728134824513967982465713351297846746381529python solver.py   0.23s user 0.08s system 57% cpu 0.538 total

That's quite fast.
 

4. Implementation of Dancing Links to solve Exact Cover Problem.

            Dancing links is an implementation of Algorithm X proposed by Knuth, who came up with the idea of solving the Exact cover problem. In this implementation, the board is represented using a doubly linked circular list, and calculations are skipped for places where there are no numbers, making it fast. In the Python implementation below, a dictionary is used instead of a doubly linked circular list, but the access pattern is the same, and the code is just simpler and shorter.

import sysfrom itertools import productdef solve_sudoku(grid):    b = 3    n = b * b    x = ([("rc", rc) for rc in product(range(n), range(n))] +         [("rn", rn) for rn in product(range(n), range(1, n + 1))] +         [("cn", cn) for cn in product(range(n), range(1, n + 1))] +         [("bn", bn) for bn in product(range(n), range(1, n + 1))])    y = dict()    for r, c, n in product(range(n), range(n), range(1, n + 1)):        box_number = (r // b) * b + (c // b)        y[(r, c, n)] = [            ("rc", (r, c)),            ("rn", (r, n)),            ("cn", (c, n)),            ("bn", (box_number, n))]    x, y = exact_cover(x, y)    for i, row in enumerate(grid):        for j, n in enumerate(row):            if n:                select(x, y, (i, j, n))    for solution in solve(x, y, []):        for (r, c, n) in solution:            grid[r][c] = n        yield griddef exact_cover(x, y):    x = {j: set() for j in x}    for i, row in y.items():        for j in row:            x[j].add(i)    return x, ydef solve(x, y, solution):    if not x:        yield list(solution)    else:        c = min(x, key=lambda c: len(x[c]))        for r in list(x[c]):            solution.append(r)            cols = select(x, y, r)            for s in solve(x, y, solution):                yield s            deselect(x, y, r, cols)            solution.pop()def select(x, y, r):    cols = []    for j in y[r]:        for i in x[j]:            for k in y[i]:                if k != j:                    x[k].remove(i)        cols.append(x.pop(j))    return colsdef deselect(x, y, r, cols):    for j in reversed(y[r]):        x[j] = cols.pop()        for i in x[j]:            for k in y[i]:                if k != j:                    x[k].add(i)if __name__ == "__main__":    n = 9    q = sys.argv[1]    p = [map(int, list(q[i*n:(i+1)*n])) for i in range(n)]    for solution in solve_sudoku(p):        print(''.join([''.join(map(str, r)) for r in solution]))

It's one of the strengths of Python that you can write an implementation of such a complex algorithm in about 80 lines of code like this. Let's run it.

$>time python dancing_links_sudoku.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000438172695617954382295836471173649258569728134824513967982465713351297846746381529python dancing_links_sudoku.py   0.04s user 0.02s system 66% cpu 0.084 total

It's faster than using Pulp.

Summary

 

          This article introduced various methods to solve Sudoku puzzles, and provided Python implementation examples for each. While non-stochastic methods can always find a solution, stochastic methods cannot guarantee a solution but offer the advantage of shorter execution times if a solution is found. Therefore, combining multiple algorithms such as using a learned model and solving as an Exact cover problem can reduce the total time required to solve many problems. This approach can be applied not only to Sudoku but also to various other problems.

Sudoku is a game originally intended for mental exercise, and some people may question the purpose of solving it with machines. However, I assure you that I also exercised my brain to allow machines to solve it.

 

References