Algorithm for Solving Sudoku Puzzles and Implementation

3/26/2023

Tech Blog

Algorithm for Solving Sudoku Puzzles and Implementation

By Chinbat Chindegsuren

Picture created by : DALL-E image-generating deep-learning model

How to solve Sudoku puzzles

There are various ways to solve Sudoku puzzles. You can solve them using your eyes, brain, and hands, use online tools, or ask someone who knows the solution. However, this time, I would like to introduce a simple way to solve them using programming.

1. Backtracking

In essence, it is a typical depth-first search. That is, we repeat and check all possible combinations. Of course, we can speed up the process by using techniques such as pruning. There are about 9^81 (approximately 2×10^77) possible combinations to place numbers 1 to 9 on a 9x9 field, and out of these, about 6.67×10^21 (exactly 6,670,903,752,021,072,936,960) combinations could fill the Sudoku correctly. It is still an enormous number, so this method of checking combinations can be slower compared to other methods. However, the advantage is that if a solution exists, it will always be found.

Visionary :

2. Stochastic search

Another way to solve Sudoku puzzles is probabilistic solving, which converges quickly but does not guarantee finding a solution. For example, you can:

Fill in the empty cells with random numbers.
Calculate the number of errors (e.g., the number of conflicts with the Sudoku rules).
Shuffle the numbers to reduce the number of errors, and repeat this process.

Genetic algorithms are sometimes used as the shuffling algorithm.

Recently, solving Sudoku puzzles using deep learning, which can be classified as "stochastic search," has become popular. Ultimately, it is looking for parameters that minimize the error using gradient descent.

3. Constraint programming

Another way to solve Sudoku puzzles is to define them as constraint satisfaction problems, which can be solved using a generic solver for optimization. With this method, if a solution exists, it will always be found.

4. Exact cover

This method can be classified as constraint programming, but the algorithm is quite interesting, so it is separated as a separate topic. In simple terms, it is an NP-complete problem to find a set of subsets that can cover a given set without overlapping or missing elements. Algorithms such as Algorithm X, proposed by Donald Knuth, can be used to find all solutions to the exact cover problem. Therefore, if the Sudoku problem can be defined as an exact cover problem, this method can be used to solve it.

Implementation

I'll show implementation examples in Python for each solving method. Regarding the implementation, of course, I did not write everything from scratch, and I also referred to code written by others.

Regardless of the implementation method, the following sample problem will be used as a common sample problem:

By the way, according to my Sudoku-loving wife, this problem doesn't seem to be that difficult. She solved it in 11 minutes.

1. Backtracking of short coding

def solve_puzzle(grid):
    size = 9
    box_size = 3
    
    # Find first empty cell (marked as '0')
    empty_pos = grid.find("0")
    
    # Exit if no empty position found
    if empty_pos == -1:
        exit(grid)
    
    # Try digits 1-9 in the empty position
    for digit in map(str, range(1, size + 1)):
        # Check if digit is valid in current position
        is_valid = all(
            (empty_pos - j) % size * (empty_pos // size ^ j // size) * 
            (empty_pos // (size * box_size) ^ j // (size * box_size) | 
             empty_pos % size // box_size ^ j % size // box_size) 
            or grid[j]
            for j in range(size**2)
        )
        
        if is_valid:
            # Recursively try to solve the rest of the puzzle
            solve_puzzle(grid[:empty_pos] + digit + grid[empty_pos + 1:])

# Entry point
if __name__ == "__main__":
    from sys import argv
    solve_puzzle(argv[1])

It's done. Let's run it.

$> time python short_solver.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000438172695617954382295836471173649258569728134824513967982465713351297846746381529python short_solver.py   9.66s user 0.03s system 98% cpu 9.807 total

The problem is represented by writing the numbers as digits from the upper left to the lower right, and filling in the empty spaces with 0. By the way, the code is intentionally made short, but if you expand it, you can see that the worst-case time complexity is O(n^4) for this dull Backtracking algorithm.

2. Stochastic search: Challeging to solve Sudoku problems with machine learning

As a web engineer, I'm not particularly knowledgeable about machine learning, so I asked for help from my colleague who is a machine learning engineer. It's great to have people around who you can ask for help. Here's the process we followed for implementation:

Data preparation: There were 1 million Sudoku problems and solutions on Kaggle, so we extracted about 10,000 of them. Since we weren't particularly focused on accuracy for this implementation, we worked with a smaller dataset. Model creation: We created a very simple CNN model with only one layer, which is called shallow learning. Execution: We fed the model with the sample problem and obtained the result. One characteristic of Stochastic search is that there is no guarantee that the result will always be a solution. Will the solution be found?

import csv
import numpy as np
from keras.models import Model, load_model
from keras.layers import Conv2D, Input, Concatenate, Flatten, Reshape, Dense, Activation
from keras.utils import plot_model
from keras.preprocessing.image import Iterator

def build_model():
    inp = Input((9, 9, 10))
    x1 = Conv2D(filters=10, kernel_size=(3, 3), strides=(3, 3), 
                padding='valid', activation='relu')(inp)
    x2 = Conv2D(filters=10, kernel_size=(9, 1), 
                padding='valid', activation='relu')(inp)
    x3 = Conv2D(filters=10, kernel_size=(1, 9), 
                padding='valid', activation='relu')(inp)
    
    x1 = Flatten()(x1)
    x2 = Flatten()(x2)
    x3 = Flatten()(x3)
    
    x = Concatenate()([x1, x2, x3])
    x = Dense(9 * 9 * 9)(x)
    x = Reshape((9, 9, 9))(x)
    x = Activation('softmax')(x)
    
    model = Model(inp, x)
    model.compile(optimizer='adam', 
                 loss='categorical_crossentropy', 
                 metrics=['accuracy'])
    model.summary()
    return model

def read_csv(path):
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)  # first row is header
        quizes = []
        solutions = []
        for quiz, solution in reader:
            quizes.append(quiz)
            solutions.append(solution)
    return quizes, solutions

def construct_board(s, is_solution=False):
    if is_solution:
        board = np.zeros((9, 9, 9), dtype=np.float32)
    else:
        # the 10th dim is for empty grid
        board = np.zeros((9, 9, 10), dtype=np.float32)
    
    for i, ss in enumerate(s):
        # map 1 to 0th, 2 to 1th, ..., 9 to 8th, and 0 to -1th
        board[i // 9, i % 9, int(ss) - 1] = 1.0
    return board

class SudokuIterator(Iterator):
    def __init__(self, csv_path, batch_size, shuffle=False, seed=0):
        quizes, solutions = read_csv(csv_path)
        self.x = np.array([construct_board(q) for q in quizes])
        self.y = np.array([construct_board(s, is_solution=True) for s in solutions])
        n = len(self.x)
        self.steps_per_epoch = n // batch_size
        super(SudokuIterator, self).__init__(n, batch_size, shuffle, seed)

    def _get_batches_of_samples(self, index_array):
        batch_x = self.x[index_array]
        batch_y = self.y[index_array]
        return batch_x, batch_y

    def _get_batches_of_transformed_samples(self, index_array):
        # Keras Iterator requirement
        return self._get_batches_of_samples(index_array)

    def next(self):
        with self.lock:
            index_array = next(self.index_generator)
        return self._get_batches_of_samples(index_array)

if __name__ == "__main__":
    print('building model...')
    sudoku_model = build_model()
    print('done.')

    print('building generator...')
    train_gen = SudokuIterator(csv_path='sudoku10000.csv', 
                              batch_size=64, 
                              shuffle=True)
    print('done.')

    sudoku_model.fit_generator(
        train_gen,
        steps_per_epoch=train_gen.steps_per_epoch,
        epochs=50
    )

    sudoku_model.save('sudoku_model.h5')

I was running this on a Jupyter notebook, and here are the training results:

building model...__________________________________________________________________________________________________Layer (type)                    Output Shape         Param #     Connected to                     ==================================================================================================input_1 (InputLayer)            (None, 9, 9, 10)     0                                            __________________________________________________________________________________________________conv2d_1 (Conv2D)               (None, 3, 3, 10)     910         input_1[0][0]                    __________________________________________________________________________________________________conv2d_2 (Conv2D)               (None, 1, 9, 10)     910         input_1[0][0]                    __________________________________________________________________________________________________conv2d_3 (Conv2D)               (None, 9, 1, 10)     910         input_1[0][0]                    __________________________________________________________________________________________________flatten_1 (Flatten)             (None, 90)           0           conv2d_1[0][0]                   __________________________________________________________________________________________________flatten_2 (Flatten)             (None, 90)           0           conv2d_2[0][0]                   __________________________________________________________________________________________________flatten_3 (Flatten)             (None, 90)           0           conv2d_3[0][0]                   __________________________________________________________________________________________________concatenate_1 (Concatenate)     (None, 270)          0           flatten_1[0][0]                                                                                   flatten_2[0][0]                                                                                   flatten_3[0][0]                  __________________________________________________________________________________________________dense_1 (Dense)                 (None, 729)          197559      concatenate_1[0][0]              __________________________________________________________________________________________________reshape_1 (Reshape)             (None, 9, 9, 9)      0           dense_1[0][0]                    __________________________________________________________________________________________________activation_1 (Activation)       (None, 9, 9, 9)      0           reshape_1[0][0]                  ==================================================================================================Total params: 200,289Trainable params: 200,289Non-trainable params: 0__________________________________________________________________________________________________done.building generator...done.Epoch 1/50156/156 [==============================] - 6s 39ms/step - loss: 2.1754 - acc: 0.1548Epoch 2/50156/156 [==============================] - 5s 33ms/step - loss: 1.9703 - acc: 0.3388Epoch 3/50156/156 [==============================] - 5s 34ms/step - loss: 1.6940 - acc: 0.4593Epoch 4/50156/156 [==============================] - 6s 37ms/step - loss: 1.5140 - acc: 0.5098...Epoch 50/50156/156 [==============================] - 6s 38ms/step - loss: 0.9684 - acc: 0.6565

We executed this on a Jupyter notebook, and the result of the training is as follows:

acc: 0.6565

This means that the model has an accuracy of about 65% on the training data. Of course, since we kept it very simple this time, there is a high possibility of overfitting. We wrote a script that uses this model to solve Sudoku puzzles.

import sys
import numpy as np
from keras.models import load_model

def construct_board(s, is_solution=False):
    if is_solution:
        board = np.zeros((9, 9, 9), dtype=np.float32)
    else:
        # the 10th dim is for empty grid
        board = np.zeros((9, 9, 10), dtype=np.float32)
    
    for i, ss in enumerate(s):
        # map 1 to 0th, 2 to 1th, ..., 9 to 8th, and 0 to -1th
        board[i // 9, i % 9, int(ss) - 1] = 1.0
    return board

def print_solution(answer):
    for r in range(9):
        row = ''
        for c in range(9):
            row += str(answer[0][r, c].argmax() + 1)
        print(row)

if __name__ == "__main__":
    # Load the trained model
    sudoku_model = load_model('sudoku_model.h5')
    
    # Predict solution for input puzzle
    input_puzzle = sys.argv[1]
    board = construct_board(input_puzzle)
    answer = sudoku_model.predict(np.array([board]))
    
    # Print the solution
    print_solution(answer)

I will actually run the code now:

$>time python ml_solver.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000Using TensorFlow backend.139878645667344182468356779673868255565827814514827968181485753352266796736283542python ml_solver.py   3.24s user 0.69s system 64% cpu 6.044 total

As you can see from the results, the Sudoku constraints are clearly not satisfied... In other words, the problem could not be solved. Well, this time the training data is small and we are doing it simply, so I think it's okay if you get a rough idea. By the way, the execution time is 3.24 seconds, but most of it is spent loading the model. If you just ask the loaded model for the solution and display it, it only takes 0.004 seconds.

3. Constraint programming using Pulp

We will use a package called Pulp to define the problem as an optimization problem and solve it with the default CBC solver.

# -*- coding: utf-8 -*-
import argparse
from pulp import (
    LpVariable, LpInteger, LpProblem, LpMinimize,
    LpStatus, lpSum, value
)

def main(inp):
    n = 9
    b = 3
    digits = [str(d + 1) for d in range(n)]
    values = rows = columns = digits
    answers = []
    
    # Create variables
    choices = LpVariable.dicts("Choice", 
                             (values, rows, columns), 
                             0, 1, LpInteger)
    
    # Define boxes for 3x3 constraints
    boxes = [
        [(rows[b * i + k], columns[b * j + l]) 
         for k in range(b) for l in range(b)]
        for j in range(b) for i in range(b)
    ]
    
    # Initialize problem
    problem = LpProblem("Solving Sudoku", LpMinimize)
    problem += 0, "Arbitrary Objective Function"
    
    # Add constraints
    # Each cell must have exactly one value
    for r in rows:
        for c in columns:
            problem += lpSum([choices[v][r][c] for v in values]) == 1, ""
    
    # Each value must appear once in each row
    for v in values:
        for r in rows:
            problem += lpSum([choices[v][r][c] for c in columns]) == 1, ""
        
        # Each value must appear once in each column
        for c in columns:
            problem += lpSum([choices[v][r][c] for r in rows]) == 1, ""
        
        # Each value must appear once in each box
        for b in boxes:
            problem += lpSum([choices[v][r][c] for (r, c) in b]) == 1, ""
    
    # Add initial values from input
    for i in range(n**2):
        val = inp[i]
        if val != '0':
            problem += choices[str(val)][str(i//n + 1)][str(i % n + 1)] == 1, ""
    
    # Solve the problem and find all solutions
    while True:
        # Use CBC solver
        problem.solve()
        
        if LpStatus[problem.status] == "Optimal":
            # Collect solution
            solution = ''.join([
                v for r in rows 
                for c in columns 
                for v in values 
                if value(choices[v][r][c]) == 1
            ])
            answers.append(solution)
            
            # Add constraint to find different solution
            problem += lpSum([
                choices[v][r][c] 
                for v in values 
                for r in rows 
                for c in columns 
                if value(choices[v][r][c]) == 1
            ]) <= 80
        else:
            break
    
    # Print first solution if found
    if answers:
        print(answers[0])

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('inp', type=str)
    args = parser.parse_args()
    main(args.inp)

Defining the problem can be a bit tedious, right? Let's try running it.

$>time python pulp_solver.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000438172695617954382295836471173649258569728134824513967982465713351297846746381529python solver.py   0.23s user 0.08s system 57% cpu 0.538 total

That's quite fast.

4. Implementation of Dancing Links to solve Exact Cover Problem.

Dancing links is an implementation of Algorithm X proposed by Knuth, who came up with the idea of solving the Exact cover problem. In this implementation, the board is represented using a doubly linked circular list, and calculations are skipped for places where there are no numbers, making it fast. In the Python implementation below, a dictionary is used instead of a doubly linked circular list, but the access pattern is the same, and the code is just simpler and shorter.

import sys
from itertools import product

def solve_sudoku(grid):
    b = 3  # Size of small box
    n = b * b  # Size of grid
    
    # Create exact cover problem
    x = (
        [("rc", rc) for rc in product(range(n), range(n))] +
        [("rn", rn) for rn in product(range(n), range(1, n + 1))] +
        [("cn", cn) for cn in product(range(n), range(1, n + 1))] +
        [("bn", bn) for bn in product(range(n), range(1, n + 1))]
    )
    
    # Create constraints dictionary
    y = dict()
    for r, c, n in product(range(n), range(n), range(1, n + 1)):
        box_number = (r // b) * b + (c // b)
        y[(r, c, n)] = [
            ("rc", (r, c)),
            ("rn", (r, n)),
            ("cn", (c, n)),
            ("bn", (box_number, n))
        ]
    
    # Convert to exact cover format
    x, y = exact_cover(x, y)
    
    # Add initial values as constraints
    for i, row in enumerate(grid):
        for j, n in enumerate(row):
            if n:
                select(x, y, (i, j, n))
    
    # Find and yield solutions
    for solution in solve(x, y, []):
        for (r, c, n) in solution:
            grid[r][c] = n
        yield grid

def exact_cover(x, y):
    x = {j: set() for j in x}
    for i, row in y.items():
        for j in row:
            x[j].add(i)
    return x, y

def solve(x, y, solution):
    if not x:
        yield list(solution)
    else:
        # Choose column with minimum possibilities
        c = min(x, key=lambda c: len(x[c]))
        for r in list(x[c]):
            solution.append(r)
            cols = select(x, y, r)
            for s in solve(x, y, solution):
                yield s
            deselect(x, y, r, cols)
            solution.pop()

def select(x, y, r):
    cols = []
    for j in y[r]:
        for i in x[j]:
            for k in y[i]:
                if k != j:
                    x[k].remove(i)
        cols.append(x.pop(j))
    return cols

def deselect(x, y, r, cols):
    for j in reversed(y[r]):
        x[j] = cols.pop()
        for i in x[j]:
            for k in y[i]:
                if k != j:
                    x[k].add(i)

if __name__ == "__main__":
    n = 9
    puzzle = sys.argv[1]
    
    # Convert input string to 2D grid
    grid = [
        list(map(int, list(puzzle[i*n:(i+1)*n]))) 
        for i in range(n)
    ]
    
    # Solve and print solution
    for solution in solve_sudoku(grid):
        print(''.join([''.join(map(str, r)) for r in solution]))

It's one of the strengths of Python that you can write an implementation of such a complex algorithm in about 80 lines of code like this. Let's run it.

$>time python dancing_links_sudoku.py 000070605000004080200006070073000200500000004004000960080400003050200000706080000438172695617954382295836471173649258569728134824513967982465713351297846746381529python dancing_links_sudoku.py   0.04s user 0.02s system 66% cpu 0.084 total

It's faster than using Pulp.

Summary

This article introduced various methods to solve Sudoku puzzles, and provided Python implementation examples for each. While non-stochastic methods can always find a solution, stochastic methods cannot guarantee a solution but offer the advantage of shorter execution times if a solution is found. Therefore, combining multiple algorithms such as using a learned model and solving as an Exact cover problem can reduce the total time required to solve many problems. This approach can be applied not only to Sudoku but also to various other problems.

Sudoku is a game originally intended for mental exercise, and some people may question the purpose of solving it with machines. However, I assure you that I also exercised my brain to allow machines to solve it.

Algorithm for Solving Sudoku Puzzles and Implementation

How to solve Sudoku puzzles

1. Backtracking

2. Stochastic search

3. Constraint programming

4. Exact cover

Implementation

1. Backtracking of short coding

2. Stochastic search: Challeging to solve Sudoku problems with machine learning

3. Constraint programming using Pulp

4. Implementation of Dancing Links to solve Exact Cover Problem.

Summary

References

Хэрэглэгчийн зөвшөөрөл (Authorization)-г RBAC аргаар зохиомжлох нь

Google Cloud Vision API ашиглан зургаас текстийг нуух

A quest to develop a 2D game from scratch, with 0 knowledge of Game Development