Wednesday, July 29, 2009

Sudoku Solver Improvement

So the sudoku solver. It was clear that a simple brute force solution solved the problem reasonably fast. I had overestimated the difficulty of the problem. Still it would be fun to try to improve the solver. I could make smarter game analyzers.

Now clearly there are a lot of bad guesses made by my previous analyzer. The number of moves varied from 68 in the case of the simple puzzle to 61125. It looked like making the solver smarter might help make it faster too. So it's time to understand the problem a little better.

We start with the base puzzle and then we start putting numbers into empty cells. For example with the given puzzle we could start with the topmost leftmost empty cell and choose to put in 2,5,6 or 8. Only one of these is the correct solution but all of these are legal moves to make over here. Alternately instead of first working on that cell we could work on the cell next to it which can take 2,4,5 or 8. So starting from the base puzzle we can make many many different moves. There are 58 empty cells, so we could make 58 different sets of moves ... and each set potentially has 4 or 5 candidates for numbers to put into the cell. Since any sudoku puzzle has only one answer (meaning there is only one number that can go into each of the empty cells), 58 of the moves we can make at the beginning will all lead us to the right answer. Several of these will take us to the wrong answer (which we will realise since there will be cells into which no number can be legally entered).

Take a look at the adjoining figure. Now let's say the black circle represents the puzzle itself. From there we can make 281 different legal moves in the case of the puzzle describe above. Let's say red circles represent legal moves made to the topmost leftmost cell. We know there are 4 of these we have depicted 5 and 6. Similarly let green represent legal moves made to the adjoining cell. Again there are 4 possible values and we have depicted 5 and 8. From each of the 281 circles, we'd be able to make more moves and so on. As long as we're making correct moves, after 58 moves we will complete the puzzle. So what we have is a directed graph structure which starts of with one node, but has 281 nodes at the top. As we keep doing down, the breadth will rapdily diminish. The depth of the graph is 58 (since there are 58 empty cells, it takes 58 moves to fill the puzzle). Also since any cell can be filled in last, just before we get to the solution, there will be 58 possible game states from which we can reach the solution. The breadth of this graph is always greater than the depth.

Red 5

Green 6

So here's a graph where after the question node, the graph is 281 cells wide, gets wider for a while and then starts shrinking. Shrinks to 58 and then to 1 cell. Other important aspects of this graph are
  • There are nodes from where you can never get to another node which is one level below. (This should be obvious. From red 5 we can never get to any node where the green cell has 5 filled in).
  • However there are many different routes to the correct solution or to pretty much any step along the way.
  • There are nodes where for certain cells there would be no legal moves. For instance the image on the right does not allow any legal move for the cell in the 2nd row 3rd column.

This is a very large graph and is quite unwieldy to traverse if we wish to find the solution. Now when the first solver was written, the strategy adopted already reduces this structure quite a bit.

Instead of focusing on all possible moves at any point in time, the focus was only on all possible values for the first empty cell(where cells were ordered from top to bottom and left to right). This is a much smaller subset of the directed graph. It looks more like the diagramn on the right. In fact it's a tree. Once a certain value is in a particular cell, no other value can also be present in the same cell. So unlike in the previous structure, each branch is unique. Only one branch leads to the answer, the others lead to dead ends where there are no legal moves.

So now that we know what the solution space looks like, we can think of a way to improve the solver. Clearly we've chosen to perform a depth first search. Often while performing searches on graphs and trees, breadth first searches are more efficient. However, that is not true in this case. As we keep making moves, there is no way for us to discover that we are indeed on the right track. (which is where a breadth first search most benefits. Either you hope to stumble upon the solution or else you hope to stumble upon evidence that you are on the right branch.) .Instead , here we are more likely to find evidence that a specific branch is a dead end and then move on.

So here's something we can try to speed up the solver. How about, we try to detect early, if a particular branch is going to lead to a dead end. This will save us a lot of time moving forward only to discover that the branch was a dead end and then rolling back all the way. So far our strategy has been, pick up an empty cell, guess what numbers could be in there ... and for each such number, start guessing what the next empty cell might hold. So the next step is to introduce a new step in between these two. After guessing what numbers might be in a cell, for each such number let's make sure that no other cell ends up with 0 legal moves. Only when this is true, will we pick up the next empty cell. If we're right we should end up with far fewer guesses and backtracks. And hopefully we find the solution faster.

So let's see how this solver does.

Solved hard with 4938 moves and 4884 rollbacks in 5.300446 seconds
Solved evil with 20151 moves and 20095 rollbacks in 19.704105 seconds
Solved escargot with 5421 moves and 5363 rollbacks in 5.690688 seconds
Solved simple with 52 moves and 7 rollbacks in 0.045307 seconds
Solved other_hard with 5640 moves and 5579 rollbacks in 6.978094 seconds
Analyzer::ConstraintsChecking ran 5 puzzles in 37.719066 seconds with 36202 moves and 35928 rollbacks

Clearly the number of moves has reduced considerably. From 52 moves to 20151 in the worst case. A three-fold improvement. But the speed hasn't improved at all. If anything it has become worse. From 5.52 seconds to 37.71 seconds. While this analyzer is smarter, it looks like that smartness came at a price. Taking the time each step to make sure that no other cell is invalid, slows down the solver wayyyy too much.

Time for a change in direction. Meanwhile, here's the code.

module Analyzer
class ConstraintsChecking < Base
def initialize(game, analyzer=nil)
@possible_values = game.empty_cells.tail.any?{|cell| possible_values(game, cell).empty?} ? [] : possible_values(game, game.empty_cells.first)
@cell = game.empty_cells.first

1 comment:

  1. Sudoku, simply defined, is a puzzle based on numbers and logic. The game requires filling the vacant boxes of a 9X9 grid with single digit numbers so that – on completion - each column, each row.