Sunday, February 14, 2010

Sudoku solver, alternate optimization strategy, final

I had written about optimising the sudoku solver by remembering moves made in the past. Another strategy that I had tried parallely was to do a short-circuit evaluation. Each iteration of the solver is about searching all the empty cells to find the list of acceptable moves and then choosing the cell with the least moves. However, clearly a cell with zero options means that it's time to backtrack. Clearly instead of continuing to search the list of cells, that's a place to jump out of a search. Similarly a cell with one option means that there is only one option so I might as well make that move.

I experimented with a short circuited search where I jumped out the minute I found something with some "n" moves and I varied "n" from  upwards. It turned out the best option was to jump out when I had zero or one option. Sadly this optimisation only gives me a factor of 3 improvement, and doesn't stack much with the previous one. 

Anyway the whole code is available at github -> http://github.com/pathsny/Sudoku-Solver

Tuesday, February 9, 2010

... Okay 3, 2, 1 Let's Jam. Quicksilver, Ruby and Mac OSX

QuickSilver is an application launcher for Mac OS that is very popular productivity enhancer. It's extensible through a ton of plugins. But it also allows people to launch custom scripts, which allow people to quickly script any action they commonly perform. I first saw the possibilities when I read this article. Some of those scripts were quite useless to me, but there are a few I end up using all the time. Like shortcuts to turn my wireless radio off when Im not online and wanna save power. So this is about 3 QuickSilver scripts I recently wrote

Initially I wrote a few of them in applescript, but when I wanted to write the script to "automatically pastie my code", I decided it would be easier to use a language I'm more comfortable with. I found a ruby-applescript bridge, so I decided to go with ruby. The bridge is obtained by installing the rb-appscript gem. You can find on these scripts on Github at http://github.com/pathsny/QuickSilver

I also attempted a "maximise" script based on this article. Sadly it just doesn't work :/. It's always incredibly slow the first time you launch it.

Sunday, January 10, 2010

Programming Async code in a Sync style using Laziness and Functional patterns

The first time I came across some common idioms of functional programming happened when I was working with .NET and came across the List methods. The findAll and ForEach methods. For me it was a revelation. I kept rewriting all the code on the project I was on, using these idioms.

Since functional languages were popular at that point of time and I discovered that such languages were where this idea had sprung from,, I decided to start learning Haskell. I'm still learning it, but one interesting thing about Haskell was that it is the Laziest functional language. Everything was automatically lazy and strictness had to be enforced somewhere when you really needed. Like say in the IO monad where you say when you want to print something, you mean like right now! So if I had to print the first 5 even natural numbers greater than some x, I could write this. (Warning: horribly contrived example and code).
 f x = take 5 $ dropWhile (<x) $filter even [1..] 
Notice the infinite list of natural numbers there? The reason this works is because Haskell is lazy. What really happens here is that when I print the output of that, the print function asks take 5 to return output. take 5 asks dropWhile to return 5 items. dropWhile asks filter to return items until it finds an item that is greater than x and then asks for the next and then filter asks the infinite source for even items and keeps doing so until dropWhile is satisfied. The realization here is that on the left side we have a source, on the right side is a sink. Everything else is a component meant to build a pipeline. Something about the system knows that this system's flow is governed by the sink, and so the pipeline components all pull output until the sink is satisfied. The idea is that I can write more idiomatic code and use things like infinite lists when I want to. .NET had done the same thing with the IEnumerable stuff from 3.5. So I could now write
InfiniteList.FindAll(Even).SkipWhile(y => y > x).Take(5)
I recently read about another realization. Another common pattern while programming is an event based system. Like a web application waiting for requests from someone. A Javascript function that makes an ajax call and displays the output on the screen and so on. Specifically, an async system where you wait for events to pour in and for each event you go on to perform some actions.

So imagine I had a system where I go on to call some number generator which dials up a service halfway across the world. the service spews out natural numbers. I want to register a callback function which waits for numbers, ignores the odd ones. Maintains a flag so it knows at least one such number has been greater than x and then gives me the next 5 numbers before swallowing them. (yeah I know, incredibly useful). Since this deals with state, let's do this in some imperative ruby style language. Typically I might have to say

def handler(n)
return if n % 2 == 1
@skip = false if n > x && @skip
return if @skip
@num_returned = 0 unless @num_returned
return if @num_returned > 5
@num_returned += 1
puts n
end

numbercaller(handler)

The beautiful functional style code has morphed to something so ugly all because I switched from pull to push async. But it's not hard to recognize that this problem is very very similar to the one I was previously solving. All that changed was the source here controls when stuff goes through the pipeline instead of the sink. So with the right tools I should be able to use the same pipeline tools as before and have the language features just switch the flow control around. Numbercaller should present the same interface as an iterable system, allowing me to use the same pipeline components, but should push stuff through the pipeline instead. I should be able to use not just each, but also map, dropWhile, take and so on. It's just a different take on Laziness Or as the article I'm going to link you to put it, the Observable pattern and the Iterable pattern are virtually the same thing. You can read all about it here. The Rx (Linq to Events).

Truth be told, I'm a little nervous about this pattern. It's a new abstraction and like all abstractions it has the potential to be leaky. Imagine we managed to implement this on some Javascript ajaxy interface. On the click of some item I want to make an ajax call. I want the result of the Ajax call to update some div.

button.onClick = function(){
AjaxUpdater.update(url,function(response){
html = process(response)
$("blah").innerHtml = html
})}
doSomethingElse();

Now with the new realization we have we'd like to write.

responses = button.clickEvents.each(AjaxUpdater(url).map(process)).flatten
responses.each(function(html){$("blah").innerHtml = html})
doSomethingElse()
Now, the important point here is that the first two lines need to happen at some random time in the future, and in fact repeatedly each time I get a response. But doSomethingElse needs to happen right away without waiting for click events to occur. Since the abstraction hides the fact that this event appears at some time in the future and potentially infinite times, it might mean that I accidentally make some event occur with every click? It might mean buggier code, or some unfortunate performance implications. It might simply mean we need much smarter interpreters. The reactor pattern (non-blocking io). has become very popular of late. It involves converting a lot of operations that are traditionally treated as synchronous, such as writing to disk, or making a blocking call for data without which you cannot proceed, to asynchronous. It has some very interesting implications and provides considerable performance advantages. This pattern might make writing code with the reactor pattern as idiomatic as code which treats these operations as synchronous.

Monday, November 16, 2009

Sudoku Solver - Remembering The Past

The sudoku solver is fairly fast and seems to be doing all the right things. By choosing the cells with the least options the number of useless moves has been reduced considerably. However, there is a fair amount of rework thats currently going on. Every time the state of the game is analyzed, the entire board is scanned to find all possible numbers that can be stored in each empty cell. Once a single cell is filled a new iteration begins. Again, this analysis is repeated by scanning the whole board. But, human's don't do that!

Once I've decided what numbers are playable in each empty cell. I don't repeat this exercise everytime I fill up an empty cell. Most empty cells are not affected by any single cell being filled. If I fill a cell that used to be empty, the only empty cells that are affected are cells that shared a row, column or block with the filled cell. So between iterations, if the analysis of the game was to be saved, I could quickly obtain an analysis for the next iteration by slightly modifying the analysis of the previous iteration. OfCourse this is going to take considerably more code to do, but it should show significant improvements.

Now unfortunately I haven't saved the results of running all this on the machine I used when I first started blogging all this. So the times I paste will not be comparable but will have to be scaled a bit. So Ill repost the highlights of previous runs.

Analyzer::Simple ran 5 puzzles in 3.115519 seconds with 107442 moves and 107168 rollbacks
Analyzer::LeastChoicesFirst ran 5 puzzles in 0.412483 seconds with 631 moves and 357 rollbacks

So last time we saw an improvement of a factor of ten. And now, here is the result of the latest change.

Solved simple with 45 moves and 0 rollbacks in 0.035017 seconds
Solved hard with 83 moves and 29 rollbacks in 0.010338 seconds
Solved evil with 217 moves and 161 rollbacks in 0.024111 seconds
Solved escargot with 217 moves and 159 rollbacks in 0.023176 seconds
Solved other_hard with 69 moves and 8 rollbacks in 0.009236 seconds

Analyzer::StorageBasedLeast ran 5 puzzles in 0.102331 seconds with 631 moves and 357 rollbacks

That's a speed up of about 4 times. This is pretty good. As always the code follows. Notice that this time my analyzer actually has to expose read attributes so that the state can be preserved from one iteration to the next. This series of posts is nearly done. I have one more post which will follow where I'll also link to the entire code on github.



class StorageBasedLeast < Base
attr_reader :cells_with_values, :cell, :value

def initialize(game, analyzer=nil)
@cells_with_values = find_cells_with_values(game, analyzer)
@cell, @possible_values = cells_with_values.min{|pair_1, pair_2| pair_1[1].length <=> pair_2[1].length}
end


def each
@possible_values.each do |value|
@value = value
yield @cell, value
end
end

def find_cells_with_values(game, analyzer)
return game.empty_cells.collect{|cell| [cell, possible_values(game, cell)]} unless analyzer
cells_with_values = analyzer.cells_with_values.reject{|pair| pair[0] == analyzer.cell}
neighbours = game.neighbour_indices(analyzer.cell)
cells_with_values.collect do |pair|
values = neighbours.include?(pair[0]) ? pair[1] - [analyzer.value] : pair[1]
[pair[0],values]
end
end
end

Friday, November 13, 2009

Highlights that impressed me from the video on the Go language

Go is a new programming language developed by google and a lot of big names and you can find out all about it on it's wikipedia page here or on it's website here, or you could watch the hour long video here on youtube.

Anyway here are the highlights I got out of that video. I'll still need to read what's on the website to understand the language better, but it does raise my interest now. They wanted quick build times, better support for concurrency and a mixture of the advantages of static and dynamically typed languages. So here they are in no particular order.
  • Very fast compilation. Im guessing on large codebases this can make a big enough difference. It's interesting anyway.
  • Adding methods to anything. To investigate
    1. Does this mean I can add methods to any existing type?
    2. Does this mean I can add methods to an instance of a type?

  • Automatically implemented interfaces. This is kinda cool actually. An interface is declared as a bunch of behavior and anything which exhibits this behavior now implements that interface. Which kinda gives you a duck typing like thing. implements as a keyword does not even exist.
  • Unicode characters can be used in variables. π = 3.14159 is valid code.
  • Making code async is trivial. Declare a function and just type go <function name>. The function executes in an async manner. Apparently whatever threading like thing they've used is very efficient because they demonstrated launching about 100,000 of these goroutines and they all executed in a matter of seconds.
  • Erlang style, you create channels for communications. Channels can carry anything including other channels. You drop stuff into channels and pull them out of channels. This makes writing multi-threaded or client server apps look very simple. To investigate
    1. How easy is it to distribute this over a network? Can I just toss a bunch of goroutines onto other machines?
  • Closures, which should make a lot of people very happy.
  • Reflection. To investigate
  • Dynamic types. To investigate
    1. What is this? Is it like .NET?
    2. Can I call any method on it?
    3. Does it automatically implement all interfaces?
  • 10-20% slower than compiled C code. That's quite impressive.
  • ARM compiler.
    1. huh? Does this mean people can use go to write code for phones and mp3 players?
  • Automatic memory management. There's talk about a concurrent gc.
  • Some important libraries apparently already exist like for html templating and testing.
  • There was something about slices and arrays and maps. Need to investigate further to see if they have anything cool. Like the select, map, reduce type operations on slices or arrays?

Saturday, September 19, 2009

Ruby 1.9 is way faster than Ruby 1.8.6

I got a chance to test it. Its actually true. I'm on OSX so I used mac ports to install ruby1.9 parallel to 1.8.6. Since I use TextMate for development, heading to preferences -> Advances -> Shell variables and setting TM_RUBY to the actual location of the Ruby 1.9 binary. My Sudoku solver previously had the following results.



Analyzer::Simple ran 5 puzzles in 5.524553 seconds with 107442 moves and 107168 rollbacks
Analyzer::ConstraintsChecking ran 5 puzzles in 37.719066 seconds with 36202 moves and 35928 rollbacks
Analyzer::LeastChoicesFirst ran 5 puzzles in 0.745781 seconds with 631 moves and 357 rollbacks


Here are the same results of the same runs with Ruby 1.9.


Analyzer::Simple ran 5 puzzles in 3.905158 seconds with 107442 moves and 107168 rollbacks
Analyzer::ConstraintsChecking ran 5 puzzles in 25.035866 seconds with 36202 moves and 35928 rollbacks
Analyzer::LeastChoicesFirst ran 5 puzzles in 0.527032 seconds with 631 moves and 357 rollbacks


Now ideally it should be twice as fast but my machine is loaded up with a lot of open tabs :/. But still yeah Ruby 1.9 better!

Saturday, September 12, 2009

DEADBEEF for the CAFEBABE : (otherwise known as) The Great Choose Your Own Crc32 Adventure

     Ever since people have been transmitting information, there have been mechanisms to ensure that the transmission was successful and the received information was what was transmitted. On the internet at various layers you have some amount of redundancy and error checking. One such popular approach of verifying transmissions or data is the checksum. One really popular checksum is the Cyclic Redundancy Check (CRC). Various CRC's are used in a variety of places like reading from cds, verifying that a zip or a rar archive has been opened correctly or transmitting files.

    For a long time, I've been involved in fansubbing anime and distributing it online. Before bittorrent, which now allows people to almost broadcast files and involve many people in the effort, it was necessary for people to distribute files one at a time. People used to distribute files on irc, host it on websites and ftps. Very often you'd have received a file as the tenth or even the hundredth person in the chain. There was always a small chance of corruption and over time you were quite likely to receive a file that was corrupt. In order for people to ensure that the file they received was accurate, the practice was for the group to publish the CRC32 of the files they released. A crc32 was really convenient. It was an 8 character hexadecimal string, which means it's easy for humans to read. But if your file was corrupted, there was only a 1 in 2^32 chance that you wouldn't know about it. Fairly good odds.

     Ofcourse, the crc32 is not a cryptographic hash. It's possible to reverse it or to create junk which has the same crc32. The idea was not to prevent sabotage, but accidental errors. Since crc32 was reversible, anarchriz wrote an article about how to reverse a crc32. In around 2002 or 2003 there was a brief fad in the fansubbing community. Someone had written a small program to modify files to make sure it had any chosen crc. So groups would modify and release files with crcs they thought would look cool. 

    Last year (2008), the group I'm a part of,  Live-Evil, thought we'd release an old show called Kimagure Orange Road. Ken Hoinsky our fearless leader though it would be fun to sort of do a retro thing. One of the ideas we had was a custom crc for each episode. The first episode would be abcb0001, the next abcb0002 and so on. (The ABCB coffee shop is a big part of KOR). He managed to dig up anarchriz's document so we started poring over it. It looked doable, but we didn't want to do the work of actually writing this code and testing it. Surely it had already been done. We couldn't find the code that had been used previously. But we did find this article by Bas Westerbaan where he had implemented this algorithm in python. Perfect! So we downloaded it and ... it didn't work.

     That was odd. We'd patch files according to the documentation, but when we checked the crc of the file it was not what we'd tried to patch it to. So I started looking at this code to see if it was easier to make it work rather than write my own. Turned out there was a small mistake. We fixed it and the code worked fine. So after communicating this back to Bas, we were merrily on our way. Each episode of KOR has since had crcs of the form abcbxxxx.

     I've put together a tool in python to take a file and patch it to whatever crc you choose. I first tried to write a pure python crc calculator, but it was uber slow on large files. Looking around I found that most people delegate the crc calculation bit to C. So I downloaded this open source python project called cfv and lobotomized it heavily to the point where all it does is calculate crc and nothing else. I wrote a small python script to call both the cfv script and bas's crc reversal script and patch any given file to any crc you choose. Now, how does the patching work? It calculates what sequence of 8 bytes should be appended to the given file so that the crc is whatever crc you have chosen. Most media files like movies, music even pdfs ignore any garbage at the end. So all it does is append a few junk bytes to the end. 

    To use it you just pick a fun sounding crc. (Canonical examples are DEADBEEF and CAFEBABE). Run it like so. python crcFilePatcher.py --file=MovieName --newcrc=deadbeef. I've pushed all the files onto github here. Have fun :)