Saturday, July 31, 2010

My Biggest peeve with event driven programming


is having to build constructs like this.



I need to load a bunch of templates up and then perform the rest of my actions. Event Driven programming frees you from excessively specifying order in code. Overdoing the Sequence bit in Sequence, Selection and Iteration. Except when you really need sequence, the code becomes messy. The sad part is this isn't even a great solution since it loads files one at a time. Ideally you wanna parallelize this.

Monday, April 5, 2010

Generalized Expression Simplification in SICP?

A couple of colleagues of mine from work and me have been working our way through SICP. So far thing's have been pretty good but we recently came across a series of exercises on "interval arithmetic". The basic idea was to be able to construct a term as an interval of uncertainty (represented as a tuple (a,b)) meant for use in engineering calculations. Subsequently to define a series of arithmetic operations (+, -, * /) that operate on intervals and create new intervals with different uncertainties.

After a few such problems, it is brought to our attention that if an expression is written in two different forms, the value obtained can be different. For example (1/R1 + 1/R2) could also be written as ((R1 + R2)/R1R2) and these yield different values. The problem is clearly that all the operations we have defined increase the uncertainty and if an expression is written in a manner that results in more operations we should expect more uncertainty. In fact, as this gist demonstrates, (R2 - R1 + R1) can result in something more uncertain than R2 by itself. 

Exercise 2.16 asks us to attempt to write the package in a way so that rewriting an expression in a different form will still yield the same answer. The key to this seems to be a combination of 

  • Some generalized method to simplify expressions.
  • A method of identifying if two intervals are the same.

I initially interpreted the second point as being a way to check if two intervals are equal. But that's a mistake. Two intervals might be equal because they represent the same concept, or because they are two different terms with the same uncertainty and value. For example, if I was building a circuit with resistors, and I happen to use two resistors with the same rating (same value and error), they would still not be identical elements in the circuit. I wouldn't be able to "cancel them out". One of them might have an actual value that is on one side of the interval of uncertainty and the other might be on the opposite end. 

The only way to identify if two intervals are the same would be if I redefine my constructor to create with each such tuple, an object id that is meant to be unique. Now if two intervals have the same object id, I'd know that they were in fact the same. An extremely simplistic method to generate such an object id might be to use a random number between one and one billion. 

However, I have no idea how to even begin approaching the first problem. Such a thing would be trivial if the only allowed operations were addition and subtraction. (I could simply collect all the same terms and find their coefficients). Or only multiplication and division. But, with all four, I don't know where to start. So if anyone has some insights or resources, please do share them with me.

Sunday, March 21, 2010

ActiveSupport and json pure hate each other :(

See
 

The sad thing is the DataMapper serializer DM_types seems to depend on json_pure and I've got active support all over the place.

Saturday, March 20, 2010

Fixing a datamapper bug

DataMapper, a ruby ORM, has become fairly popular of late. I ran across an extremely annoying bug yesterday. This explains the bug and a fix.

DM allows you to make updates to the database in a manner that includes a where clause. This could mean efficiency wins or concurrency wins. So for example if I have a payment model and a payment moves from pending to say success or failure, I might have code that looks like this.




The problem with this is if I have multiple threads, processes w/e and I want to make sure the same payment is never updated twice. I shouldn't have one thread/process mark it a success and a different thread/process mark it a failure subsequently. The previous code is susceptible to race conditions. Ideally I'd like to write one line of SQL that says update the payment where status is pending. Then I'd write this.




This actually does only fire a single query which is an update clause with an inner where clause. I've appended to the gist the relevant line from the log file. This is because I've used update! instead of update.

The bug occurs when the status field is one of the more complex datamapper types like say enum. I've created an example which reproduces the error.



The first part of the test using update succeeds, but the second one with update! fails. It turns out superman is still in limbo! The reason for this is the Collection class in the DM module has a bug. The update! method has the following piece of code. Below that compare with similar code from the private _update method of the Resource class


The difference is that in one case the code says property.valid?(value) and in the other case, it (correctly) says property.valid?(property.get!(model_instance)) . This difference is important for special types like Enums. When I say update!(:status => :alive), a new model instance is created and the modified properties are asked to validate the values being inserted to verify that they are sane.

So a property of type Enum[:dead, :alive, :limbo]  is asked to validate the value being inserted. The value being inserted is :alive, but the underlying primitive value stored is 1 (since :alive is the 2nd field in a 0 based array). So property.valid?(1) is being checked, which is obviously false. 

In the second case, property.get!(model_instance) would actually return :alive. So property.valid?(:alive) is checked, which is true.

Now obviously there's a fair amount of indirection going on here for whatever reason. Anyway this is the quickfix I used to monkey patch my Collection class and move on.

Anyway it would be nice if the behavior of _update and update! could be moved to a single place so here is a proposed change.

Sunday, February 14, 2010

Sudoku solver, alternate optimization strategy, final

I had written about optimising the sudoku solver by remembering moves made in the past. Another strategy that I had tried parallely was to do a short-circuit evaluation. Each iteration of the solver is about searching all the empty cells to find the list of acceptable moves and then choosing the cell with the least moves. However, clearly a cell with zero options means that it's time to backtrack. Clearly instead of continuing to search the list of cells, that's a place to jump out of a search. Similarly a cell with one option means that there is only one option so I might as well make that move.

I experimented with a short circuited search where I jumped out the minute I found something with some "n" moves and I varied "n" from  upwards. It turned out the best option was to jump out when I had zero or one option. Sadly this optimisation only gives me a factor of 3 improvement, and doesn't stack much with the previous one. 

Anyway the whole code is available at github -> http://github.com/pathsny/Sudoku-Solver

Tuesday, February 9, 2010

... Okay 3, 2, 1 Let's Jam. Quicksilver, Ruby and Mac OSX

QuickSilver is an application launcher for Mac OS that is very popular productivity enhancer. It's extensible through a ton of plugins. But it also allows people to launch custom scripts, which allow people to quickly script any action they commonly perform. I first saw the possibilities when I read this article. Some of those scripts were quite useless to me, but there are a few I end up using all the time. Like shortcuts to turn my wireless radio off when Im not online and wanna save power. So this is about 3 QuickSilver scripts I recently wrote

Initially I wrote a few of them in applescript, but when I wanted to write the script to "automatically pastie my code", I decided it would be easier to use a language I'm more comfortable with. I found a ruby-applescript bridge, so I decided to go with ruby. The bridge is obtained by installing the rb-appscript gem. You can find on these scripts on Github at http://github.com/pathsny/QuickSilver

I also attempted a "maximise" script based on this article. Sadly it just doesn't work :/. It's always incredibly slow the first time you launch it.

Sunday, January 10, 2010

Programming Async code in a Sync style using Laziness and Functional patterns

The first time I came across some common idioms of functional programming happened when I was working with .NET and came across the List methods. The findAll and ForEach methods. For me it was a revelation. I kept rewriting all the code on the project I was on, using these idioms.

Since functional languages were popular at that point of time and I discovered that such languages were where this idea had sprung from,, I decided to start learning Haskell. I'm still learning it, but one interesting thing about Haskell was that it is the Laziest functional language. Everything was automatically lazy and strictness had to be enforced somewhere when you really needed. Like say in the IO monad where you say when you want to print something, you mean like right now! So if I had to print the first 5 even natural numbers greater than some x, I could write this. (Warning: horribly contrived example and code).
 f x = take 5 $ dropWhile (<x) $filter even [1..] 
Notice the infinite list of natural numbers there? The reason this works is because Haskell is lazy. What really happens here is that when I print the output of that, the print function asks take 5 to return output. take 5 asks dropWhile to return 5 items. dropWhile asks filter to return items until it finds an item that is greater than x and then asks for the next and then filter asks the infinite source for even items and keeps doing so until dropWhile is satisfied. The realization here is that on the left side we have a source, on the right side is a sink. Everything else is a component meant to build a pipeline. Something about the system knows that this system's flow is governed by the sink, and so the pipeline components all pull output until the sink is satisfied. The idea is that I can write more idiomatic code and use things like infinite lists when I want to. .NET had done the same thing with the IEnumerable stuff from 3.5. So I could now write
InfiniteList.FindAll(Even).SkipWhile(y => y > x).Take(5)
I recently read about another realization. Another common pattern while programming is an event based system. Like a web application waiting for requests from someone. A Javascript function that makes an ajax call and displays the output on the screen and so on. Specifically, an async system where you wait for events to pour in and for each event you go on to perform some actions.

So imagine I had a system where I go on to call some number generator which dials up a service halfway across the world. the service spews out natural numbers. I want to register a callback function which waits for numbers, ignores the odd ones. Maintains a flag so it knows at least one such number has been greater than x and then gives me the next 5 numbers before swallowing them. (yeah I know, incredibly useful). Since this deals with state, let's do this in some imperative ruby style language. Typically I might have to say

def handler(n)
return if n % 2 == 1
@skip = false if n > x && @skip
return if @skip
@num_returned = 0 unless @num_returned
return if @num_returned > 5
@num_returned += 1
puts n
end

numbercaller(handler)

The beautiful functional style code has morphed to something so ugly all because I switched from pull to push async. But it's not hard to recognize that this problem is very very similar to the one I was previously solving. All that changed was the source here controls when stuff goes through the pipeline instead of the sink. So with the right tools I should be able to use the same pipeline tools as before and have the language features just switch the flow control around. Numbercaller should present the same interface as an iterable system, allowing me to use the same pipeline components, but should push stuff through the pipeline instead. I should be able to use not just each, but also map, dropWhile, take and so on. It's just a different take on Laziness Or as the article I'm going to link you to put it, the Observable pattern and the Iterable pattern are virtually the same thing. You can read all about it here. The Rx (Linq to Events).

Truth be told, I'm a little nervous about this pattern. It's a new abstraction and like all abstractions it has the potential to be leaky. Imagine we managed to implement this on some Javascript ajaxy interface. On the click of some item I want to make an ajax call. I want the result of the Ajax call to update some div.

button.onClick = function(){
AjaxUpdater.update(url,function(response){
html = process(response)
$("blah").innerHtml = html
})}
doSomethingElse();

Now with the new realization we have we'd like to write.

responses = button.clickEvents.each(AjaxUpdater(url).map(process)).flatten
responses.each(function(html){$("blah").innerHtml = html})
doSomethingElse()
Now, the important point here is that the first two lines need to happen at some random time in the future, and in fact repeatedly each time I get a response. But doSomethingElse needs to happen right away without waiting for click events to occur. Since the abstraction hides the fact that this event appears at some time in the future and potentially infinite times, it might mean that I accidentally make some event occur with every click? It might mean buggier code, or some unfortunate performance implications. It might simply mean we need much smarter interpreters. The reactor pattern (non-blocking io). has become very popular of late. It involves converting a lot of operations that are traditionally treated as synchronous, such as writing to disk, or making a blocking call for data without which you cannot proceed, to asynchronous. It has some very interesting implications and provides considerable performance advantages. This pattern might make writing code with the reactor pattern as idiomatic as code which treats these operations as synchronous.