stuff

The many faces of nameless chunks of code in Ruby

2013-11-20T21:34:00.000-08:00

The internet is littered with posts about the differences between blocks/procs and lambdas. But when experienced programmers first encounter this topic, they have several related questions, and so find themselves reading several posts or threads. I'm going to try to summarize the answers to those questions here.

At a very high level, as far as the philosophy behind the language's design goes, there is the question of why there are different ways to do the same thing. One of the guiding principles at play is that there should be more than one way to do it. So if two different approaches each make sense in different contexts, both often exist so that you have the more convenient option at hand.

Another important guiding principle in Ruby is the principle of least astonishment. I'm just mentioning it here, but I'll talk about why it's relevant further down.

On a mechanical level, it's important to understand the differences between lambdas and blocks/procs. Here is one source. The important differences are argument checking and the behavior of return, break and next. It's important to understand not only how they differ, but also that there are things you can do with lambdas that you cannot do with blocks, and things you can do with blocks that you cannot do with lambdas.
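
Here is a minimal sketch of the two headline differences. It uses only core Ruby, and the method names are made up for illustration:

```ruby
# Argument checking: a proc silently adapts, a lambda insists on its arity.
square_l = lambda { |x| x * x }
square_p = proc { |x| x * x }

square_p.call(3, 4) # => 9 (the extra argument is dropped)

begin
  square_l.call(3, 4)
rescue ArgumentError => e
  puts "lambda refused: #{e.message}"
end

# return: in a lambda it leaves the lambda; in a proc it leaves the
# method the proc was defined in.
def lambda_return
  inner = lambda { return 1 }
  inner.call
  2 # reached, because return only exited the lambda
end

def proc_return
  inner = proc { return 1 }
  inner.call
  2 # never reached: return exited proc_return itself
end

lambda_return # => 2
proc_return   # => 1
```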

The next thing to understand is what they are.

Lambdas are lambdas just like you see in most other languages where anonymous functions exist. It's a closure that you can pass around as a parameter or return and then call.
Blocks are something completely new. They are a syntactic construct, not first-class objects. At a superficial level they behave like lambdas (with restrictions, such as a method taking only one, passed last), and the differences between blocks and lambdas have been described above.
A Proc is an object representation of a block. You use a proc when you'd like to accept a block and then do something funky with it, such as store it and call it later. The real decision you must make, API-wise, is whether to accept a lambda or a block. Procs are what you get when you peek under the hood and decide that you really need to do something a little more unusual.
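
Here is a small sketch that makes the proc/block relationship concrete. Button, on_click and capture are hypothetical names and do not come from any library. Naming a block with & in the parameter list is what turns it into a Proc you can keep:

```ruby
# Hypothetical event source that keeps blocks around to run later.
class Button
  def initialize
    @handlers = []
  end

  # &handler turns the caller's block into a Proc object we can store.
  def on_click(&handler)
    @handlers << handler
    self
  end

  def click
    @handlers.map(&:call)
  end
end

button = Button.new
button.on_click { :saved }.on_click { :logged }
button.click # => [:saved, :logged]

# Returns the block it was given, as a Proc.
def capture(&block)
  block
end

capture { |x| x }.lambda? # => false: a captured block is a plain proc
lambda { |x| x }.lambda?  # => true
```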

People who are used to lambdas find blocks and procs inelegant and confusing. Inelegant because there already is a mechanism to pass closures around. Confusing because it has unusual semantics. Once you have internalized what they are and what they can do differently, it's easy to see that these differences at times make certain things possible that would not have been possible otherwise.

But let's talk about this being confusing and the principle of least surprise. It seems confusing because it's not how closures behave in other languages. But the principle of least surprise has never been about compatibility with concepts from other languages. All that POLA says is that once you understand how something works and how to think about it, APIs will ideally behave in a non-surprising manner. It does not imply that you will not have to learn how Ruby works. What it DOES imply, though, is that there is a conceptual model for blocks that is different from the one people use for lambdas, and in that model the behavior of return in blocks is not confusing at all.

The way to think about blocks is that the code inside a block belongs to the function in which it is defined, and so it should behave exactly as it would if it were outside the block. A block is not a function that you are passing in; that's a detail of how it's implemented. A block is a chunk of code from your function that executes when the method you call chooses to activate it. And since it's a chunk of code from your function, return keeps behaving the way it would if it were not inside the block. Code inside the block should not be thought of as different from code outside. It just happens to run at a time that's not of your choosing. It's a different mental model.
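
Here is a small sketch of this mental model (the method names are invented). The return inside the block behaves exactly as it does in the loop version:

```ruby
# return inside the block belongs to first_negative, exactly as if the
# loop body had been written inline.
def first_negative(numbers)
  numbers.each do |n|
    return n if n < 0
  end
  nil
end

# The same method written with a plain loop behaves identically.
def first_negative_loop(numbers)
  i = 0
  while i < numbers.size
    return numbers[i] if numbers[i] < 0
    i += 1
  end
  nil
end

first_negative([3, -1, -5])      # => -1
first_negative_loop([3, -1, -5]) # => -1
first_negative([1, 2])           # => nil
```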

Next and break are special keywords for code inside a block. Next ends the current run of the block (Ruby's equivalent of continue), and break leaves the function that took the block. This makes them consistent with loops. Since blocks are often used as iterators, next and break behave the same way inside iterators and regular loops.
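
Here is a quick sketch of both keywords, first in blocks and then in an ordinary loop:

```ruby
# next: end this run of the block early (Ruby's continue), optionally
# supplying the block's value.
doubled = [1, 2, 3, 4].map do |n|
  next 0 if n.even?
  n * 2
end
doubled # => [2, 0, 6, 0]

# break: stop the method that yielded (here, each) and make it return a value.
first_big = [5, 8, 11].each do |n|
  break n if n > 6
end
first_big # => 8

# The same keywords in an ordinary loop.
n = 0
odds = []
while n < 6
  n += 1
  next if n.even?
  break if n > 4
  odds << n
end
odds # => [1, 3]
```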

Introducing StepRewrite (aka I finally understand macros)

2010-10-31T11:37:00.000-07:00

So it's finally done! I could probably do a bit more, but for the moment it's working and packaged! For a while now, I've been writing about a problem I want to solve, and discovering the tools to solve it. Of course it's also a problem that may not really exist, but fuck it! To really understand the motivations behind this thing I've built, follow those links above, but here is the short version.

I said that evented I/O makes you write ugly code just to sequence a bunch of operations. I figured out that the only way to sequence code normally but run it in an evented I/O environment is to use macros and actually rewrite the code. So I wrote step_rewrite (so named because it was born out of an unholy union of step.js and rewrite). If you install it, you can write code like this

that behaves as if it was written like this.

This doesn't mean that you are forced to (or should, in fact) write every block in this manner. It's meant to be used only when the blocks exist purely to sequence the rest of the method inside a callback, i.e. when you really do intend what the first piece of code implies: a series of operations that should occur one after another, which just so happen to perform I/O. (If you squint really hard, you can pretend the &_ bits are invisible.)

So step_rewrite can be used either as a function that takes a block to eval, or as a function that takes a block to define a method. It rewrites the code. Every function call that takes the special callback and is followed by some other code is converted into a form where the function instead receives a block containing the rest of the code.

becomes

for situations where you intend the former but your environment requires code to do the latter.

It also converts return values into block arguments. So that

becomes

This is acceptable for the most part because using &_ has made hunter.kill the last statement in its block, so anything it returns becomes the return value of its block.

Of course this abstraction is leaky. There are a lot of complicated situations where you have to be aware of what the converted code will look like. I just hope that 80-90% of the time, you can be oblivious to it.

This works using the ParseTree gem. ParseTree converts code into S-Expressions, the language of macros. Here are some examples of S-expressions.
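
As far as I know, ParseTree only ever worked on Ruby 1.8-era interpreters. On current Rubies, the standard library's Ripper offers a comparable view of the parse tree. Its node names differ from ParseTree's, so treat this as an analogy rather than the output this post relied on:

```ruby
require "ripper"
require "pp"

# Ripper ships with Ruby; sexp returns the parse tree as nested arrays
# (or nil if the source does not parse).
tree = Ripper.sexp("hunter.kill(rabbit)")
pp tree
```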

Given the S-expression, I can now manipulate it. Chopping out bits, wrapping it in other pieces. The resulting S-expression is converted back into Ruby using Ruby2Ruby and I'm done :).

So yeah, what I really do is convert an S-Expression of the first form to the second form.
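
Here is a toy sketch of that conversion. It assumes a made-up array representation rather than real ParseTree S-expressions, and rewrite, to_source and the :async tag are all hypothetical. A call marked as taking the special callback swallows every statement after it into its block, and an assigned result becomes the block's parameter:

```ruby
# Toy statement forms (hypothetical; not ParseTree's node names):
#   [:call,  name, args]       an ordinary call
#   [:async, name, args, var]  a call taking the special &_ callback; var is
#                              the variable its result was assigned to, or nil
# Rewriting nests everything after an :async call inside its block.
def rewrite(stmts)
  return [] if stmts.empty?

  head, *rest = stmts
  return [head] + rewrite(rest) unless head[0] == :async

  _, name, args, var = head
  params = var ? [var] : []
  [[:call_with_block, name, args, params, rewrite(rest)]]
end

def render_args(args)
  args.map { |a| a.is_a?(Symbol) ? a.to_s : a.inspect }.join(", ")
end

# Turns rewritten statements back into Ruby source text.
def to_source(stmts, depth = 0)
  pad = "  " * depth
  stmts.map do |stmt|
    if stmt[0] == :call
      "#{pad}#{stmt[1]}(#{render_args(stmt[2])})"
    else
      _, name, args, params, body = stmt
      opener = params.empty? ? "do" : "do |#{params.join(', ')}|"
      "#{pad}#{name}(#{render_args(args)}) #{opener}\n" +
        to_source(body, depth + 1) + "\n#{pad}end"
    end
  end.join("\n")
end

# Roughly: write_file("path", "hello", &_); data = read_file("path", &_); puts(data)
body = [
  [:async, :write_file, ["path", "hello"], nil],
  [:async, :read_file, ["path"], :data],
  [:call, :puts, [:data]]
]
puts to_source(rewrite(body))
# write_file("path", "hello") do
#   read_file("path") do |data|
#     puts(data)
#   end
# end
```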

I'm excited about this because I finally understand what the whole macro thing is about. I'd always heard that it was about extending the language itself, but I never really got it. It seemed to me that anything I wanted to do could either be implemented using good old-fashioned metaprogramming, or was not possible even with macros. I could not see this middle ground of extending the language without writing a full-fledged parser. But now I finally see it :)

Lastly, it looks like Narrative.js does something similar to what I was trying to do. It contains a full JavaScript parser, so it's a bigger project. I need to look at it to see whether it converts code into a similar form, and if so, how it overcomes the problems I have with leaky abstractions.

Silly Mistake

2010-10-20T00:36:00.000-07:00

So recently I stumbled across this webcomic, Wonderella, which is awesome! After I read a few pages, I decided to do what I do with all webcomics I like: download the whole thing, because I can't stand that 30-second to 1-minute wait between pages. I used to do this sort of thing with wget, trying to work out the URL pattern, and invariably there would be some complication or other.

Then I wised up and started using Mechanize, which made things a breeze. Sometimes I've started multiple threads/processes and split the tasks among them so that the whole download happens faster. But this time I thought, "I've been thinking and writing about evented I/O a bit now, and someone on the Node.js IRC channel mentioned he used node.js for HTTP scraping, so why not try the same?"

I decided to use EventMachine because that way I could also test my pet project, a work in progress at the time (which I'm going to write about soon). So I started reading about EventMachine and used Hpricot for the HTML parsing. In 30 minutes I had the script ready and had tested bits of it.

I asked it to just get the first four comics from the archive page, and it worked. So I did it again, this time asking it to get the entire thing. But, nothing happened!! Nothing was downloaded. Little print statements let me know that it had correctly parsed the archives page and was going after the right pages, but nothing! I let it run for a while, but still nothing :/.

It took me a while to realize that I had been solving the wrong problem. I wanted to do things in parallel, but my bottleneck wasn't how much memory I could afford on my machine, which is where evented I/O helps you. My bottleneck was how many connections the web server would allow me to make. I was attempting to download all the comics simultaneously, and that didn't work.
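
The fix for that kind of bottleneck is to cap how many requests are in flight at once. Here is a sketch of the idea using threads and a Queue instead of EventMachine, so it is not the code from this post. each_bounded is a hypothetical helper, and the download is stubbed out with sleep:

```ruby
# Runs the job on every item with at most `limit` items in flight at once,
# returning { item => result }.
def each_bounded(items, limit, &job)
  queue = Queue.new
  items.each { |item| queue << item }
  results = Queue.new

  workers = Array.new(limit) do
    Thread.new do
      loop do
        begin
          item = queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break # nothing left to fetch
        end
        results << [item, job.call(item)]
      end
    end
  end
  workers.each(&:join)

  pairs = []
  pairs << results.pop until results.empty?
  pairs.to_h
end

in_flight = 0
peak = 0
lock = Mutex.new
pages = (1..20).to_a

sizes = each_bounded(pages, 4) do |page|
  lock.synchronize { in_flight += 1; peak = [peak, in_flight].max }
  sleep 0.01 # stand-in for fetching one comic page
  lock.synchronize { in_flight -= 1 }
  page * 10
end
peak # never more than 4
```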

So like yeah, funny story :/. I just redid it with mechanize and it was delicious. Here is the code if anyone is curious what EM code looks like.

Changing the Rules

2010-10-03T05:12:00.000-07:00

Evented programming with nonblocking I/O is the new black. In evented I/O systems, every I/O operation (or at least every expensive one) takes a callback function and executes asynchronously, so that your code does not block on I/O. It either continues on, or sleeps to give a different request a chance. The main purpose of these systems is to let people parallelize a system that spends a fair amount of time on I/O, handling several independent requests simultaneously, without really getting into concurrency or threads.

Every time an I/O operation starts, the system registers your callback and continues on. Once the input or output operation is complete, the callback executes. So every time you perform an input or output operation, you put the code that's meant to execute after it in the callback. This means that in any interesting program, a large part of the code ends up inside nested callbacks.

Let's look at the sample piece of code I've been playing around with. Except I'm going to use Ruby without EventMachine and mimic some code I've seen in node.js (node has non-blocking operations for everything, so it should be easier to follow). For anyone who hasn't read this before, this code writes hello to a file, appends world to it and then reads the last line. Everything I/O takes a callback: writeFile, read, write, close, etc.

Notice that the primary purpose of callbacks here is not, as is customary, to swap out the stuff in the middle of an algorithm. This is not the strategy pattern. It's far more low level than that. It relates to one of the three fundamental control structures in programming (sequence, selection, iteration). Callbacks here sequence your operations: by using a callback, you make whatever is inside it happen after the current operation. And that can make your code a bit hard to follow.
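
Here is a way to make that concrete without node. This sketch uses a hypothetical fake file API: write_file, append_file, read_file and the EVENTS queue are all made up for illustration. Nothing happens until the queue is drained, and every callback exists only to say "and then":

```ruby
# Fake asynchronous file API: each call queues its work, and the callback
# runs only when run_events drains the queue.
EVENTS = []
FILES = {}

def run_events
  EVENTS.shift.call until EVENTS.empty?
end

def write_file(path, data, &cb)
  EVENTS << -> { FILES[path] = data; cb.call }
end

def append_file(path, data, &cb)
  EVENTS << -> { FILES[path] = FILES.fetch(path, "") + data; cb.call }
end

def read_file(path, &cb)
  EVENTS << -> { cb.call(FILES[path]) }
end

last_line = nil
# Three sequential steps, expressed as three levels of nesting.
write_file("hello.txt", "Hello\n") do
  append_file("hello.txt", "World\n") do
    read_file("hello.txt") do |data|
      last_line = data.lines.last.chomp
    end
  end
end
run_events
last_line # => "World"
```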

Programming languages have had a way to sequence operations from day one. You just put the operations down one after the other, and that's the sequence they occur in. We could imagine a magical programming language with all the power of the languages we currently use and love, plus special support for evented I/O. I could mark the asynchronous calls as special but still sequence operations normally, and the compiler or interpreter would understand that some operations are to be executed only after an async one completes.

Step.js is a library that tries to solve this problem, so I ported it to Ruby. It takes a series of lambdas as input and executes them so that each one runs when the previous one's callback is invoked. Basically, it sequences them correctly. This is what the same code looks like with step in Ruby (cb is chosen as a magic variable to indicate that &cb is where the callback to each function would normally be passed).

Firstly, it has a bug. close doesn't really work because the file has gone out of scope. The file handle is not being passed on by write. Secondly, it's pretty ugly. Lambda, lambda, lambda ... If only I could take a bunch of code without them being wrapped in lambdas and sequence them correctly.
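
Here is a minimal sketch of the serial idea, written as a fold (reduce) over the lambdas. It is not the port discussed here, and the fake file lambdas and event queue are hypothetical. Instead of a magic cb variable, each lambda receives the callback as its last argument. That means any value a later step needs has to be passed along explicitly through the callback:

```ruby
# Chains lambdas: each receives the previous step's results plus a callback
# as its last argument; calling cb.call(*values) starts the next step.
def step(*steps)
  finish = ->(*_results) {}
  chain = steps.reverse.reduce(finish) do |nxt, current|
    ->(*args) { current.call(*args, nxt) }
  end
  chain.call
end

# Fake asynchronous file API: work is queued and only happens when the
# queue is drained below.
events = []
files = {}
write_file  = ->(path, data, cb) { events << -> { files[path] = data; cb.call } }
append_file = ->(path, data, cb) { events << -> { files[path] = files.fetch(path, "") + data; cb.call } }
read_file   = ->(path, cb) { events << -> { cb.call(files[path]) } }

last_line = nil
step(
  ->(cb) { write_file.call("hello.txt", "Hello\n", cb) },
  ->(cb) { append_file.call("hello.txt", "World\n", cb) },
  ->(cb) { read_file.call("hello.txt", cb) },
  ->(data, cb) { last_line = data.lines.last.chomp; cb.call }
)
events.shift.call until events.empty?
last_line # => "World"
```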

While Ruby allows for a lot of metaprogramming and building DSLs, this problem seems insurmountable unless we change the syntax of the language itself, inventing our own control structures that do what we ask them to and extending the language. What we need are macros.

A macro is something that lets you automagically expand something preselected into a sequence of operations. Excel and Word have macros. Games have macros. A macro usually expands into a larger body of code and saves you from having to type it out. For example, you could imagine a macro containing two or three operations that always occur together. You might want to allow variable substitution in your macro so that it's actually useful.

The macro is read by some macro interpreter or compiler, which modifies the code, and then the regular interpreter or compiler reads your code. But how powerful should your macro system be? At first, you might decide to keep complexity low and only allow templating. For example, you could have a macro that, given the name of a loop variable, generates code to run through its items and do some operation (in case your programming language of choice doesn't already support this). Later you might want rudimentary branching so that you can, for example, change the code that is executed in development mode for easy debugging. Or looping support to generate repetitive code. Before you know it, you have a whole other programming language. In which case, why not let your programming language of choice be your macro language? If you use C, use the full power of C in your macros. If you use Ruby, use the full power of Ruby in your macros.

Lisp does that. Lisp gives you access to your parsed code in the form of S-expressions and allows you to modify or generate S-expressions before it executes this code. It's built into the language. And it works really well because the language itself has support for it.

Now, Ruby and JavaScript have always had eval. You could load your source and modify it. But the effort of parsing such code is large, and you don't want to work at the level of strings. Which is why I got really excited when I saw this awesome video by Reginald Braithwaite, in which he built some macros in Ruby. He used the ParseTree project to parse Ruby into S-expressions (which can then be modified) and Ruby2Ruby to turn S-expressions back into Ruby code.

So my next step is to rewrite Step using macros so that you can sequence code the usual way, but allow it to work on an evented IO system.

Lastly, let me leave you with a joke.

A drunk loses the keys to his house and is looking for them under a lamppost. A policeman comes over and asks what he’s doing.

“I’m looking for my keys” he says. “I lost them over there”.

The policeman looks puzzled. “Then why are you looking for them all the way over here?”

“Because the light is so much better”.

Node.js means having to put your toys back in the closet after you're done :(

2010-09-30T10:48:00.000-07:00

When I was first exposed to Ruby and how easy it was for functions to accept blocks and how ubiquitous such functions were, I was delighted that for example

File.open(path) {|file| foo(file); bar; baz; qux}

would let me do a bunch of operations on a file and not have to worry about closing the file. This was similar to how cool it was to move from C++ to C# the first time and stop worrying about memory. This was how things should be.
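
For comparison, here is a rough sketch of what File.open does for you when given a block. It is an illustration, not MRI's implementation, and open_and_close is a made-up name. The idea is to yield the handle, then close it in an ensure, so the file is closed even if the block raises:

```ruby
require "tmpdir"

# Yield the handle, then close it however the block exits.
def open_and_close(path, mode = "r")
  file = File.open(path, mode)
  begin
    yield file
  ensure
    file.close unless file.closed?
  end
end

closed_after_write = closed_after_error = contents = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "greeting.txt")
  handle = nil

  open_and_close(path, "w") { |f| handle = f; f.write("hello") }
  closed_after_write = handle.closed? # => true
  contents = File.read(path)          # => "hello"

  begin
    open_and_close(path, "a") { |f| handle = f; raise "boom" }
  rescue RuntimeError
    closed_after_error = handle.closed? # => true, despite the exception
  end
end
```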

And then recently this post on Stack Overflow led me to this code sample in Node.js. The code is opening a file and appending the word world.

I have to remember to close the file!! In a language that allows functions to be passed as parameters, I have to remember to close the file. How much does that suck? My colleague, Rakesh Pai, a node.js fan, suggested that maybe this is because this is a low-level API, and high-level APIs will take care of this problem.

But on thinking about it, this may not happen. I mean I could imagine a fs.append function that takes a file and some text, appends it and then calls the callback, but not something that gives you access to a file object to play with.

Or maybe fs.open could take 2 functions. One meant to be executed after opening the file, and a second to be executed after closing the file. But this will probably cause a lot of scoping grief. So maybe the first function will have to return an array of variables that then become available to the second one?

So it's possible this never happens. If this is the case, is it possible that the file will not be closed for a long time? I mean if your code is something like

Then wouldn't your workers hang on to the file handle for an awfully long time, even though it isn't being used? Since the file is trapped in that closure, there's no way out unless my runtime has analyzed my code and seen that I will never use file from this point on, not even via an eval. I can't see the GC or whatever figuring this out. What if, as a good programmer, I do this.

Now can it figure it out, since I haven't passed file to my doJob function? Does that mean that in this scenario I'm safe? I assume so, since file isn't trapped in a closure here, but I don't know how the V8 GC works. In the previous scenario, though, it feels to me like it will not be able to detect my intention, so we'll have to be a bit more careful programming with node.js.
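
I can't speak for V8, but Ruby (at least MRI) makes the trade-off visible. A block or lambda keeps its whole defining scope alive, precisely because code can reach any local through its binding. Here is a sketch in which Object.new stands in for a file handle and the method names are hypothetical:

```ruby
# A closure whose body never mentions file.
def make_callback
  file = Object.new # stand-in for an open file handle
  job_id = 42
  callback = -> { job_id * 2 }
  [callback, file]
end

callback, handle = make_callback
callback.call # => 84
# The lambda's binding still reaches file, so the handle stays reachable
# for as long as the callback does.
callback.binding.local_variable_get(:file).equal?(handle) # => true

# Building the callback in a separate method keeps file out of its scope.
def build_job(job_id)
  -> { job_id * 2 }
end

def make_callback_separately
  file = Object.new
  [build_job(42), file]
end

separate, _handle = make_callback_separately
separate.binding.local_variable_defined?(:file) # => false
```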

Edit: From tuxychandru's comment, I realised that I may not have been very clear. I wasn't just whining about some API requiring people to do work. I understand that there will always be low-level APIs that require more care. I was saying that in some ways node is being positioned as an answer to programming ills for certain kinds of problems, and that the paradigm seems to have brought back a problem we had already solved: cleaning up after yourself is once again part of everyday programming.

Naive Implementation of Step.js in Ruby

2010-09-28T00:22:00.000-07:00

I wrote about step.js here and here, since I thought it was really cool. I've built a naive implementation in Ruby. To demonstrate the usage, imagine the same set of functions that node provides existed in Ruby. So this is how code would be written both in the serial and parallel cases.

Ruby uses self where JavaScript uses this, but self is lexically scoped, whereas this is dynamically scoped. So here I have chosen to let the caller decide the name of the magic variable (cb in the example above) and used instance_exec to make it happen. There's no real advantage to this approach, though, since existing references to self will still break.
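
Here is a sketch of the instance_exec trick described above, using a hypothetical helper called with_magic. The block runs with self set to a throwaway object that answers to the caller-chosen name. That is also exactly why existing references to self break:

```ruby
# Runs the block with self swapped for a throwaway object that responds to
# `name`, so the block can refer to the callback by that bare name.
def with_magic(name, callback, &body)
  context = Object.new
  context.define_singleton_method(name) { callback }
  context.instance_exec(&body)
end

received = nil
deliver = ->(value) { received = value }

with_magic(:cb, deliver) { cb.call(:done) }
received # => :done

# The catch: inside the block, self is no longer what the caller expects.
outer = self
with_magic(:cb, deliver) { self }.equal?(outer) # => false
```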

I call it a naive implementation because I've made assumptions that are true in the node environment, but not necessarily in a random piece of ruby code. Hence, there are some issues. But first, here is the code itself.

So what are the problems?

The serial implementation has a bug when a callback receives a parameter that is needed both by it and by the functions further down. In ordinary nested-callback code, such values come from the enclosing scopes, so the intermediate functions don't pass them through, and in a step chain the functions are no longer nested. If you look at the first example I wrote, or at today's Ruby example, it actually has this bug: File.open provides a file handle meant to be used by writeFile and close. I didn't realize this was a problem until my Ruby code broke. The JavaScript implementation has the same bug, and I don't think it can be fixed.
The parallel implementation works by incrementing a counter every time someone requests the parallel callback, and executing the next function only when the counter drops back to zero. This works in a node-style environment, where a callback can execute ONLY after the current code completes, so the counter starts decrementing only after it has been fully incremented. That means it may work if the I/O functions come from EventMachine. But sync calls masquerading as async will break it, since they execute the callback right away: the counter starts decrementing immediately and the next function executes several times. The original JavaScript uses the same implementation, but it is not a bug there, since callback-taking I/O in node is always asynchronous. That's one of the reasons node gets more love than EventMachine.
In the end, this is as ugly as the JavaScript implementation: you have to wrap everything in a lambda. It would be interesting to see if we could do better than that.
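
Here is a sketch of the counter-based parallel join, and of the failure mode described in the second point. EventLoop, ParallelGroup and load_all are hypothetical. The event loop only mimics node's rule that callbacks run after the current code finishes:

```ruby
# A stand-in event loop: deferred callbacks run only once the current code
# has finished, the way node schedules I/O completions.
class EventLoop
  def initialize
    @queue = []
  end

  def defer(&job)
    @queue << job
  end

  def run
    @queue.shift.call until @queue.empty?
  end
end

# Counter-based join: hand out one callback per parallel operation and
# fire `done` when the counter drops back to zero.
class ParallelGroup
  def initialize(&done)
    @pending = 0
    @slots = 0
    @results = []
    @done = done
  end

  def callback
    slot = @slots
    @slots += 1
    @pending += 1
    lambda do |value|
      @results[slot] = value
      @pending -= 1
      @done.call(@results.dup) if @pending.zero?
    end
  end
end

def load_all(names, reader, &done)
  group = ParallelGroup.new(&done)
  names.each { |name| reader.call(name, group.callback) }
end

event_loop = EventLoop.new
files = { "a.txt" => "x", "b.txt" => "yyy", "c.txt" => "zz" }
async_read = ->(name, cb) { event_loop.defer { cb.call(files[name]) } }
fake_async_read = ->(name, cb) { cb.call(files[name]) } # sync in disguise

runs = []
load_all(files.keys, async_read) { |contents| runs << contents }
event_loop.run
runs                                # => [["x", "yyy", "zz"]], done fired once
largest = runs.first.max_by(&:size) # => "yyy"

broken = []
load_all(files.keys, fake_async_read) { |contents| broken << contents }
broken.size # => 3, done fired after every read
```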

On a side note, I looked at the implementation of step after I implemented mine in Ruby. The original implementation chose a more procedural style for the main piece, whereas I chose a more functional style. But it feels as if the procedural one is easier to understand. The eyes of everybody I show my code to seem to glaze over at the bit where I do a fold over the list of lambdas I get as the second parameter to step. What do you think?

Of course all this goes out the window when I get to the parallel method where I have to maintain state externally. I have no idea how to do that in a more functional style. Any inputs would be cool :).

Beware ActiveSupports Default Autoload Mechanism

2010-09-15T09:04:00.000-07:00

By default, active_support sets its dependency loading mechanism to load rather than require. We're working on a non-Rails app with DataMapper, and some models add before :create hooks. If you don't take care, these models are loaded twice, the hooks are added twice, and that can result in strange behaviour.

So remember to set ActiveSupport::Dependencies.mechanism = :require
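
The difference is easy to reproduce with plain load and require, with no ActiveSupport involved. The model file and the $hooks global are hypothetical stand-ins for a model that adds a hook:

```ruby
require "tmpdir"

$hooks = []
after_load = after_require = nil

Dir.mktmpdir do |dir|
  # Hypothetical model file that registers a hook each time it is evaluated.
  model = File.join(dir, "payment.rb")
  File.write(model, "$hooks << :before_create\n")

  2.times { load model }      # load re-evaluates the file every time
  after_load = $hooks.size    # => 2: the hook was registered twice

  $hooks.clear
  2.times { require model }   # require evaluates a given file only once
  after_require = $hooks.size # => 1
end
```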

All Together Now

2010-09-12T14:11:00.000-07:00

Last time, I talked about the Step.js library and how it helps make code look better when using the evented programming paradigm. However, the first time I complained about the ugliness of such code, the problem I highlighted was different: having multiple actions occur simultaneously and continuing once they are all done.

Let's take the example of loading a bunch of files, reading the contents and printing out the largest one. (it's a stupid example, I know, but the core of it is loading up a bunch of files before continuing. What's done with them is irrelevant. When I needed to do this I was caching a bunch of templates for a templating library.). Either because we are obsessive micro-optimizers or because the call is very slow (maybe the drive is on the network somewhere), we decide we want to load the files in parallel.

So why am I complaining? Firstly it's a lot of code. The problem is simple enough that I shouldn't have to write so much code. Secondly I had to maintain state in a scope outside the load function. I could trap it in a closure, but it's still annoying.

Rakesh Pai, a big fan of and contributor to the Dojo project, pointed out that Dojo's style of evented programming is in a way better suited than the default to both this problem and the previous one. Anything asynchronous in Dojo always returns a Deferred object, which has a method called "andThen". So you can do things like fs.writeFile(path, data).andThen(function(){fs.readFile(path)}) ... and so on. More importantly, a bunch of deferreds can be placed in a DeferredList and treated as one, which means you can attach an event to be fired when all of them complete.
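
Here is a bare-bones sketch of the Deferred and DeferredList idea in Ruby. The class and method names (and_then, resolve) are hypothetical and not Dojo's actual API. Unlike real promise chains, every callback here sees the same value:

```ruby
# A bare-bones deferred: callbacks wait for a value.
class Deferred
  def initialize
    @callbacks = []
    @fired = false
    @value = nil
  end

  # Register a callback; it runs at once if the value is already known.
  # Returns self so registrations can be chained (each sees the same value).
  def and_then(&callback)
    if @fired
      callback.call(@value)
    else
      @callbacks << callback
    end
    self
  end

  def resolve(value)
    @fired = true
    @value = value
    @callbacks.each { |cb| cb.call(value) }
    self
  end
end

# Treats several deferreds as one: resolves with all their values, in order,
# once every one of them has resolved.
class DeferredList < Deferred
  def initialize(deferreds)
    super()
    results = Array.new(deferreds.size)
    remaining = deferreds.size
    resolve(results) if remaining.zero?
    deferreds.each_with_index do |deferred, i|
      deferred.and_then do |value|
        results[i] = value
        remaining -= 1
        resolve(results) if remaining.zero?
      end
    end
  end
end

a = Deferred.new
b = Deferred.new
all = nil
DeferredList.new([a, b]).and_then { |values| all = values }
b.resolve("second file")
a.resolve("first file")
all # => ["first file", "second file"]
```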

I think this too is a bit too wordy for me, but maybe it works in the Dojo ecosystem as a whole, since it is consistent with everything else. After all, consistency is valuable for more than just databases and hashing schemes.

But let's see how step.js can help solve this problem.

Now that is so much nicer. There is no state maintenance. You just pass in a function that executes multiple functions taking this.parallel as a callback, and at the end you get all the results lumped together. The example is marred only by the stupid semantics of "this" in JavaScript (I call another function on the array, so I'm forced to alias this; but without that magic the library wouldn't work at all, so it's the price we pay). It is also marred by the fact that, for some baffling reason, the arguments object is not a real array in JavaScript, so I have to convert it with slice.

So there you have it, Step.js to write nicer code.

Lipstick For the Evented Programming Pig

2010-09-07T11:18:00.000-07:00

Previously, I have complained about how ugly code ends up looking with the evented programming paradigm. It is a pattern that might become vitally important for performance reasons, but it nevertheless makes for super ugly code. I came across another similar issue recently, courtesy of a certain Stack Overflow post meant to demonstrate file I/O in every programming language.

The problem is to write "Hello" to a file, append "World" to the file and then to print onto the screen the last line of the file. (which ends up being "World" ... SHOCKU). I've stolen the solution from node.js, everyone's new favorite technology that will save the world.

Ugly, ugly, ugly code. Kill it with fire!! Now to be fair, this is partly because the node apis are quite low level. We might be able to look forward to an fs.append method someday for example.

I've heard people argue that it's just a matter of getting used to it ... I acknowledge that this is true. But it is true of all sorts of ugly code that adds a little more mental effort every time you parse it. I mean code I used to write back in college had high Cyclomatic complexity and could be difficult to follow deep inside a nested method. But back then I was a whiz at following it. I just didn't see a problem. But the truth is it took me a little effort whenever I read or modified it.

This is important because we all know that "code is read more often than it is written" right? So while you may get used to reading large, complex blocks of nested code, it takes its toll on you.

So the obvious next step is to start naming pieces of your code and pulling it out. Something like this.

Better, but still a bit ugly. But more importantly it forces you to write your code literally backwards. I'm sure you could find ways to prevent that backward effect locally, but it will remain an effort. The Stack Overflow post has something similar but that at least nicely breaks it up into composable pieces.

The other day I was discussing the same issue with Rakesh Pai and Joel: what would such code ideally look like? We played "what if?". What if we could modify JavaScript however we saw fit and write sequential code that a machine converts into trivially nested code? After all, programming languages are for people, not machines.

Suppose I could pass '...' to a function, meaning "take the next function and pass it as a callback to this one". And say '...' on a separate line means that the remaining code below that line is a callback. So we get this.

Now this, is much better. It might be highly impractical, but something on those lines might work if we could change javascript as we saw fit. (say we had macros for example). I have grouped the lines of code into sections that might ideally become methods. (well basically we could create our append method there).

However, while we cannot change javascript, at least javascript has first class functions. So a library called step.js makes something similar possible. Take a look at the same solution with step.js.

It's pretty close to what we had, though there is a great deal of cruft with all those function wrappers. It's pretty much the same problem that you have with Ruby: the only way to pass code around to be executed later is as a lambda.

Still, it's way better than what we started with, and it compares quite nicely with our previous code. The exceptions are that everything needs a function wrapper and that the magic variable is provided through clever use of "this". Given the popularity of node.js, I think step.js might be very useful for organizing code and improving its readability. And if some effort is put into also making it easy to debug, it might be used all over the place.

Edit: I forgot to link to the library last time. Get Step.js Here.

My Biggest peeve with event driven programming

2010-07-31T00:21:00.000-07:00

is having to build constructs like this.

I need to load a bunch of templates up and then perform the rest of my actions. Event Driven programming frees you from excessively specifying order in code. Overdoing the Sequence bit in Sequence, Selection and Iteration. Except when you really need sequence, the code becomes messy. The sad part is this isn't even a great solution since it loads files one at a time. Ideally you wanna parallelize this.

Generalized Expression Simplification in SICP?

2010-04-05T23:13:00.000-07:00

A couple of colleagues from work and I have been working our way through SICP. So far things have been pretty good, but we recently came across a series of exercises on "interval arithmetic". The basic idea is to represent a term as an interval of uncertainty (a tuple (a, b)) for use in engineering calculations. The next step is to define a series of arithmetic operations (+, -, *, /) that operate on intervals and create new intervals with different uncertainties.

After a few such problems, it is brought to our attention that if an expression is written in two different forms, the value obtained can be different. For example (1/R1 + 1/R2) could also be written as ((R1 + R2)/R1R2) and these yield different values. The problem is clearly that all the operations we have defined increase the uncertainty and if an expression is written in a manner that results in more operations we should expect more uncertainty. In fact, as this gist demonstrates, (R2 - R1 + R1) can result in something more uncertain than R2 by itself.
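
Here is a Ruby transliteration of the operations (SICP does this in Scheme). The resistor values are made up, and Rational keeps the arithmetic exact. It shows both effects: r2 - r1 + r1 comes out wider than r2, and SICP's two parallel-resistor formulas disagree:

```ruby
# Intervals with the usual interval-arithmetic operations.
Interval = Struct.new(:lo, :hi) do
  def +(other)
    Interval.new(lo + other.lo, hi + other.hi)
  end

  def -(other)
    Interval.new(lo - other.hi, hi - other.lo)
  end

  def *(other)
    products = [lo * other.lo, lo * other.hi, hi * other.lo, hi * other.hi]
    Interval.new(products.min, products.max)
  end

  def /(other)
    raise ZeroDivisionError, "divisor spans zero" if other.lo <= 0 && other.hi >= 0
    self * Interval.new(1 / other.hi, 1 / other.lo)
  end

  def width
    hi - lo
  end
end

one = Interval.new(1r, 1r)
r1 = Interval.new(2r, 4r)
r2 = Interval.new(5r, 7r)

r2 - r1 + r1 # lo 3, hi 9: width 6, versus 2 for r2 alone

# Two algebraically equal forms of the parallel-resistor formula:
par1 = (r1 * r2) / (r1 + r2)       # lo 10/11, hi 4
par2 = one / (one / r1 + one / r2) # lo 10/7,  hi 28/11
```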

Exercise 2.16 asks us to attempt to write the package so that rewriting an expression in a different form will still yield the same answer. The key to this seems to be a combination of:

Some generalized method to simplify expressions.
A method of identifying if two intervals are the same.

I initially interpreted the second point as being a way to check if two intervals are equal. But that's a mistake. Two intervals might be equal because they represent the same concept, or because they are two different terms with the same uncertainty and value. For example, if I was building a circuit with resistors, and I happen to use two resistors with the same rating (same value and error), they would still not be identical elements in the circuit. I wouldn't be able to "cancel them out". One of them might have an actual value that is on one side of the interval of uncertainty and the other might be on the opposite end.

The only way to identify whether two intervals are the same would be to redefine my constructor to attach to each such tuple an object id that is meant to be unique. Then, if two intervals have the same object id, I'd know that they were in fact the same. An extremely simplistic way to generate such an object id might be to use a random number between one and one billion.
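A rough sketch of that identity idea in Ruby follows. The Term class, same_term? and minus are hypothetical names, and instead of a random number I've swapped in a simple class-level counter, which guarantees uniqueness within one process. Subtraction cancels exactly only when both operands are literally the same term.

```ruby
# Hypothetical Term class: each constructed interval gets its own identity,
# so "the same term" means "the same id", not "equal bounds".
class Term
  @last_id = 0

  # Class-level counter used instead of a random number to hand out ids
  def self.next_id
    @last_id += 1
  end

  attr_reader :id, :lo, :hi

  def initialize(lo, hi)
    @lo, @hi = lo, hi
    @id = Term.next_id
  end

  def same_term?(other)
    id == other.id
  end

  # Cancels exactly when both operands are the same term; otherwise falls
  # back to ordinary interval subtraction. Returns [lo, hi].
  def minus(other)
    return [0, 0] if same_term?(other)
    [lo - other.hi, hi - other.lo]
  end
end

r1 = Term.new(9, 11)
r1_twin = Term.new(9, 11) # same rating, but a different physical resistor
r1.minus(r1)      # => [0, 0]
r1.minus(r1_twin) # => [-2, 2]
```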

However, I have no idea how to even begin approaching the first problem. Such a thing would be trivial if the only allowed operations were addition and subtraction (I could simply collect like terms and find their coefficients), or if they were only multiplication and division. But with all four, I don't know where to start. So if anyone has some insights or resources, please do share them with me.
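For the addition-and-subtraction-only case, "collecting like terms" can be sketched like this, assuming each term already has a unique identity (here just a symbol). The combine helper is hypothetical: it turns a list of signed terms into a coefficient table, so like terms cancel before any interval arithmetic happens.

```ruby
# Hypothetical sketch: with only + and -, an expression is a linear combination
# of terms, stored as { term => coefficient }. Like terms cancel before any
# interval arithmetic is done.
def combine(*signed_terms)
  coefficients = Hash.new(0)
  signed_terms.each { |sign, term| coefficients[term] += sign }
  coefficients.reject { |_term, coefficient| coefficient.zero? }
end

# R2 - R1 + R1 collapses to just R2, so no extra uncertainty is introduced.
combine([1, :r2], [-1, :r1], [1, :r1]) # => { r2: 1 }
```

Multiplication and division break this simple picture, because the expression is no longer a flat sum of terms.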

ActiveSupport and json pure hate each other :(

2010-03-21T21:33:00.000-07:00

See

The sad part is that the DataMapper serializer in DM_types seems to depend on json_pure, and I've got ActiveSupport all over the place.

Fixing a datamapper bug

2010-03-20T00:35:00.000-07:00

DataMapper, a Ruby ORM, has become fairly popular of late. I ran across an extremely annoying bug in it yesterday. This post explains the bug and a fix.

DM lets you issue updates to the database that include a WHERE clause, which can mean efficiency wins or concurrency wins. For example, if I have a payment model, and a payment moves from pending to, say, success or failure, I might have code that looks like this.

The problem arises if I have multiple threads or processes and want to make sure the same payment is never updated twice: I shouldn't have one thread or process mark it a success and a different one subsequently mark it a failure. The previous code is susceptible to race conditions. Ideally I'd like to write one line of SQL that says "update the payment where status is pending". Then I'd write this.

This fires only a single query: an UPDATE with a WHERE clause. It does so because I've used update! instead of update. I've appended the relevant line from the log file to the gist.
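Why the conditional update closes the race can be sketched without DataMapper at all. This is a hypothetical in-memory stand-in for the payments table, not DataMapper's API: update_where mimics UPDATE ... WHERE id = ? AND status = ?, and a Mutex plays the role of the database applying each statement atomically. However the two threads interleave, exactly one of them changes the row.

```ruby
# Hypothetical in-memory stand-in for a payments table (not DataMapper's API).
class PaymentTable
  def initialize
    @rows = {}
    @lock = Mutex.new # stands in for the database applying each UPDATE atomically
  end

  def insert(id, status)
    @rows[id] = status
  end

  def status(id)
    @rows[id]
  end

  # Like: UPDATE payments SET status = new_status WHERE id = id AND status = expected
  # Returns the number of rows changed (0 or 1).
  def update_where(id, expected:, new_status:)
    @lock.synchronize do
      return 0 unless @rows[id] == expected
      @rows[id] = new_status
      1
    end
  end
end

table = PaymentTable.new
table.insert(42, :pending)

# Two workers race to settle the same pending payment.
results = [:success, :failure].map do |outcome|
  Thread.new { table.update_where(42, expected: :pending, new_status: outcome) }
end.map(&:value)
```

Whichever thread wins, results contains one 1 and one 0, and the payment ends up either :success or :failure, never one overwritten by the other.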

The bug occurs when the status field is one of the more complex DataMapper types, such as Enum. I've created an example that reproduces the error.

The first part of the test, using update, succeeds, but the second one, using update!, fails. It turns out superman is still in limbo! The reason is that the Collection class in DM has a bug. The update! method has the following piece of code. Below it, compare the similar code from the private _update method of the Resource class.

The difference is that in one case the code says property.valid?(value), and in the other it (correctly) says property.valid?(property.get!(model_instance)). This difference matters for special types like Enum. When I say update!(:status => :alive), a new model instance is created, and the modified properties are asked to validate the values being inserted to verify that they are sane.

So a property of type Enum[:dead, :alive, :limbo] is asked to validate the value being inserted. That value is :alive, but the underlying primitive value stored is 1 (since :alive is the second element of a zero-based array). So property.valid?(1) is what gets checked, which is obviously false.

In the second case, property.get!(model_instance) would actually return :alive. So property.valid?(:alive) is checked, which is true.
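The mismatch can be reproduced in a few lines with a hypothetical stand-in for an Enum property. This is not the dm-types implementation; it only assumes, as described above, that values are dumped to a zero-based index while valid? is defined in terms of the symbolic values.

```ruby
# Hypothetical stand-in for an Enum property; NOT the dm-types implementation.
class EnumProperty
  def initialize(*flags)
    @flags = flags
  end

  # Symbolic value -> primitive stored in the database (zero-based index)
  def dump(value)
    @flags.index(value)
  end

  # Primitive -> symbolic value
  def load(primitive)
    @flags[primitive]
  end

  # Validation is defined in terms of the symbolic values, not the primitives
  def valid?(value)
    @flags.include?(value)
  end
end

status = EnumProperty.new(:dead, :alive, :limbo)
stored = status.dump(:alive)       # 1, the kind of value the buggy path validates
status.valid?(stored)              # => false
status.valid?(status.load(stored)) # => true, which is what validating the loaded value gives
```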

There's obviously a fair amount of indirection going on here, for whatever reason. Anyway, this is the quick fix I used to monkey-patch my Collection class and move on.

It would be nice if the behaviour of _update and update! lived in a single place, so here is a proposed change.

Sudoku solver, alternate optimization strategy, final

2010-02-14T21:34:00.000-08:00

I had written about optimising the sudoku solver by remembering moves made in the past. Another strategy I had tried in parallel was short-circuit evaluation. Each iteration of the solver searches all the empty cells to find the list of acceptable moves and then chooses the cell with the fewest moves. However, a cell with zero options means that it's time to backtrack, so instead of continuing to search the rest of the cells, that's the place to jump out of the search. Similarly, a cell with exactly one option has only one possible move, so I might as well make it.

I experimented with a short-circuited search where I jumped out the moment I found a cell with at most "n" moves, and I varied "n" upwards. It turned out the best option was to jump out when a cell had zero or one option. Sadly this optimisation only gives me a factor of 3 improvement, and it doesn't stack much with the previous one.
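Here is a small sketch of that short-circuited search. The pick_cell name and the candidates hash (empty cell mapped to its list of legal values) are hypothetical, not the solver's actual code. It keeps the best cell seen so far and stops scanning as soon as a cell has at most threshold options: zero means backtrack now, one means a forced move.

```ruby
# Hypothetical sketch: candidates maps each empty cell to its legal values.
# Returns [cell, values] for the cell with the fewest options, but stops
# scanning as soon as some cell has at most `threshold` options.
def pick_cell(candidates, threshold = 1)
  best = nil
  candidates.each do |cell, values|
    best = [cell, values] if best.nil? || values.length < best[1].length
    break if values.length <= threshold # 0 => backtrack now, 1 => forced move
  end
  best
end

candidates = { a: [1, 2, 3], b: [4], c: [] }
pick_cell(candidates)     # => [:b, [4]], stops before ever looking at :c
pick_cell(candidates, -1) # => [:c, []], a threshold of -1 disables short-circuiting
```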

The whole code is available on GitHub at http://github.com/pathsny/Sudoku-Solver

... Okay 3, 2, 1 Let's Jam. Quicksilver, Ruby and Mac OSX

2010-02-09T08:35:00.000-08:00

QuickSilver is an application launcher for Mac OS that is a very popular productivity enhancer. It's extensible through a ton of plugins, and it also lets you launch custom scripts, so you can quickly script any action you commonly perform. I first saw the possibilities when I read this article. Some of those scripts were quite useless to me, but there are a few I end up using all the time, like shortcuts to turn my wireless radio off when I'm not online and want to save power. So this post is about 3 QuickSilver scripts I recently wrote.

Initially I wrote a few of them in AppleScript, but when I wanted to write a script to "automatically pastie my code", I decided it would be easier to use a language I'm more comfortable with. I found a Ruby-AppleScript bridge, so I went with Ruby. The bridge is obtained by installing the rb-appscript gem. You can find these scripts on GitHub at http://github.com/pathsny/QuickSilver

I also attempted a "maximise" script based on this article. Sadly it just doesn't work :/. It's always incredibly slow the first time you launch it.

Programming Async code in a Sync style using Laziness and Functional patterns

2010-01-10T04:23:00.000-08:00

The first time I came across some common idioms of functional programming was when I was working with .NET and came across the List methods FindAll and ForEach. For me it was a revelation. I kept rewriting all the code on the project I was on using these idioms.

Since functional languages were popular at that point and I discovered that this idea had sprung from such languages, I decided to start learning Haskell. I'm still learning it, but one interesting thing about Haskell is that it is lazy by default. Everything is automatically lazy, and strictness has to be enforced where you really need it, such as in the IO monad, where when you say you want to print something, you mean right now! So if I had to print the first 5 even natural numbers greater than some x, I could write this. (Warning: horribly contrived example and code.)

f x = take 5 $ dropWhile (<= x) $ filter even [1..]

Notice the infinite list of natural numbers there? This works because Haskell is lazy. What really happens is that when I print the output, the print function asks take 5 for output. take 5 asks dropWhile for 5 items. dropWhile asks filter for items, discarding them until it finds one greater than x, and then keeps asking for the next ones; filter in turn asks the infinite source for items and passes on the even ones until dropWhile is satisfied. The realization here is that at one end (the right, in the Haskell code) we have a source, and at the other end a sink. Everything in between is a component meant to build a pipeline. Something about the system knows that the flow is governed by the sink, so the pipeline components all pull output until the sink is satisfied. The idea is that I can write more idiomatic code and use things like infinite lists when I want to. .NET had done the same thing with IEnumerable and LINQ from 3.5. So I could now write

InfiniteList.Where(Even).SkipWhile(y => y <= x).Take(5)
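For comparison, Ruby 2.0 and later support the same pull-based pipeline through Enumerator::Lazy. This sketch mirrors the Haskell function above: nothing is computed until first(5) starts pulling values from the infinite range.

```ruby
# Pull-based pipeline with Enumerator::Lazy (Ruby 2.0+): the sink (first)
# pulls values through select and drop_while from an infinite source.
def f(x)
  (1..Float::INFINITY).lazy
    .select(&:even?)
    .drop_while { |n| n <= x }
    .first(5)
end

f(7) # => [8, 10, 12, 14, 16]
```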

I recently read about another realization. Another common pattern in programming is an event-based system: a web application waiting for requests, a JavaScript function that makes an Ajax call and displays the output on the screen, and so on. Specifically, an async system where you wait for events to pour in and, for each event, perform some actions.

So imagine I had a system where I call some number generator which dials up a service halfway across the world. The service spews out natural numbers. I want to register a callback function which waits for numbers and ignores the odd ones, maintains a flag so it knows when a number greater than x has arrived, and then gives me 5 numbers from there before swallowing the rest. (Yeah, I know, incredibly useful.) Since this deals with state, let's do it in an imperative, Ruby-style language. Typically I might have to write


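x = 7                    # example threshold
skipping = true          # still waiting for an even number above x
num_returned = 0

handler = lambda do |n|
  return if n.odd?                       # ignore the odd ones
  skipping = false if skipping && n > x  # first even number above x flips the flag
  return if skipping
  return if num_returned >= 5            # swallow everything after five
  num_returned += 1
  puts n
end

numbercaller(handler)    # numbercaller: the remote number source (not defined here)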

The beautiful functional-style code has morphed into something ugly, all because I switched from pull to push (async). But it's not hard to recognize that this problem is very similar to the one I was solving before. All that has changed is that here the source, instead of the sink, controls when data goes through the pipeline. So with the right tools I should be able to use the same pipeline components as before and have the language just switch the flow control around. numbercaller should present the same interface as an iterable, allowing me to use the same pipeline components, but should push data through the pipeline instead. I should be able to use not just each, but also map, dropWhile, take and so on. It's just a different take on laziness. Or, as the article I'm going to link to puts it, the Observable pattern and the Iterable pattern are virtually the same thing. You can read all about it here: Rx (LINQ to Events).
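To see how the same pipeline vocabulary can run in push mode, here is a deliberately tiny Observable in Ruby. It is a hypothetical sketch, not Rx's actual API: each operator wraps the upstream observable, and values flow downstream whenever the source emits them. The range 1..30 stands in for the remote number service. Unlike real Rx, this version never unsubscribes, so the source keeps pushing after take is satisfied and the extra values are simply swallowed.

```ruby
# Hypothetical minimal push-based Observable (not Rx's API).
class Observable
  # The block is run with an observer proc each time someone subscribes.
  def initialize(&on_subscribe)
    @on_subscribe = on_subscribe
  end

  def subscribe(&observer)
    @on_subscribe.call(observer)
  end

  def select(&pred)
    Observable.new do |observer|
      subscribe { |v| observer.call(v) if pred.call(v) }
    end
  end

  def drop_while(&pred)
    Observable.new do |observer|
      dropping = true
      subscribe do |v|
        dropping &&= pred.call(v) # once false, stays false
        observer.call(v) unless dropping
      end
    end
  end

  def take(n)
    Observable.new do |observer|
      seen = 0
      subscribe do |v|
        next if seen >= n # swallow everything after n values
        seen += 1
        observer.call(v)
      end
    end
  end
end

# The source decides when values arrive; here it just pushes 1..30.
numbers = Observable.new do |observer|
  (1..30).each { |n| observer.call(n) }
end

received = []
numbers.select(&:even?).drop_while { |n| n <= 7 }.take(5).subscribe { |n| received << n }
received # => [8, 10, 12, 14, 16]
```

The pipeline reads exactly like the lazy version; only the direction of control has flipped.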

Truth be told, I'm a little nervous about this pattern. It's a new abstraction and like all abstractions it has the potential to be leaky. Imagine we managed to implement this on some Javascript ajaxy interface. On the click of some item I want to make an ajax call. I want the result of the Ajax call to update some div.

button.onclick = function () {
  AjaxUpdater.update(url, function (response) {
    var html = process(response);
    $("blah").innerHTML = html;
  });
};
doSomethingElse();

Now with the new realization we have we'd like to write.

responses = button.clickEvents.each(AjaxUpdater(url).map(process)).flatten
responses.each(function(html){ $("blah").innerHTML = html })
doSomethingElse()

Now, the important point here is that the first two lines need to run at some arbitrary time in the future, and in fact repeatedly, each time a response arrives. But doSomethingElse needs to run right away, without waiting for any click events. Since the abstraction hides the fact that these events arrive at some point in the future, and potentially infinitely many times, I might accidentally make some event fire with every click. That could mean buggier code or some unfortunate performance implications. Or it might simply mean we need much smarter interpreters.

The reactor pattern (non-blocking I/O) has become very popular of late. It involves converting many operations that are traditionally treated as synchronous, such as writing to disk or making a blocking call for data without which you cannot proceed, into asynchronous ones. It has some very interesting implications and provides considerable performance advantages. The Observable approach might make code written for the reactor pattern as idiomatic as code that treats these operations as synchronous.

Sudoku Solver - Remembering The Past

2009-11-16T05:03:00.000-08:00

The sudoku solver is fairly fast and seems to be doing all the right things. By choosing the cells with the fewest options, the number of useless moves has been reduced considerably. However, there is a fair amount of rework currently going on. Every time the state of the game is analyzed, the entire board is scanned to find all possible numbers for each empty cell. Once a single cell is filled, a new iteration begins, and the analysis is repeated by scanning the whole board again. But humans don't do that!

Once I've decided what numbers are playable in each empty cell, I don't repeat this exercise every time I fill an empty cell. Most empty cells are not affected by any single cell being filled: if I fill a cell that used to be empty, the only empty cells affected are those that share a row, column or block with it. So if the analysis of the game were saved between iterations, I could quickly obtain the analysis for the next iteration by slightly modifying the previous one. Of course this takes considerably more code, but it should show significant improvements.

Unfortunately I haven't saved the results of running all this on the machine I used when I first started blogging about it, so the times I paste won't be directly comparable and will have to be scaled a bit. So I'll repost the highlights of the previous runs.

Analyzer::Simple ran 5 puzzles in 3.115519 seconds with 107442 moves and 107168 rollbacks
Analyzer::LeastChoicesFirst ran 5 puzzles in 0.412483 seconds with 631 moves and 357 rollbacks

So last time we saw an improvement of a factor of ten. And now, here is the result of the latest change.

Solved simple with 45 moves and 0 rollbacks in 0.035017 seconds
Solved hard with 83 moves and 29 rollbacks in 0.010338 seconds
Solved evil with 217 moves and 161 rollbacks in 0.024111 seconds
Solved escargot with 217 moves and 159 rollbacks in 0.023176 seconds
Solved other_hard with 69 moves and 8 rollbacks in 0.009236 seconds

Analyzer::StorageBasedLeast ran 5 puzzles in 0.102331 seconds with 631 moves and 357 rollbacks

That's a speed-up of about 4 times, which is pretty good. As always, the code follows. Notice that this time my analyzer has to expose read attributes so that the state can be preserved from one iteration to the next. This series of posts is nearly done. I have one more post to follow, where I'll also link to the entire code on GitHub.


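class StorageBasedLeast < Base
  attr_reader :cells_with_values, :cell, :value

  # Builds this iteration's analysis, reusing the previous analyzer's work when given
  def initialize(game, analyzer = nil)
    @cells_with_values = find_cells_with_values(game, analyzer)
    # Pick the empty cell with the fewest possible values
    @cell, @possible_values = cells_with_values.min { |pair_1, pair_2| pair_1[1].length <=> pair_2[1].length }
  end

  # Yields each candidate move for the chosen cell, remembering the value being tried
  def each
    @possible_values.each do |value|
      @value = value
      yield @cell, value
    end
  end

  def find_cells_with_values(game, analyzer)
    # First iteration: scan every empty cell (possible_values presumably comes from Base)
    return game.empty_cells.collect { |cell| [cell, possible_values(game, cell)] } unless analyzer
    # Later iterations: drop the cell that was just filled...
    cells_with_values = analyzer.cells_with_values.reject { |pair| pair[0] == analyzer.cell }
    neighbours = game.neighbour_indices(analyzer.cell)
    # ...and remove the value just played from that cell's neighbours only
    cells_with_values.collect do |pair|
      values = neighbours.include?(pair[0]) ? pair[1] - [analyzer.value] : pair[1]
      [pair[0], values]
    end
  end
end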
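The analyzer leans on game.neighbour_indices, which isn't shown in the post. Here is one plausible version, assuming (hypothetically) a 9x9 board whose cells are numbered 0 to 80 in row-major order. It returns every cell sharing a row, column or 3x3 block with cell i, which is exactly the set whose candidate lists need adjusting after a move.

```ruby
# Hypothetical helper: indices of all cells sharing a row, column or 3x3 block
# with cell i, assuming a 9x9 board numbered 0..80 in row-major order.
def neighbour_indices(i)
  row, col = i.divmod(9)
  block_row = (row / 3) * 3
  block_col = (col / 3) * 3
  same_row = (0...9).map { |c| row * 9 + c }
  same_col = (0...9).map { |r| r * 9 + col }
  same_block = (0...3).flat_map do |dr|
    (0...3).map { |dc| (block_row + dr) * 9 + block_col + dc }
  end
  (same_row + same_col + same_block).uniq - [i]
end

neighbour_indices(0).size # => 20 (8 in the row, 8 in the column, 4 more in the block)
```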

Highlights that impressed me from the video on the Go language

2009-11-13T04:50:00.000-08:00

Go is a new programming language developed at Google by a number of big names. You can find out all about it on its Wikipedia page here or on its website here, or you could watch the hour-long video here on YouTube.

Anyway, here are the highlights I got out of that video. I'll still need to read what's on the website to understand the language better, but it does raise my interest. They wanted quick build times, better support for concurrency, and a mixture of the advantages of static and dynamically typed languages. So here they are, in no particular order.

Very fast compilation. I'm guessing that on large codebases this can make a big enough difference. It's interesting anyway.
Adding methods to anything. To investigate
1. Does this mean I can add methods to any existing type?
2. Does this mean I can add methods to an instance of a type?
Automatically implemented interfaces. This is kinda cool actually. An interface is declared as a set of behaviours, and anything that exhibits those behaviours implements the interface, which gives you something like duck typing. implements does not even exist as a keyword.
Unicode characters can be used in variables. π = 3.14159 is valid code.
Making code async is trivial: declare a function and just type go <function name>, and the function executes asynchronously. Apparently whatever threading-like mechanism they've used is very efficient, because they demonstrated launching about 100,000 of these goroutines and they all executed in a matter of seconds.
Erlang-style, you create channels for communication. Channels can carry anything, including other channels. You drop stuff into channels and pull it out of channels. This makes writing multi-threaded or client-server apps look very simple. To investigate
1. How easy is it to distribute this over a network? Can I just toss a bunch of goroutines onto other machines?
Closures, which should make a lot of people very happy.
Reflection. To investigate
Dynamic types. To investigate
1. What is this? Is it like .NET?
2. Can I call any method on it?
3. Does it automatically implement all interfaces?
10-20% slower than compiled C code. That's quite impressive.
ARM compiler.
1. huh? Does this mean people can use go to write code for phones and mp3 players?
Automatic memory management. There's talk about a concurrent gc.
Some important libraries apparently already exist like for html templating and testing.
There was something about slices, arrays and maps. I need to investigate further to see if they have anything cool, like select, map and reduce style operations on slices or arrays.

Ruby 1.9 is way faster than Ruby 1.8.6

2009-09-19T08:54:00.000-07:00

I got a chance to test it, and it's actually true. I'm on OS X, so I used MacPorts to install Ruby 1.9 alongside 1.8.6. Since I use TextMate for development, I went to Preferences -> Advanced -> Shell Variables and set TM_RUBY to the location of the Ruby 1.9 binary. My Sudoku solver previously had the following results.


Analyzer::Simple ran 5 puzzles in 5.524553 seconds with 107442 moves and 107168 rollbacks
Analyzer::ConstraintsChecking ran 5 puzzles in 37.719066 seconds with 36202 moves and 35928 rollbacks
Analyzer::LeastChoicesFirst ran 5 puzzles in 0.745781 seconds with 631 moves and 357 rollbacks

Here are the same results of the same runs with Ruby 1.9.


Analyzer::Simple ran 5 puzzles in 3.905158 seconds with 107442 moves and 107168 rollbacks
Analyzer::ConstraintsChecking ran 5 puzzles in 25.035866 seconds with 36202 moves and 35928 rollbacks
Analyzer::LeastChoicesFirst ran 5 puzzles in 0.527032 seconds with 631 moves and 357 rollbacks

Across the three analyzers that's roughly 1.4 to 1.5 times faster. Ideally it should be twice as fast, but my machine is loaded up with a lot of open tabs :/. Still, Ruby 1.9 is better!

DEADBEEF for the CAFEBABE : (otherwise known as) The Great Choose Your Own Crc32 Adventure

2009-09-12T21:56:00.000-07:00

Ever since people have been transmitting information, there have been mechanisms to ensure that the transmission was successful and that the received information was what was transmitted. On the internet, the various layers have some amount of redundancy and error checking. One popular approach to verifying transmissions or data is the checksum, and one really popular checksum is the Cyclic Redundancy Check (CRC). Various CRCs are used in a variety of places, such as reading from CDs, verifying that a zip or rar archive has been extracted correctly, or transmitting files.

For a long time, I've been involved in fansubbing anime and distributing it online. Before BitTorrent, which now lets people almost broadcast files and involve many people in the effort, files had to be distributed one at a time. People used to distribute files on IRC, or host them on websites and FTPs. Very often you'd receive a file as the tenth or even the hundredth person in the chain. There was always a small chance of corruption, and over time you were quite likely to receive a corrupt file. So that people could check the file they received, the practice was for the group to publish the CRC32 of the files they released. A CRC32 was really convenient: it's an 8-character hexadecimal string, which is easy for humans to read. Yet if your file was corrupted, there was only about a 1 in 2^32 chance that you wouldn't find out. Fairly good odds.

Of course, CRC32 is not a cryptographic hash. It's possible to reverse it, or to create junk that has the same CRC32. The idea was to catch accidental errors, not to prevent sabotage. Since CRC32 is reversible, anarchriz wrote an article about how to reverse it. Around 2002 or 2003 there was a brief fad in the fansubbing community: someone had written a small program to modify files so that they had any chosen CRC, so groups would modify their files and release them with CRCs they thought looked cool.

Last year (2008), the group I'm a part of, Live-Evil, thought we'd release an old show called Kimagure Orange Road. Ken Hoinsky, our fearless leader, thought it would be fun to do a sort of retro thing. One of the ideas we had was a custom CRC for each episode: the first episode would be abcb0001, the next abcb0002, and so on. (The ABCB coffee shop is a big part of KOR.) He managed to dig up anarchriz's document, so we started poring over it. It looked doable, but we didn't want to do the work of actually writing and testing the code. Surely it had already been done. We couldn't find the code that had been used previously, but we did find this article by Bas Westerbaan, where he had implemented the algorithm in Python. Perfect! So we downloaded it and ... it didn't work.

That was odd. We'd patch files according to the documentation, but when we checked the crc of the file it was not what we'd tried to patch it to. So I started looking at this code to see if it was easier to make it work rather than write my own. Turned out there was a small mistake. We fixed it and the code worked fine. So after communicating this back to Bas, we were merrily on our way. Each episode of KOR has since had crcs of the form abcbxxxx.

I've put together a tool in Python to take a file and patch it to whatever CRC you choose. I first tried to write a pure-Python CRC calculator, but it was uber slow on large files. Looking around, I found that most people delegate the CRC calculation to C. So I downloaded an open source Python project called cfv and lobotomized it heavily, to the point where all it does is calculate CRCs. I then wrote a small Python script that calls both the cfv script and Bas's CRC reversal script to patch any given file to any CRC you choose. How does the patching work? It calculates what sequence of 8 bytes should be appended to the given file so that its CRC becomes the one you have chosen. Most media files, like movies, music and even PDFs, ignore any garbage at the end, so all it does is append a few junk bytes.
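For the curious, the table-reversal idea from anarchriz's article can be sketched in Ruby. This is my own minimal illustration, not Bas's script or the tool described here, and it appends 4 bytes, which in principle is enough to reach any 32-bit CRC. It relies on the fact that the top byte of each entry in the standard CRC-32 lookup table is unique: walking backwards from the desired register value tells us which table entry each of the 4 steps must use, and walking forwards from the file's current register tells us which byte selects that entry. Zlib.crc32 from Ruby's standard library is used to compute the starting state and to check the result.

```ruby
require "zlib"

# Standard reflected CRC-32 lookup table (polynomial 0xEDB88320).
TABLE = (0..255).map do |n|
  c = n
  8.times { c = (c & 1).zero? ? c >> 1 : 0xEDB88320 ^ (c >> 1) }
  c
end

# The top byte of each table entry is unique, so it identifies the entry.
TOP_BYTE_INDEX = TABLE.each_with_index.map { |entry, index| [entry >> 24, index] }.to_h

# Returns 4 bytes that, appended to data, make Zlib.crc32 of the result equal target.
def crc32_patch(data, target)
  start = Zlib.crc32(data) ^ 0xFFFFFFFF # internal register after processing data
  goal = target ^ 0xFFFFFFFF            # register we need after the 4 extra bytes

  # Walk backwards: find which table entry each of the 4 steps must use.
  indices = Array.new(4)
  reg = goal
  3.downto(0) do |k|
    indices[k] = TOP_BYTE_INDEX.fetch(reg >> 24)
    reg = ((reg ^ TABLE[indices[k]]) << 8) & 0xFFFFFFFF
  end

  # Walk forwards: choose each byte so that it selects the required entry.
  reg = start
  bytes = indices.map do |index|
    byte = (reg ^ index) & 0xFF
    reg = (reg >> 8) ^ TABLE[index]
    byte
  end
  bytes.pack("C*")
end

data = "Kimagure Orange Road episode 1".b
patched = data + crc32_patch(data, 0xABCB0001)
format("%08x", Zlib.crc32(patched)) # => "abcb0001"
```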

To use it you just pick a fun sounding crc. (Canonical examples are DEADBEEF and CAFEBABE). Run it like so. python crcFilePatcher.py --file=MovieName --newcrc=deadbeef. I've pushed all the files onto github here. Have fun :)

Buggy Symbol to Proc Implementation

2009-09-07T02:47:00.000-07:00

Ruby 1.9 seems to have introduced symbol to proc into the core language itself. I'm not sure whether this is implemented natively or as Ruby code, so if anyone knows, please do let me know. For people who don't know what symbol to proc is (which, I suspect, is a very small part of the Ruby community), click here.

Recently I was doing something in irb with Ruby 1.8 and missed having symbol to proc around. Rather than require ActiveSupport, the gem that adds this enhancement to Symbol, I just googled for the code and found this, which was the first hit when I searched for "symbol to proc". I pasted the code into irb and continued ... and then I suddenly hit a strange error.

class Symbol
def to_proc
Proc.new { |obj, *args| obj.send(self, *args) }
end
end
>> [[1],[-3,4]].map(&:size)
ArgumentError: wrong number of arguments (1 for 0)
from (irb):58:in `size'
from (irb):58:in `send'
from (irb):58:in `to_proc'
from (irb):71:in `map'
from (irb):71

I was quite mystified by this error. It does not happen with the symbol to proc implementation currently in Rails (listed below), or with the Ruby 1.9 implementation.

class Symbol
def to_proc
Proc.new { |*args| args.shift.__send__(self, *args) }
end
end

Modifying the buggy symbol to proc implementation a bit, we get more insight into the problem.

class Symbol
def to_proc
Proc.new { |obj, *args| puts "obj is #{obj}"; puts "args has #{args.size} elements"; obj.send(self, *args) }
end
end
>> [1,-3,4].map(&:abs)
obj is 1
args has 0 elements
obj is -3
args has 0 elements
obj is 4
args has 0 elements
=> [1, 3, 4]
>> [[1],[-3,4]].map(&:size)
obj is 1
args has 0 elements
obj is -3
args has 1 elements
ArgumentError: wrong number of arguments (1 for 0)
from (irb):58:in `size'
from (irb):58:in `send'
from (irb):58:in `to_proc'
from (irb):71:in `map'
from (irb):71

Notice how in the second case, obj is not the array but its first element, and the remaining elements have become arguments. I'm not sure whether the symbol to proc implementation on that page was always wrong, or whether ActiveSupport once had a buggy implementation that was subsequently fixed. One thing I've been unable to do is write some sample code that recreates the problem without using symbol to proc at all, just a couple of functions with varargs invoked with array arguments. If anyone is able to construct such an example, that would be great. Here is an additional issue with symbol to proc (performance related), and here is a workaround.
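For what it's worth, here is one way to reproduce the behaviour without symbol to proc, checked against modern Ruby semantics. The key is that a block (or a proc) with more than one parameter automatically splats a lone Array argument, while a block whose only parameter is *args does not, and a lambda never does. The pass_one method is just a hypothetical helper that yields exactly one value.

```ruby
# Yields exactly one argument: whatever value it is given.
def pass_one(value)
  yield value
end

# |obj, *args| splats the single array: obj = -3, args = [4]
loose = pass_one([-3, 4]) { |obj, *args| [obj, args] }

# A lone |*args| keeps the array whole: args = [[-3, 4]]
whole = pass_one([-3, 4]) { |*args| [args.shift, args] }

# A lambda with the same parameters never splats
strict = lambda { |obj, *args| [obj, args] }.call([-3, 4])

# Which is exactly the failure above: this ends up calling -3.size(4)
begin
  pass_one([-3, 4]) { |obj, *args| obj.send(:size, *args) }
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

That is why the Rails version, which takes |*args| and shifts the receiver off the front, behaves correctly.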

The Primitive Obsession in Ruby aka Don't make EVERYTHING an Object

2009-09-04T14:10:00.000-07:00

When I was playing around with the sudoku solver I fell into several traps. One of them was a little insidious. Back in the day, I thought EVERYTHING had to be represented as an object. (Yes, I know everything in Ruby IS an object; I just mean you don't have to create your own class for every concept.) So one of the first objects I built was a position object. Every cell had a position and a value. A position was a little object with an x and a y value representing its coordinates. It could give me the positions of neighbouring cells, so that was its behaviour.

Now, one of the simplest ways of crafting the solver is to pick a cell, find all its empty neighbours, and then figure out what to play in them. So I had some code of the form cell.position.neighbours.select(&:empty). But I had defined neighbours as row positions + column positions + block positions, so I was bound to end up with a lot of duplicate positions. This wasn't a problem per se, but it was bound to slow down the solver. So I changed it to cell.position.neighbours.select(&:empty).uniq. But of course this didn't work: I had to make sure that two positions were considered equal if they had the same coordinates. That was simple; we've all seen it in basic textbooks.


class Position
  def ==(other)
    other.equal?(self) || (other.instance_of?(self.class) && other.x == @x && other.y == @y)
  end

  def hash
    @x * 29 + @y
  end
end

But that didn't work either. It turns out there is == for equality, but there is also eql?, which is what Array#uniq and Hash use (together with hash) when comparing objects. (The use case for eql? differing from == seems obscure, but it ensures, for example, that 17 == 17.0 is true while 17.eql?(17.0) is false.) Anyway, that was easily fixed: I just had to override eql? and call ==.


class Position
  def eql?(other)
    self == other
  end
end
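Put together, a value-equality Position that works with uniq needs ==, eql? and hash to agree. This is a self-contained sketch (the attr_readers and constructor are assumed, since the post only shows the equality methods), and it uses [x, y].hash rather than the hand-rolled formula. Ruby's Struct generates all three methods from its members, so for simple value objects it's a shorter way to get the same behaviour.

```ruby
class Position
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def ==(other)
    other.instance_of?(self.class) && other.x == x && other.y == y
  end
  # Array#uniq and Hash keys use eql? together with hash
  alias_method :eql?, :==

  def hash
    [x, y].hash
  end
end

positions = [Position.new(0, 1), Position.new(0, 1), Position.new(2, 3)]
positions.uniq.size # => 2

# Struct defines ==, eql? and hash from its members, so this behaves the same way.
Point = Struct.new(:x, :y)
[Point.new(0, 1), Point.new(0, 1)].uniq.size # => 1
```

As the next paragraph explains, this correctness can come with a real performance cost in a hot loop.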

Well, it worked, but ... holy crap. Performance was abysmal. Earlier I've posted solver times on the order of several seconds, but initially the solver took several minutes, and nearly 20 minutes with the constraints-checking solver. I was quite shocked. So I tried various tricks, including profiling the code, and it turned out MOST of the time was spent in uniq ... and within that, in my eql? definition. Redefining eql? killed the system. So I ended up modifying my objects so that I could depend on pure reference equality, and suddenly everything was blazingly fast again. Later, of course, I got rid of Position altogether. But the lesson I took away was: don't override eql? and == in Ruby unless you have a damn good reason.

hasOwnProperty to the rescue

2009-08-31T22:06:00.000-07:00

It turns out that the way to use objects as hashes while still adding behaviour to things like Object.prototype is to always check a method called "hasOwnProperty". It tells you specifically that a property belongs to the object itself and is not inherited from its prototype. For example, let's say we have


  Object.prototype.baz = 'baz';
  function explode(divId, thingy) {
    for (var item in thingy) {
      show(divId, item); // also shows the inherited 'baz'
    }
  }

So we can't just use for ... in on its own. Instead:


  function ownExplode(divId, thingy) {
    for (var item in thingy) {
      if (thingy.hasOwnProperty(item)) // skip properties inherited from the prototype
        show(divId, item);
    }
  }

and there we go. There is a small price to pay in ugliness, but in return we can add stuff to Object.prototype with impunity.

Whoops, associative arrays? You mean Object?

2009-08-30T00:19:00.000-07:00

In the last post I mentioned that extending Array breaks JavaScript associative arrays. But as Daniel Bodart pointed out, and as can be confirmed here and here, there's no such thing as an associative array; it's just a JavaScript object. It's only for historical reasons that Array has been used for this purpose. When I was first reading about JavaScript, I remember reading about arrays and associative arrays, and it stuck in my head. I never realised that these were just objects that happened to be Arrays.

There is one surprising thing about this, though. Running for (a in b) over an array or Array.prototype does not reveal length or any of the built-in properties or functions.

length is missing from the list of properties, and so are all the other predefined properties. But this is true of all objects: built-in properties are non-enumerable, so for ... in skips them. In a similar vein, Object.prototype.constructor is Object, but dumping everything in Object.prototype does not reveal the constructor property.

I guess this makes it clearer. Associative arrays being a special javascript construct is some sort of useful fiction that's used as a teaching aid. Once it's discarded you see that adding special methods to Array only breaks a misuse of Array. On reflection my continuing to believe Associative Arrays were a special construct is akin to learning all the laws of physics and still believing in Santa Claus. I guess the question now is, how do you add methods to Object.prototype without breaking the simple hash usage?

why you should not add stuff to Array.prototype in javascript

2009-08-29T04:35:00.000-07:00

Why should methods not be added to Array.prototype in JavaScript? The main reason is that they break associative arrays, those where the index you use does not need to be a number but could be anything, kind of like a hash.

With an associative array you can do stuff like

for (item in array)
       alert(array[item])

Methods added to Array.prototype will show up in a for (a in b) loop. Here's an example:


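Array.prototype.eachIndex = function (func) {
  for (var item in this) {
    func(item);
  }
};

function bar() {
  var a = [1, 2, 3];
  var div = document.getElementById('foo');
  // eachIndex itself shows up among the indices, because for...in also walks the prototype
  a.eachIndex(function (i) { div.innerHTML += ' ' + i + ' : ' + a[i] + '<br>'; });
}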


So there, I hope you've learnt your lesson!! Of course, you could decide associative arrays are pointless and that you can manage with plain JavaScript objects, thank you very much :). Did you notice peek, which has been added to the array, presumably by Blogger?