Saturday, September 12, 2009

DEADBEEF for the CAFEBABE : (otherwise known as) The Great Choose Your Own Crc32 Adventure

     Ever since people have been transmitting information, there have been mechanisms to ensure that the transmission was successful and the received information was what was transmitted. On the internet at various layers you have some amount of redundancy and error checking. One such popular approach of verifying transmissions or data is the checksum. One really popular checksum is the Cyclic Redundancy Check (CRC). Various CRC's are used in a variety of places like reading from cds, verifying that a zip or a rar archive has been opened correctly or transmitting files.

    For a long time, I've been involved in fansubbing anime and distributing it online. Before bittorrent, which now allows people to almost broadcast files and involve many people in the effort, it was necessary for people to distribute files one at a time. People used to distribute files on irc, host it on websites and ftps. Very often you'd have received a file as the tenth or even the hundredth person in the chain. There was always a small chance of corruption and over time you were quite likely to receive a file that was corrupt. In order for people to ensure that the file they received was accurate, the practice was for the group to publish the CRC32 of the files they released. A crc32 was really convenient. It was an 8 character hexadecimal string, which means it's easy for humans to read. But if your file was corrupted, there was only a 1 in 2^32 chance that you wouldn't know about it. Fairly good odds.

     Ofcourse, the crc32 is not a cryptographic hash. It's possible to reverse it or to create junk which has the same crc32. The idea was not to prevent sabotage, but accidental errors. Since crc32 was reversible, anarchriz wrote an article about how to reverse a crc32. In around 2002 or 2003 there was a brief fad in the fansubbing community. Someone had written a small program to modify files to make sure it had any chosen crc. So groups would modify and release files with crcs they thought would look cool. 

    Last year (2008), the group I'm a part of,  Live-Evil, thought we'd release an old show called Kimagure Orange Road. Ken Hoinsky our fearless leader though it would be fun to sort of do a retro thing. One of the ideas we had was a custom crc for each episode. The first episode would be abcb0001, the next abcb0002 and so on. (The ABCB coffee shop is a big part of KOR). He managed to dig up anarchriz's document so we started poring over it. It looked doable, but we didn't want to do the work of actually writing this code and testing it. Surely it had already been done. We couldn't find the code that had been used previously. But we did find this article by Bas Westerbaan where he had implemented this algorithm in python. Perfect! So we downloaded it and ... it didn't work.

     That was odd. We'd patch files according to the documentation, but when we checked the crc of the file it was not what we'd tried to patch it to. So I started looking at this code to see if it was easier to make it work rather than write my own. Turned out there was a small mistake. We fixed it and the code worked fine. So after communicating this back to Bas, we were merrily on our way. Each episode of KOR has since had crcs of the form abcbxxxx.

     I've put together a tool in python to take a file and patch it to whatever crc you choose. I first tried to write a pure python crc calculator, but it was uber slow on large files. Looking around I found that most people delegate the crc calculation bit to C. So I downloaded this open source python project called cfv and lobotomized it heavily to the point where all it does is calculate crc and nothing else. I wrote a small python script to call both the cfv script and bas's crc reversal script and patch any given file to any crc you choose. Now, how does the patching work? It calculates what sequence of 8 bytes should be appended to the given file so that the crc is whatever crc you have chosen. Most media files like movies, music even pdfs ignore any garbage at the end. So all it does is append a few junk bytes to the end. 

    To use it you just pick a fun sounding crc. (Canonical examples are DEADBEEF and CAFEBABE). Run it like so. python --file=MovieName --newcrc=deadbeef. I've pushed all the files onto github here. Have fun :)