Tuesday, March 13, 2007

conservation of information

I've been working on a project to rewrite an existing piece of custom software for a different platform, as part of a general move from one technology stack to another. (I've written some of my impressions about learning ASP.Net, as well as playing around with F#.) I'm living the truth that whenever a total rewrite happens in software development, information is (or should be) conserved; it just changes shape. XML information can become JSON information, and this transformation can even be performed, with some caveats, via an XSLT file; see the links at the bottom of http://json.org. In the original, widget X may convey some particular scrap of data, while in the rewrite widgets Y and Z together convey the same scrap. Code that's quite long in one version may become just a few shorter, more effective lines in another version, but the lines express the same algorithm. Closely related code that is scattered in the first version will hopefully be gathered together in the second, yet once again the end effect should be identical. I feel a little like a conspiracy theorist mumbling "See? See? Underneath it all, everything's connected! You're distracted by appearances, but the underlying motivations are what matter! Follow the data! Follow the data!"
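To make the XML-to-JSON point concrete, here is a minimal sketch in Python (not the XSLT route mentioned above). The order record, its field names, and the two helper functions are all made up for illustration; the only claim is that the same facts survive the change of shape.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record in its "original" XML shape (names invented for this sketch).
XML_SOURCE = """
<order id="42">
  <customer>Ada</customer>
  <item sku="A-1" qty="2"/>
  <item sku="B-7" qty="1"/>
</order>
"""

def order_from_xml(text):
    """Pull the order facts out of the XML shape into plain Python data."""
    root = ET.fromstring(text.strip())
    return {
        "id": int(root.get("id")),
        "customer": root.findtext("customer"),
        "items": [
            {"sku": item.get("sku"), "qty": int(item.get("qty"))}
            for item in root.findall("item")
        ],
    }

def order_to_json(order):
    """Re-express the same facts in the JSON shape."""
    return json.dumps(order, sort_keys=True)

order = order_from_xml(XML_SOURCE)
as_json = order_to_json(order)

# Reading the JSON back gives the same facts: nothing gained, nothing lost.
assert json.loads(as_json) == order
```

One of the caveats shows up even in this toy: the JSON no longer records which facts were attributes and which were child elements. That bit of information only survives if the conversion code (or its documentation) remembers it.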

Conservation of (program) information is a widely applicable principle. I knew I'd read something similar, but in that case it was called conservation of complexity. Kurt Cagle wrote about it a while ago, and Rick Jelliffe, quoting him from a mailing list, gave the idea a name: Cagle's Law of Constant Complexity, which says that complexity can be moved around (into libraries or standards/specs, for example) but not eliminated.

The task of moving complexity around reminded me of a Zed Shaw blog rant about indirection vs. abstraction, and of the response to it on "discipline and punish". According to the rant, indirection is for achieving flexibility, extensibility, and replaceability in code. Abstraction is for achieving simplicity in the code's interface. Layers of indirection can lead to complex code, so indirection is not abstraction. According to the response, the distinction between indirection and abstraction "is a bit shaky", because indirection can also be for hiding messy implementation details, and abstraction is for concretely constraining the problem domain rather than creating simplistic (leaky) generalizations. That is, an abstraction is for defining what can be done with any file-like entity, like open()-ing and read()-ing, as opposed to generalizing a file to a directory entry, which is less helpful because a chdir() works on directory entries that are really sub-directories and fails on directory entries that are really files.
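The file-versus-directory-entry point can be sketched in a few lines of Python. This is my own illustration, not code from either post; first_line() and the temporary file names are invented for the example. The "file-like" abstraction promises operations that every member supports, while the "directory entry" generalization includes members for which chdir() simply fails.

```python
import io
import os
import tempfile

def first_line(stream):
    """Works with anything file-like: it relies only on readline()."""
    return stream.readline().rstrip("\n")

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    path = os.path.join(root, "notes.txt")
    with open(path, "w") as f:
        f.write("hello\nworld\n")

    # The file-like abstraction: an on-disk file and an in-memory buffer
    # both honor the same small set of operations.
    with open(path) as on_disk:
        from_disk = first_line(on_disk)
    from_memory = first_line(io.StringIO("hello\nworld\n"))

    # The "directory entry" generalization: chdir() works for some entries only.
    original_cwd = os.getcwd()
    results = {}
    for entry in sorted(os.listdir(root)):
        try:
            os.chdir(os.path.join(root, entry))
            results[entry] = "ok"
        except OSError as exc:  # typically NotADirectoryError for a plain file
            results[entry] = type(exc).__name__
        finally:
            os.chdir(original_cwd)

print(from_disk, from_memory, results)
```

The caller of first_line() never has to ask what kind of thing it was handed, whereas the loop over directory entries has to be ready for failure on every entry. That is the difference between a constraining abstraction and a leaky generalization.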

Conservation of information means indirection and abstraction can't eliminate the information implicit in code, only move it out of the programmer's face. To take a well-known example (since Java is the lingua franca), someone performing calendar work may fiddle with a GregorianCalendar even though he or she may have initially thought a Calendar would do the trick. Hypothetically, if this separation did not exist but the API could accomplish the same tasks, and a GregorianCalendar was just called a Calendar, the information for Gregorian calendar processing would still be there; it would just be "embedded" in Calendar. The fact that the programmer doesn't need to write some code, because a library supplies it, doesn't mean the code isn't there, just that it's hidden. By the way, there's a JSR to put a Joda-Time-inspired API in plain Java.
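The hypothetical "GregorianCalendar that is just called Calendar" roughly exists elsewhere. As an analogy only (this is Python, not the Java API discussed above), Python's standard date type is documented as using an idealized Gregorian calendar, yet nothing in its name says so. The helper is_valid_date() below is my own invention for the sketch.

```python
import calendar
from datetime import date

def is_valid_date(year, month, day):
    """Return True if the standard library accepts this as a real date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

# The Gregorian leap-year rule is embedded in a class just called "date":
leap_2000 = is_valid_date(2000, 2, 29)  # divisible by 400, so a leap year
leap_1900 = is_valid_date(1900, 2, 29)  # divisible by 100 but not 400, so not

# The same rule, stated explicitly elsewhere in the library.
assert leap_2000 == calendar.isleap(2000)
assert leap_1900 == calendar.isleap(1900)
```

The calling code never mentions the Gregorian rules, but they haven't gone anywhere. They've just moved into the library.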

For actual information conservation, I advise you to keep it away from black holes. Depending on who you talk to, it may be lost forever if it gets too close.
