Thursday, January 31, 2008

peeve no. 255 is an API call changing global state

Ooooo, there's nothing quite like spending most of the workday trying to track down the root cause of a single problem.

Except for when the root cause turns out to be an API call changing global state.

It's like the spammy counterpart to the inversion of control technique. "You know how convenient it can be to define an application's callbacks/delegates/listeners, and just let library code handle the Big Picture as much as it can? What if the library code also manipulated the application code's variables and resources on its behalf, without even being asked!" It's an idea so innovative it'll prompt you to smack your forehead in response. Or smack your forehead against a desk. Repeatedly.

It's like the code equivalent of Yzma's frustration with Kronk. "...why did I think you could do this? This one simple thing. It's like I'm talking to a monkey..." I ask my API for help with one task, the task the API call is clearly named for. I provide the API call with what it needs to carry out the task. And at execution time, the API call then proceeds to surgically cut a piece of data, which sends the entire program crashing down like a chandelier.

Testing it is like repeatedly stepping on rakes lying on the ground. Each time the code runs (stepping on the rake), failure seemingly comes out of nowhere (the handle) to bash you. So you try changing some of the application code, merely to see if the change might make a difference. Thwack. Grab your stress-relief squeeze ball ("that little sucker just saved your life, monitor!") for a moment, adjust a different section of the application code, try again. Thwack.

The Wheel of Morality for today contains the following tips, any of which the spin could land on:
  1. Minimize global state by whatever options are available.
  2. Minimize the number of incidental side-effects per procedure, and document the side-effects that remain.
  3. Minimize the general "action-at-a-distance" quality of all code.
  4. Minimize the assumptions a reusable library makes about the application.
  5. Minimize surprise without introducing inconvenience by somehow providing multiple library code paths--for instance, an optional DoMyWorkForMe parameter, a separate super-deluxe FunctionWithStateMunging, or an OptimizedForYourConvenience façade object. (A sketch of this tip appears below.)

As for my situation, in which the offending API is written and maintained by a vendor and distributed only in bytecode form (say hello to my little friend, the .Net Reflector), 6) frakking minimize the frelling respect and tolerance and money for that kriffing vendor.
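
To make tip 5 concrete, here's a minimal sketch in Java. Everything in it is hypothetical--the class and method names are invented for illustration--but it shows the difference between a plainly named call that does only its job and an opt-in variant whose name admits to the side effect.

    // Hypothetical sketch of tip 5: keep the state-munging behavior on a
    // separate, clearly labeled code path instead of hiding it in the obvious call.
    public final class ReportClient {

        private String globalLocale = "en_US";   // stand-in for hidden global state

        // The plainly named call does only what its name says.
        public String fetchReport(String id) {
            return "report:" + id;
        }

        // The side effect is opt-in, and the name admits to it.
        public String fetchReportAndSetLocale(String id, String newLocale) {
            globalLocale = newLocale;            // documented, expected, asked-for
            return fetchReport(id);
        }

        public String currentLocale() { return globalLocale; }

        public static void main(String[] args) {
            ReportClient client = new ReportClient();
            client.fetchReport("42");                       // no surprises
            System.out.println(client.currentLocale());     // still en_US
            client.fetchReportAndSetLocale("42", "fr_CA");  // caller requested the side effect
            System.out.println(client.currentLocale());     // fr_CA
        }
    }

The point isn't the specific names; it's that the surprising path exists only when the application code explicitly asks for it.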

Tuesday, January 22, 2008

Corner Gas

Thank you, writers' strike. The resulting dearth of original TV programs without suck has freed up some time for me. (I don't mean time during the actual broadcasts. Before and during the strike, my KnoppMyth installation, combined with the Schedules Direct service I've already complimented, has been recording on my behalf.)

I've spent part of that liberated time...watching TV, naturally. The "Superstation" WGN channel has been airing episodes of Corner Gas cable-style: filling up schedule gaps with whatever is available, no matter how old it is. Unlike much of the filler, Corner Gas is actually new to me, and it's well worth catching each (self-contained) episode.

Corner Gas is a Canadian show through and through, set in Saskatchewan. The references to Canadian culture, as well as the (unexaggerated, no "eh") accent, aren't serious obstacles. Its tone and characters are well-described by a one-liner seen around the Web: "Seinfeld rocketed back 40 years and put in Mayberry". Mayberry is the fictional small town in The Andy Griffith Show.
  • Seinfeld's plot lines centering on the minutiae of everyday life earned it the description of "a show about nothing" (a description the show also applied to itself, through the self-referential sitcom-within-the-show "Jerry"). Nevertheless, Seinfeld's setting was New York, and it included a wide range of notable guest characters to produce drama and conflict. Corner Gas' setting is a tiny rural town, primarily a gas station and a diner, and the guest characters in any given episode (like, say, the real Prime Minister) are for gags, not advancing the plot. If Seinfeld is a show about nothing, then Corner Gas is a show about less than nothing. Then again, as in any sitcom, the situations mined for comedy can be outlandish, i.e., not really "nothing".
  • Corner Gas' humor is similar to Seinfeld's, too. In some ways, it's more "Seinfeldian" than Seinfeld was! Seinfeld used slapstick, outrageousness, and crudeness fairly often. Corner Gas doesn't. Seinfeld had sarcastic dialog about any insubstantial topic. So does Corner Gas. The jokes come at the characters' expense. A lot of the time, the maligned character is completely unaware of the joke.
  • Seinfeld and The Andy Griffith Show were alike in not being easily categorized as "home" or "workplace" comedies: only some of the exceedingly quirky characters were family or coworkers. Corner Gas is the same way. Its offbeat characters also are the primary basis of its humor, although in my opinion these characters typically are relatively more plausible than either the neurotic/despicable examples on Seinfeld or the stupid/naive examples on The Andy Griffith Show. Of course, sitcoms are under comparison here, so in these shows realism is one of the first qualities sacrificed for the sake of funny.
  • One technique which makes Corner Gas more distinctive is its quick and abrupt cuts between reality and, well, unreality. The unreal scenes might be a vivid character daydream, a past event just referenced in dialog, or a bizarre gag otherwise cued by dialog. The one that comes to mind is when a character mumbles that a clown who caused pain would be "Painy the Clown", and the show immediately cuts to a scene, no more than four lines long, of "Painy" being scolded by a customer and then attacking the customer in response. This approach has led to comparisons between Corner Gas and Scrubs or Family Guy. However, Corner Gas relies on it far less.
If it isn't obvious, I'm deeply enjoying Corner Gas! The show may not be flashy or exciting enough for some viewers, but compared to others in its class it's above-average at all times and periodically it's creatively magnificent without being overbearing. It's written intelligently, yet it isn't pretentious. Given that TV Guide once named Seinfeld the greatest TV show, and Corner Gas has been called the Canadian Seinfeld, it has a good chance of finding new fans for a very long time.

Saturday, January 19, 2008

the pigs are walking?

If the title doesn't make sense, I highly recommend reading Animal Farm. When the farm animals overthrow the farmers, the pigs are the leaders. By the end of the book, the pigs have started walking on two legs, just as the farmers did; a grand revolution of government has ultimately produced leaders just like the old ones. In the Lost episode Exposé, Dr. Arzt says that the pigs are walking when he's questioning the (de facto) leaders.

In this case, I just read a commentary piece which at one point gives several examples of huge, monolithic, feature-bloated, overly-complex programs. One of them is Firefox.

But as people may or may not remember, Firefox (er, Phoenix) first became popular as a light, focused alternative to Mozilla (now SeaMonkey). (Actually, at that point in time I was happily using Galeon--never used Epiphany either.)

This isn't really a sighting of "pigs walking" because the commentary piece is referring to the perspective of developers of POSIX-ish systems, not the perspective of a user. And speaking as a user, if Firefox adds features to make my browsing better without hindering my most common tasks, I won't complain.

Friday, January 18, 2008

public bulletin: open source software is heterogeneous

I was just reading some commentary (not linked because there's no shortage of commentary available) about Sun's purchase of MySQL. I feel that I should do the responsible thing and get the word out.

Open source software is heterogeneous.

Let me explain before this statement leads to any humorous misunderstandings. FLOSS is heterogeneous because it's a conceptual clumping of any software which is available under the right license(s). Similarly, "LAMP", Linux/Apache/MySQL/(Perl|PHP|Python|Ruby), isn't a product. It's a collection of software. And as more and more people know, what's commonly called the "Linux platform" is also a collection of software, built around the Linux OS/kernel. One of the real-but-under-publicized advantages of running FLOSS is how easy and convenient it can be, nowadays, to mix and match the software.

All this is enabled by the common denominator across FLOSS: the licenses. MySQL's license (er, one of them) ensures it will remain available for the software combinations it's currently part of. Speculation about how Sun's purchase of a company will affect "open source" is incredibly vague, because FLOSS isn't a company, an industry, or an organization. "Movement" is closer, but even that erroneously suggests cohesive goals spanning all the participants. Since it's a matter of licensing and collaboration, perhaps it should be a verb, not a noun. FLOSS is an activity some software-developing entities do. Even the company most often cast in the bogeyman role, Microsoft, has been doing more FLOSS.

But by all means, Web commentators, continue to muse on the Sun-MySQL deal as if it will drastically upset those using FLOSS. (I admit that when companies pay developers full-time to work on the software, that makes a difference.)

Thursday, January 17, 2008

defining information and data: the economics of information, part 1

Other parts (since blog entries are consumed in reverse-chronological order, the parts are posted in reverse-chronological order):

Part 2, valuing information
Part 3, trading information

Disclaimer: I'm not an actual theorist or scholar of anything. If any of the following series seems derivative, that's because it is. Moving along...

introduction

Per usual practice, I'll use the Thomas Jefferson quote as a preface:

If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it. Its peculiar character, too, is that no one possesses the less, because every other possesses the whole of it. He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.

Techdirt's "The Grand Unified Theory on the Economics of Free" (and its thesis-defense comments) prompted me to ponder some general questions about information, especially digitally-encoded information. What is it? What is its value, and how is that value derived? How could the answers to those questions tie together into economics, whether in conventional terms or the terms Techdirt's article laid out? I proceeded in careful steps, philosophy-style. The result is loooong but divisible by topic, so it's a series.

defining information and data

As Jefferson hinted, information (what he calls an idea) is fundamentally different from matter. Specifically, information can only be manipulated by a mind. As many minds exist, so can many copies of information exist. Rather than attempting to define "mind", and potentially becoming mired in a discussion about the nature and origin of consciousness, I'll resort to circular definitions: "mind" is defined as that which understands and manipulates information, and "information" is defined as the material which minds understand and manipulate. (Someone may break in at this point to protest that human minds also manipulate human bodies. Wrong! Brains manipulate human bodies. Note that I'll also not consider the relationship between the brain and the mind--I don't wanna. However, for extra credit, the reader may ponder this oft-explored hypothetical: given two humans, how much about them must be identical for them to have identical minds? Identical synapses? Identical bodies? Identical experiences? Identical surroundings?)

The next step is to recognize something else the Jefferson quote expresses: a mind can either hoard information, or it can communicate the information to another mind (letting the brainwaves ripple, so to speak). Communication is necessary because minds are disconnected, separate. (By the above circular definitions, two or more minds could only be considered a single mind if all the minds understood and manipulated the exact same set of information, which is not the case. I am Hugh.) Since minds are separate, whatever bridges the separation must consist of matter. Therefore, communication has multiple steps: 1) the sending mind encodes the information into matter, 2) the matter travels to another mind, 3) the receiving mind decodes the matter into information. Nothing in this model is controversial, aside from being a blatant oversimplification compared to real communication theories/models. For clarity's sake, "data" shall refer to the matter representations of information, used in communication.

In passing, for the sake of full disclosure, note that real communication is inherently imperfect, ultimately stemming from the difference between, or the boundary of, mind and matter. (One might say, "the map is not the territory".) Perhaps the original purpose of minds is to do what matter on its own cannot: apply information to other information, "annotating" new information with meaning, forming context, birthing information of greater worth than the individual bits. To communicate is to dislodge a subset of information from its original context within a mind. The very first step of information encoding is lossy. The very last step of information decoding is extrapolation. Strictly speaking, mental information (as opposed to sensory information recorded almost perfectly by technology) isn't duplicated; it's torn out, encoded to data with approximations, sent, decoded from data approximations, and interpreted. "Reproduction" of information, via a communication channel of data, is not really a "magical" loophole of conservation laws.

Data's most basic form is personal interaction such as gestures, vocalizations, and speech. A variety of creatures create and consume this form of data through a variety of methods, but its expression peaked in humans: spoken language, which has stunning complexity. Advances in communication have at least partly been refinements in data: pictures, stylized/standardized pictures, phonetic pictures, characters/alphabets, the printing press, photography, the phonograph, the telegraph, the telephone, radio, cinema, television, modems/faxes, broadband networks.

Most importantly, data manipulation changed forever once humans realized that by translating or encoding data into a digital (binary-series) form, meaning a succession of either 0/1 or off/on or no/yes, any technology which can handle the digital form can therefore handle the data. And since the presence and absence of an electric current are excellent ways to represent 0/1, the data could also be "electrified", which enables quick transmission, storage, and modification. Experts may smirk at someone who simplistically asks "how to quickly show, to my relative in the other half of the country, a picture I took from my new itty-bitty camera" or at someone who asks "how many songs can my portable music player contain", but that level of abstraction better illustrates the profound communication effect of digitized data. On some level, all websites, within and without the Web 2.0 bubble, are "social".

Digital data's technologically-achieved processing efficiencies naturally extend to yet another operation: copying, which is just retrieving, transmitting, and re-storing the data. Moreover, unlike both the manual and mechanical copies of past forms of data, the copies can be identical: all that's necessary is reliably duplicating each 0/1 in the series, by repeating the procedure bit by bit. For humans, who really only care about the information rather than the data, it would be an excruciating way to copy, but it's ideal for the purpose of automation. Besides greater speed and accuracy, digital data copying has also become quite cheap. The number of 0/1 bits per unit of currency keeps increasing across the range of all devices, while the cost of each data operation is tiny (cost measured in watts of power and ordinary wear and tear on the device).
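
As a toy illustration of that "repeat the procedure bit by bit" idea (nobody would copy data this way by hand; this is just Java standing in for what hardware does constantly), the following sketch duplicates a small piece of data one unit at a time and confirms the copy is identical to the original:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;

    // Duplicate a small chunk of digital data one unit at a time:
    // retrieve, re-store, repeat, and the copy comes out identical.
    public class BitwiseCopy {
        public static void main(String[] args) throws IOException {
            byte[] original = "an idea, encoded as data".getBytes("UTF-8");

            InputStream in = new ByteArrayInputStream(original);
            ByteArrayOutputStream out = new ByteArrayOutputStream();

            int unit;
            while ((unit = in.read()) != -1) {  // retrieve one unit of the data...
                out.write(unit);                // ...and re-store it, over and over
            }

            System.out.println("identical copy? "
                    + Arrays.equals(original, out.toByteArray()));
        }
    }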

Essentially, digital data resembles information more than any previous form of data, and therefore the machines that manipulate the digital data resemble the mind in the same ways. Given this is the case, the urge to apply the Thomas Jefferson quote to data as well as information is not surprising. Also, digital data prompts a reevaluation of the economics of information as a resource or good in itself, rather than just as a factor in economic decisions.

valuing information: the economics of information, part 2

Other parts:

Part 1, defining information and data
Part 3, trading information

Information has much value, although its value is harder to grasp than the value of other goods. Of course, the same key economic principle applies: value is assigned by a market. The market for information is minds.

Minds value information in the same way any economic agent values any good: for what it achieves. The question of what an economic good, especially information, achieves can be subjective and loosely-defined. If someone says a story made him or her laugh, cry, and think, then the information has definitely achieved something for him or her, and it has proportional value. If someone learns how to play pinochle, then that information has increased his or her capabilities, and it has proportional value (i.e., depending on how much he or she will be playing pinochle). If someone legally hears of a stupendous stock tip, then that information has substantive value indeed. If a spy discovers a long-planned surprise attack from an opposing army, the value is measured in the preservation of human lives (from the perspective of one of the sides, anyway).

All these values happen on the consumer or demand side of the market, but in any (semi-free or more) market, the value has a corresponding producer or supply side. Just as value on the demand side of the information market varies widely, value on the supply side varies widely. Of course, for the producer, value means cost. Information production has costs. R&D is an example with obvious price tags. Basic science research has price tags, too--supercolliders and space telescopes aren't free. Less obviously, the creation of "fuzzier" information by artists, writers, engineers, architects, etc., has costs. The creator's time and exertion are far greater than zero.

Another supply-side value which deserves more attention is the expertise of the information producer. Rarity raises value, and the expertise involved in producing worthy information is relatively rare. In my opinion, the spike in amateur videos has confirmed this all too well. For every truly noteworthy piece of creative information, a multitude of mediocre to awful creative information also exists. Still more rare is the information producer whose expertise results in consistently-high output.

So information undoubtedly has value. But the data, the material manifestation of the information which can be seen and felt and heard, has its own components of value. One is fidelity: how well the data reflects the information. Low-fidelity data, like a scratched vinyl album, or a grainy filmstrip, has lower value than high-fidelity data for the same information. A second component of value is utility: how easily, flexibly, etc., the data, and therefore the information represented by the data, can be manipulated. Data which requires specialized devices to read it has lower value than universal data. The point is always the information, not the data. Fidelity and utility describe the degree to which the data doesn't hamper the communication of the information.

Once again, digital data's technologically-achieved processing efficiencies have changed data to more closely match information. On the consumer or demand side, digital data is increasingly easy and cheap to obtain and experience. For instance, not many years ago, communicating a large quantity of video data via priority mail was quicker and cheaper than trying to send it through a line, but more recently the decision has flipped. In the case of data like text, the digital form's proposition was stronger almost from the start (recall the excitement over having an entire encyclopedia on a data CD, including "multimedia"?). The growing ease of consuming data, particularly on an interactive or on-demand basis, in effect makes the data market itself larger and more efficient. In simple terms, this just means that the consumer has more choices. A common term for this is the Long Tail.

Meanwhile, digital data has had similar effects on the producer or supply side. Through the assistance of technology (without which the digital form would be quite useless), data is subject to any modifications of arbitrary precision. People can create new experiences of sound without picking up an instrument, munge pictures, and computer-generate movies and TV shows. Moreover, the technology for doing these tasks has grown steadily cheaper, allowing more potential producers to become real producers. (For instance, anyone can publish through a blog, or calculate finances with a spreadsheet.) As on the demand side, the growing ease of producing data has made the market itself larger and more efficient.

Ultimately, information will remain valuable on both the demand and supply sides. Its market won't vanish. As data better serves its function of communicating information, the data market will give way to the underlying market, the information market. More to the point, continuing to target the data market rather than the information market will become increasingly irrelevant. However, the information market brings along its own set of conceptual shifts.

coda about software

Software is a unique case of data, in which the data represents information about manipulating data. For clarity's sake, the data manipulated by the software shall be called "work". (Of course, in practice the software is separated from the work for security and efficiency, but all data is nevertheless stored and retrieved in the same general way. Well, software and work have separate caches.) Since the manipulation of the work is the motivation for using the software, the value of the work is thus intrinsic to the software's value. In the opposite way, since software creates and modifies work, the software's value becomes intrinsic to the work's value--if the work can't be manipulated, its information is lost, at least without conversion.

The trend to transform the data market ever further into a purer information market will apply pressure to any obstacles which would trap information inside data. Consequently, the value of software will lie in how much of the information market it can manipulate. The value of work will lie in the degree to which it can be treated like information--that is, the degree to which it can be easily manipulated. Minds don't need apparatus like software to manipulate ideas. Similarly, ideas are much more fluid and interconnected than work, the actual bits of digital data. A second fusion between software and work will probably occur, but not in the rigidly co-dependent way of before: "smarter" data, with standardized, replaceable parts. Dare I say Semantic Web? Or the Cloud?

trading information: the economics of information, part 3

Other parts:

Part 1, defining information and data
Part 2, valuing information

Economically considered, transactions in the information market have some unusual properties, due to information not being matter (recall from part 1 that each time communication occurs, the receiving mind decodes the data and "recreates" the information): 1) the producer's "inventory" of information isn't depleted, 2) the consumption of the information doesn't eliminate the information, 3) the consumer is immediately able (physically, mentally) to use the information in future transactions, this time as a producer.

Taken together, these unusual properties lead to a predictable outcome: the more transactions in the information market which occur, the greater the supply of the information becomes, on a compounding scale. Old-fashioned hearsay, rumors, and gossip can take less than a week to become common knowledge. New-fashioned blogs commenting on blogs produce an "echo chamber" (I WILL NOT use "blogosphere" non-ironically!), rocketing a particular entry to popularity...or notoriety.

Since the data market approaches the information market as the efficiencies of data, particularly digital data, approach those of information, the economics relied upon by participants in the data market must adjust. Fortunately for producers who wish to eat (as opposed to having their lunch eaten), information retains the same value it always has had. There isn't a set formula for precisely how to adjust, but some suggestions and observations pertaining to the information market are available. Generally speaking, each one is aimed either toward leveraging a lack of scarcity or toward introducing scarcity, so the high level of overlap is unsurprising. Naturally, no guarantees are included.
  • Produce information whose value to the producer increases as the information is consumed. The traditional example is advertising. "Viral" marketing and product placement are some other techniques.
  • Instead of producing information, produce tools and services which empower consumers to manipulate information, whether creating, modifying, storing, sharing, securing, or selling it. Admittedly, the more these tools and services are offered for free, and the stronger and easier those free tools and services become, the further the probability of earning a profit here will shrink.
  • Ensure that the original producer of the information continues to be the best source for that information. Recommended tactics are offering the most convenient transactions (greater hassle for the consumer equates to greater perceived cost), frequent updates to keep past versions of the information less desirable (note that even information takes nonzero time to indirectly spread, depending on the number of intermediaries/layers), extra benefits to consumers who use the original source (the benefit could even be additional information), greater context for the information.
  • A more efficient information market means that more choices are available to the consumer. His or her time and attention are limited. This immutable fact presents an opportunity for tools and services that could aid him or her in finding and discriminating information. An information market has less profit in distribution (how expensive is it to distribute something that's ethereal?), but it has more profit in filtering.
  • On its own, information is inert. "Interactive" information is cleverer at not seeming inert, but only up to a point. By contrast, the greater value of interacting with a mind is hard to overestimate, which is why education will continue to have a deep need for teaching professionals in the face of widespread information. (Of course, some kinds of information are hard-to-impossible to communicate apart from the immediate feedback of a "coach".) A consumer can reuse information in another transaction; he or she can't take and resell the source mind.
  • The value placed on information, like the value placed on any other resource, has a subjective component. Information producers can drastically exploit the subjective component because information is extremely malleable at low cost. Customized information is simultaneously of higher value to the targeted consumer and of lower value to every other consumer. (Then again, the same malleability potentially makes it easier for secondhand consumers to un-customize and re-customize the information.) Significant customization assumes a corresponding amount of work by the producer, though.
  • Joint ownership of conventional goods illustrates that simple possession isn't the sole measure or benefit of ownership. So information producers can still trade "ownership-like" privileges for a shared possession like information. For instance, like a stock investment, information could be "pre-sold" before production--the obvious difference being that the investing consumers are expecting a return in information, not currency. (Clearly, the "investors" would need to be convinced that the information will be worthwhile.)
  • Use a pricing strategy opposite to that of mass production: because large numbers of transactions can't be counted on (due to the potential of cheap duplicates), make up the difference through setting a higher price per transaction. Or go with a "group rate" paradigm, in which the high price could be split among several consumers, but each of the consumers in the group receives the information.
  • Finally, economic behavior isn't independent of other behavior. More abstract incentives, positive or negative, can affect decisions. Societies function on incentives like those.
summary

A top factor in the dominance of the human species (over other lifeforms and to some degree Earth itself) is its talent for information. Over time, the matter representations of information, data, have grown in efficiency by the advances of technology. Data increasingly resembles information, and people will treat data more and more like information. Thus, the market for data will approach a market for information. Information will remain a market because, though the efficiencies of manipulating data enable large-scale (& decentralized) information production, good information will still have value. Nevertheless, the information market has its own set of challenges for yielding a profit. Those challenges aren't new, but take on more significance.

Tuesday, January 08, 2008

thinking in diffs for svn

Wait, why discuss subversion?

Generally, the topic of revision control doesn't interest me. Revision control falls into my "Meet my needs, but don't bug me!" category of tools, rather than my "Meet my needs, and enthrall me with your incredible additional features!" category. (I consider Spring an example of the latter category. Oooo, look, shiny AOP to go with my DI!) And as others have observed, if someone hasn't been investigating revision control or contributing to FLOSS on personal time, the chances aren't good that he or she has ever used the revision control up-and-comers; companies are typically hesitant to switch revision control systems for three understandable reasons:
  1. Code, both past (meaning its revision history) and present, is a highly valuable asset, which the company's destiny may be riding on. Therefore, the revision control system must be trustworthy, well-supported, etc. One way to cover those bases is to pick the "safe" choice, not the "bleeding-edge" choice. As an aside, I think open-source software has an edge here, because vendor lock-in would clearly be a Bad Thing. What would happen if the revision control system vendor decided to double the cost of the next upgrade (due to a wealth of brand-spanking-new features the company never plans to use), and to set a support end-of-life date for the current release? Not a rosy situation.
  2. Switching the revision control system has great costs. From the effort of the actual process of the export/import (although my impression is that this is improving) to the development downtime to the need for different (IDE, build system) integration tools to the productivity drop for retraining, the projected benefit of the switch must be significant.
  3. In the case of a company with special needs, or just being keen on using the "best" revision control system (such as a fast-moving start-up eking out competitive advantage), the cost of sufficiently learning and experimenting and testing the sheer number of candidates is also a factor.
My experiences reflect the above. I've used RCS, CVS, and subversion. And my usage of CVS and subversion has been primarily through a graphical interface, not the command-line. I have read about other revision control systems. (For me it's similar to a spectator sport, like Blu-Ray/HD-DVD. By my reckoning, mercurial scored well in 2007. Git attracts attention, too, but mostly in non-Windows projects.)

Whatever, get back to your topic!

I'd like to share a revelation that I've had during my time here in svn-land. It came to me when I tried to understand svn's merges. I realized that this command is not actually merging. In revision control systems generally, a merge combines part or all of one branch with another branch in the surrounding repository, but svn's merge doesn't really do that. An svn branch is cheap-copied to another location as a diff of the original, and it diffs, and diffs, until its development has diverged too far from the original. The only way the branch can again become like another branch (or the trunk) is to apply another diff.

Therefore, thinking in diffs resolves some possible confusions about the use of svn. I've read that darcs has a central concept of the "patch" and git has a few central concepts like the "index" and hashed "objects". Subversion's Wikipedia page mentions that subversion is described as a "three-dimensional filesystem": a tree, plus diffs to that tree. Each revision number in subversion is thus a label for a particular set of diffs. Moreover, as an automatically incremented counter, the revision number is just a convenient way to keep revisions in time order. In my twisted pair of hemispheres, I compare it to entropy: always increasing over time. (On the other hand, I sincerely hope the entropy of the code isn't continually increasing. Are you familiar with technical debt?)
  • When specifying a range of revisions, one uses a starting number just before the first revision to be included in the range: to merge revisions 108-112, use 107 as the start. The range of revisions is a diff between the specified start and end revisions. Using 108, not 107, would collect the differences between 108 and 112 but leave out the changes made in revision 108 itself. (The example commands after this list illustrate the point.)
  • In some highly unusual cases, the start revision of the range is greater than the end revision of the range. This may seem strange (negative time?), but from the viewpoint of diffs it just indicates reversing the positions/roles of the compared revisions.
  • As stated before, a merge that ports code changes from one branch (like the trunk) to another branch is just another diff. When one is about to entirely merge a branch into the trunk (or another branch), one starts with the trunk because the goal is to apply whatever differences are present in order to make the trunk match the branch.
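
Here's what those three bullets look like as commands. The repository URLs are made up for the example; the revision numbers match the ones above.

    # Port the changes committed in revisions 108 through 112 of the trunk into a
    # branch working copy: the range is the diff from revision 107 to revision 112.
    svn merge -r 107:112 http://svn.example.com/repo/trunk .

    # Reverse the roles of the compared revisions to back those same changes out.
    svn merge -r 112:107 http://svn.example.com/repo/trunk .

    # Merge a branch back entirely: from a trunk working copy, apply whatever
    # differences exist between the trunk and the branch, so the trunk ends up
    # matching the branch.
    svn merge http://svn.example.com/repo/trunk http://svn.example.com/repo/branches/feature .
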
I can't imagine this information will be earth-shattering to anyone who knows subversion much better than me, and certainly not to anyone who finds revision control systems interesting, but if it helps someone make the same mental leap without having to delve into any of the fine subversion resources around the Web, I'll have accomplished my purpose.

Monday, January 07, 2008

nobrainers and non-brainers

I'm a pedantic and small-minded jerk for pointing and haw-hawing at spelling mistakes on the Web, but I enjoy it too much to stop. Today's example is using "non-brainer" in place of "nobrainer". It's a harmless misspelling, because it should get the same point across: "self-evident". But it's entertaining if the reader misinterprets "non-brainer" as meaning "brainless".

Using an XML file for application configuration may seem like a nobrainer. After all, XML is simple and universally supported.

However, in some scenarios, attributes/annotations are sufficient. In some scenarios at the other end of the complexity spectrum, a static XML document isn't flexible enough to do the job, so doing the configuration in actual code (but nevertheless calling the whole shebang a DSL) is necessary. So while XML configuration can seem like a nobrainer, it may in fact be closer to a non-brainer.
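
As a tiny illustration of that complexity spectrum (everything here is hypothetical and in Java--the annotation, the XML snippet, and the numbers are invented for the example), consider the same setting expressed three ways: as an annotation next to the code, as static XML, and as actual code that can compute the value:

    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;

    public class ConfigStyles {

        // 1) Annotation/attribute style: the setting lives beside the code it configures.
        @Retention(RetentionPolicy.RUNTIME)
        @interface PoolSize { int value(); }

        @PoolSize(10)
        static class Worker {}

        public static void main(String[] args) {
            int fromAnnotation = Worker.class.getAnnotation(PoolSize.class).value();

            // 2) Static XML style: fine as long as "10" never needs to be computed.
            String xml = "<worker poolSize=\"10\"/>";

            // 3) Configuration as code (call it a DSL if you must): can derive the value.
            int fromCode = Math.max(2, Runtime.getRuntime().availableProcessors());

            System.out.println(fromAnnotation + " / " + xml + " / " + fromCode);
        }
    }

Which style is the "nobrainer" depends on whether the configuration ever needs to be anything more than a constant.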

Friday, January 04, 2008

seemless interface design

From somewhere on the Web:
But look on the bright side, if the only time you hear about your software is when it breaks, take it as a compliment on your seemless interface design.

This is yet another example of what automated spell-checking won't do for you. A "seamless" interface is highly prized because it is a joy to use (no leaky abstractions). A "seemless", meaning "unseemly", interface is not so highly prized.

The moral: work on your seemless interface design until it is seamless.