Tuesday, January 08, 2008

thinking in diffs for svn

Wait, why discuss subversion?

Generally, the topic of revision control doesn't interest me. Revision control falls into my "Meet my needs, but don't bug me!" category of tools, rather than my "Meet my needs, and enthrall me with your incredible additional features!" category. (I consider Spring an example of the latter category. Oooo, look, shiny AOP to go with my DI!) And as others have observed, if someone hasn't been investigating revision control or contributing to FLOSS on personal time, the chances aren't good that he or she has ever used the revision control up-and-comers; companies are typically hesitant to switch revision control systems for three understandable reasons:
  1. Code, both past (meaning its revision history) and present, is a highly valuable asset, which the company's destiny may be riding on. Therefore, the revision control system must be trustworthy, well-supported, etc. One way to cover those bases is to pick the "safe" choice, not the "bleeding-edge" choice. As an aside, I think open-source software has an edge here, because vendor lock-in would clearly be a Bad Thing. What would happen if the revision control system vendor decided to double the cost of the next upgrade (due to a wealth of brand-spanking-new features the company never plans to use), and to set a support end-of-life date for the current release? Not a rosy situation.
  2. Switching the revision control system has great costs. Given the effort of the actual export/import (although my impression is that this is improving), the development downtime, the need for different (IDE, build system) integration tools, and the productivity drop during retraining, the projected benefit of the switch must be significant to justify it.
  3. For a company with special needs, or one simply keen on using the "best" revision control system (such as a fast-moving start-up eking out competitive advantage), the cost of sufficiently learning, experimenting with, and testing the sheer number of candidates is also a factor.
My experiences reflect the above. I've used RCS, CVS, and subversion, and my usage of CVS and subversion has been primarily through a graphical interface, not the command line. I have read about other revision control systems. (For me it's similar to a spectator sport, like Blu-Ray/HD-DVD. By my reckoning, mercurial scored well in 2007. Git attracts attention, too, but mostly in non-Windows projects.)

Whatever, get back to your topic!

I'd like to share a revelation that I've had during my time here in svn-land. It came to me when I tried to understand svn's merges. I realized that this command is not actually merging in the usual sense. In revision control systems generally, a merge combines part or all of one branch with another branch in the surrounding repository, but svn's merge does not do that. An svn branch is a cheap copy of the original at another location, recorded as a diff against the original, and each commit piles on another diff, and another, until its development has diverged far from the original. The only way the branch can again become like another branch (or the trunk) is to apply yet another diff.

Therefore, thinking in diffs resolves some possible confusions about the use of svn. I've read that darcs has a central concept of the "patch" and git has a few central concepts like the "index" and hashed "objects". Subversion's Wikipedia page mentions that subversion is described as a "three-dimensional filesystem": a tree and diffs to that tree. Each revision number in subversion is thus a label for a particular set of diffs. Moreover, as an automatically incremented counter, the revision number is just a convenient way to keep revisions in time order. In my twisted pair of hemispheres, I compare it to entropy: always increasing over time. (On the other hand, I sincerely hope the entropy of the code isn't continually increasing. Are you familiar with technical debt?)
  • When specifying a range of revisions, one uses a starting number one before the first revision to be included in the range: to merge revisions 108-112, use 107 as the start. The range of revisions is a diff between the specified start and end revisions. Using 108, not 107, would collect the differences between 108 and 112, but would leave out the changes made in 108 itself.
  • In some highly unusual cases, the start revision of the range is greater than the end revision of the range. This may seem strange (negative time?), but from the viewpoint of diffs it just swaps the roles of the two compared revisions, so the resulting diff undoes the changes instead of applying them.
  • As stated before, a merge that ports code changes from one branch (like the trunk) to another branch is just another diff. When one is about to merge a branch entirely into the trunk (or another branch), one starts the diff with the trunk because the goal is to apply whatever differences are present in order to make the trunk match the branch.
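For anyone who would rather see the three points above as code, here is a toy sketch in Python. To be clear about the assumptions: this is my own illustration, not how subversion is implemented. The trees are plain dictionaries of path to content, the revision contents are made up, and the `diff` and `apply` helpers are hypothetical names that simply overwrite files, whereas a real merge works on text hunks and can report conflicts.

```python
from typing import Dict, Optional, Tuple

Tree = Dict[str, str]  # path -> file content (a toy stand-in for a repository tree)
Delta = Dict[str, Tuple[Optional[str], Optional[str]]]  # path -> (before, after)


def diff(old: Tree, new: Tree) -> Delta:
    """Collect every path whose content differs between two trees."""
    delta: Delta = {}
    for path in sorted(set(old) | set(new)):
        before, after = old.get(path), new.get(path)
        if before != after:
            delta[path] = (before, after)
    return delta


def apply(tree: Tree, delta: Delta) -> Tree:
    """Return a copy of tree with the 'after' side of each change applied."""
    result = dict(tree)
    for path, (_before, after) in delta.items():
        if after is None:
            result.pop(path, None)  # the path was deleted on the 'after' side
        else:
            result[path] = after    # the path was added or modified
    return result


# Made-up snapshots of one line of development at three revision numbers.
r107: Tree = {"a.txt": "one"}
r108: Tree = {"a.txt": "one", "b.txt": "added in 108"}
r112: Tree = {"a.txt": "edited in 112", "b.txt": "added in 108"}

# Point 1: a range is a diff between its start and end snapshots.
range_107_112 = diff(r107, r112)  # includes what revision 108 changed
range_108_112 = diff(r108, r112)  # leaves out what revision 108 changed

# Applying the range to some other branch ports the changes over,
# while leaving that branch's own files alone.
other_branch: Tree = {"a.txt": "one", "c.txt": "branch-only file"}
merged = apply(other_branch, range_107_112)

# Point 2: start greater than end just swaps the compared snapshots,
# so applying the result undoes the changes.
undo = diff(r112, r107)
restored = apply(r112, undo)

# Point 3: to merge a whole branch into the trunk, diff from the trunk
# to the branch, then apply that diff to the trunk.
trunk: Tree = dict(r107)
branch: Tree = dict(r112)
trunk_after = apply(trunk, diff(trunk, branch))

print(sorted(range_107_112))
print(sorted(range_108_112))
print(merged)
print(restored == r107, trunk_after == branch)
```

The only design choice worth noting is that a "merge" here is never anything more than `apply(target, diff(start, end))`; which snapshots play the roles of start and end is the whole story in each of the three cases.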
I can't imagine this information will be earth-shattering to anyone who knows subversion much better than me, and certainly not to anyone who finds revision control systems interesting, but if it helps someone make the same mental leap without having to delve into any of the fine subversion resources around the Web, I'll have accomplished my purpose.
