Wednesday, September 28, 2011

evolution and DRY

Reality may be unintuitive. One instance among many is the evolutionary march that yields complexity from chaos, harmony from cacophony, solution from error. At the timescale of a human life, and applied to everyday objects, such a progression is nonsense. In information-theoretic thinking, the introduction of greater randomness to existent information only degrades it with greater uncertainty or "noise", necessitating a communication code that can compensate by adding greater message redundancy. In thermodynamic thinking, the much larger number of disordered states overwhelms the small number of ordered states. Two gases in one box will mix simply because that's far more probable on average than all the undirected gas particles staying apart in the original groupings. In parental thinking, miscellaneous household items won't land in specific designated positions if children drop the items at randomly-chosen times and locations (a four-dimensional vector governed by a stochastic variable).

Clearly, accurate metaphors for biological evolution are lacking. Humans are justifiably amazed by the notion of entire populations of intricate organisms changing for many millions of years. And the changes are related to numerous shifts in myriad factors, including habitat and competition. It's problematic to transplant prior assumptions into that expansively complicated picture.

But a metaphor from software development might be helpful for contemplating the stunning results achieved by the unpremeditated mechanisms of evolution. No matter the particular need served by the software, a solemn rule of development is "don't repeat yourself", which is abbreviated to "DRY". The intent of the rule certainly isn't the naive claim that no repetition ever happens. To the contrary, the rule is about handling repetition correctly. Following DRY is typing a "chunk" of software just once and then somehow rerunning that solitary chunk on different data whenever necessary. The alternative to DRY is duplication. Duplication is often cheaper at the time of initial development, although it's costlier thereafter: two or more chunks are naturally laborious compared to one.
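
For anyone who'd like to see the rule in miniature, here is a rough Python sketch. The function names and the data are hypothetical, invented purely for illustration: the first version types the same averaging chunk twice, and the second types it once and reruns it on different data.

```python
# Duplication: the same averaging "chunk" is typed twice, once per data set.
def report_duplicated(temps, rainfall):
    temp_avg = sum(temps) / len(temps)
    rain_avg = sum(rainfall) / len(rainfall)
    return temp_avg, rain_avg


# DRY: the chunk is typed just once...
def average(values):
    return sum(values) / len(values)


# ...and then rerun on whichever data needs it.
def report_dry(temps, rainfall):
    return average(temps), average(rainfall)
```

Both versions return the same numbers today. The difference shows up later: a change such as guarding against an empty list is made in one place in the DRY version and in two places (or, by accident, only one of them) in the duplicated version.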

Besides the long-term savings in maintenance work, aggressive DRY has a second effect. The software is divided into chunks. These divisions and subdivisions are easier to read, understand, and analyze. Organization and interconnections take the place of a flat sequence. Appearances suggest conscientious craftsmanship, independent of any knowledge of the software's developers.

Hence, a DRY-compliant outcome has a tendency to look artificially arranged. Evolution could fall in that category. Obviously, unlike in software development, DRY can't be a conscious guideline. Instead, inherent constraints "encourage" DRY to occur. Since normally DNA is strongly resistant to massive change, perhaps outright duplication of a gene across separate strand locations is improbable. Reuse of the original gene in a new context accomplishes an identical adjustment in the organism. The DRY-like modifications of the DNA trigger matching DRY-like modifications to the creature. It appears to be the product of a history that's less chaotic than it really is. Thus phenomena that human brains find comprehensible or beautiful, like symmetry or hierarchical containment, arise from frugal genes. So do displeasing phenomena, like unsightly remnants of a contemporary body part's surprising past. DRY pushes for tweaking existing genes and transforming an existing appendage of little value, rather than partial duplication of existing genes and adding more appendages.

Insistent objectors could aver that the DRY metaphor violates the common sense of the conceptual chasm between thoughtful human software developers and thoughtless genetic mutations and transcriptions. The computer programming languages of typical software are specifically designed to enable smaller chunks. Languages incorporate syntax and semantics. How can that be comparable to the simple streams of codons in DNA? Reuse can't happen without a structure to support it. Pronouns are literally meaningless without basic grammar.

Odd as it seems, the genetic code indeed has a grammar. The fundamental building blocks of grammar are symbols that modify the interpretation of other symbols. Here, "interpretation" is the translation of nucleic acids into working cell proteins. Over time, discoveries have shown how subtle the translation can be. It's affected potentially by a host of activity. Genes definitely can adjust the expression of other genes, which is why geneticists hesitate to assign premature importance to single genes. Some of the "words" in this haphazard language are likely grammatical in impact on protein synthesis, akin to "and", "or", "not". Some might serve both independent and regulatory functions. Incomplete human understanding doesn't cast doubt on the existence or the capacity of evolution's code. It could very well be able to encode the reuse of sections in accordance with DRY-like conservatism. Just as replacing one word in a sentence might have drastic overall implications, replacing a minimal quantity of genes might have drastic overall consequences that give off the impression of evolution acting smart or informative or experienced.

Sunday, September 25, 2011

MythTV archiving nowadays

I haven't mentioned MythTV in a long, long time, largely because I stopped messing around with either the hardware or software in that machine. Some months ago I attempted to upgrade the software but it kept failing before completion. Fortunately, the backup/restore feature allowed me to recover fairly easily each time. Between that difficulty and the extent to which the rest of my devices have since left the machine's hardware in the dust, I'd need to restart the entire project to do it properly. I'm not eager to expend the time, money, or energy for that (plus, the MythTV competitors are so much better now than back then...).

Regardless, it keeps working year after year, so I keep using it. On the rare occasions when I overrode the auto-delete of old recordings, my customary procedure for "archiving" was to run nuvexport to convert the source MPEG-2 file, captured/encoded by the PVR-350, into an Xvid AVI. The result sometimes contained some bursts of unfortunate side effects, like small to medium "blocking", yet I thought this was a reasonable compromise for the sharply reduced storage requirements. Watching too closely yielded the impression of a video being portrayed by many tiny crawling ants. But quick and well-organized ants, I must admit, especially when seen from twice the usual viewing distance.

Recently, as I kicked off a nuvexport job and felt annoyed once again by the estimated time, I finally recognized the antiquated absurdity. The MythTV machine was put together using rather cheap parts that were current a few generations previously. My main Ubuntu desktop is a more modern computer with the corresponding increases in capability and performance. Moreover, my experiences with streaming services like Netflix or Amazon have reminded me of the advances in video compression. Time to rethink.

Instead, I've switched to transferring the source MPEG-2 from MythTV using the MythWeb interface's Direct Download, so the archiving work can exploit the newer hardware and software of the desktop. I run the h264enc script without many custom answers. The H.264 MP4 files look pretty good at around the same bitrate. And probably due to both the higher clock rate and additional specialized CPU instructions, the process really doesn't take that long to run: the stated output fps rate is faster than playback. This is despite the "nice" priority which prevents interference with all other tasks.

Of course, one "pre-processing" step remains in MythTV; I continue to employ the extremely easy interactive "Edit Recordings" feature (I've never trusted the automatic commercial detection). With the rapid "loss-less" option, the chosen edits produce just a shorter MPEG-2 file, ready for further manipulation elsewhere.

NOTE (Oct 1): Another side effect of switching to a video compression format that's conquered most of the world is that the Roku 2 supports it. But given that I have a working MythTV installation, this hardly matters...

Sunday, September 11, 2011

peeve no. 265 is users blaming the computer

No, user of the line-of-business program, the computer isn't the trouble-maker. It could be from time to time, if its parts are old or poorly-treated, but problems at that level tend to be much more noticeable than what you're describing. Generally, computers don't make occasional mistakes at random times. Despite what you may think, computers are dogged rather than smart. Computers do as instructed, and by "instructed" I mean nothing more than configuring the electricity to move through integrated circuits in a particular way. Computers can't reject or misunderstand instructions. No "inner presence" exists that could possibly do so.

I understand that placing blame on "the computer" can be a useful metaphor for our communication. But the distinction I'm drawing this time is substantive. To identify the precise cause of the issue that you've reported, a more complete picture is necessary. Your stated complaints about the computer's misdeeds really are complaints about something else. The reason to assign blame properly isn't to offer apologies or excuses. Figuring out the blame is the first step in correcting the issue and also in preventing similar issues.
  • Possibility one is a faulty discussion of the needed behavior for the program, way back before any computer played a role. Maybe the right set of people weren't consulted. Maybe the right people were involved, but they forgot to mention many important details. Maybe the analyst missed asking the relevant questions. Now, since the program was built with this blind spot, the issue that you reported is the eventual result.
  • Possibility two is a faulty translation of the needed behavior into ideas for the program. Maybe the analyst assumed too much instead of asking enough questions. Maybe the analyst underestimated the wide scope of one or more factors. Maybe the analyst was too reluctant to abandon an initial idea and overextended it. Maybe the analyst neglected to consider rare events that are not so rare.
  • Possibility three is faulty writing of the program itself. Maybe the coders overestimated their understanding of their tools and their work. Maybe the coders had comprehensive knowledge but didn't correctly or fully express what they intended. Maybe a fix had unfortunate side effects. Maybe the tests weren't adequate.
  • Possibility four is faulty data. Like blaming the computer, blaming the data is a symptom. Maybe something automated quit abruptly. Maybe manual entry was sloppy. Maybe the data is accurate and nevertheless unexpected. Maybe someone tried to force shortcuts. Maybe management is neither training nor enforcing quality control.  
  • Possibility five is faulty usability, which faulty data might accompany. "Usable" programs ease information processing from the standpoint of the user. Maybe the program isn't clear about what the user can do next. Maybe unknown terminology is everywhere. Maybe needless repetition encourages boredom and mistakes. Maybe, in the worst cases, staff decide to replace or supplement the program with pen marks on papers or fragile spreadsheets containing baroque formulae. Downfalls in usability may disconnect excellent users from excellent programs.  
  • Possibility six is the dreaded faulty organization, in which various units disagree or the decision-makers are ignorant. Maybe definitions are interpreted differently. Maybe the "innovators" are trying to push changes informally. Maybe the realm of each unit's authority is murky and negotiable at best. Maybe units are intentionally pulling in opposite directions. Regardless, the program probably will fail to reconcile the inherent contradictions across the organization.
Often, in the Big Picture of the blunder, the computer is the most innocent of all contributors.

Thursday, September 08, 2011

software developers like punctuation

Comparison of equivalent snippets in various programming languages leads to a stable conclusion about what developers like: punctuation. Namespaces/packages, object hierarchies, composing reusable pieces into the desired aggregate, and so on are relegated to the despicable category of "ceremony". Better to use built-in punctuation syntax than to type letter sequences that signify items in the standard libraries. Developers don't hate objects. They hate typing names. Hand them a method and they'll moan. Hand them a new operator that accomplishes the same purpose and they'll grin. What's the most noticeable difference between Groovy and Java syntax, in many cases? Punctuation as shortcuts. Why do some of them have trouble following Lisp-y programming languages? Containment in place of punctuation.

Oddly enough, some of the same developers also immediately switch opinions when they encounter operators overloaded with new meanings by code. Those punctuation marks are confusing, unlike the good punctuation marks that are built in to the language. Exceedingly common behavior can still be unambiguous if its method calls are replaced by punctuation, but behavior of user modules is comparatively rare so the long names are less ambiguous than overloaded operators. "By this module's definition of plus '+', the first argument is modified? Aaaaaghhhh! I wish the module writer had just used a call named 'append' instead!"
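
To make that complaint concrete, here is a small Python sketch. The `Log` class is hypothetical, a stand-in for some user module and not taken from any real library. Its '+' modifies the first argument, while the built-in '+' on lists leaves both operands alone.

```python
class Log:
    """Hypothetical user-module type whose '+' surprisingly mutates."""

    def __init__(self, entries=None):
        self.entries = list(entries or [])

    def __add__(self, entry):
        # The surprise: '+' changes the first argument in place
        # instead of building a new value.
        self.entries.append(entry)
        return self

    def append(self, entry):
        # The long name announces the mutation up front.
        self.entries.append(entry)


# Built-in punctuation: '+' on lists creates a new list.
numbers = [1, 2]
more_numbers = numbers + [3]      # 'numbers' is still [1, 2]

# Overloaded punctuation: reads the same, behaves differently.
log = Log(["start"])
combined = log + "stop"           # 'log' now also contains "stop"
```

After this runs, `combined` and `log` are the very same object. Nothing at the call site hints at that, which is the whole grievance; a call to `log.append("stop")` would have left no room for doubt.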

Wednesday, September 07, 2011

deadlocks in the economy

The affliction of a specialist is the compulsion to redefine every discipline into that specialty. As someone in a software job, I see the economy exhibiting mishaps of coordination. Programs executing simultaneously, or one program whose several parts execute at once, might interfere and cause confusion. In an economy, legions of economic agents engage in transactions. Given the difference in scale, coordination issues probably are more, not less, applicable to the economy than to a computer. Some of the names, like producer-consumer, even evoke the comparison.

A fundamental topic in program coordination is the "deadlock". Put simply, a deadlock is whenever all collaborators end up waiting on counterparts to act. Say that there are two programs, each of which needs exclusive access to a pair of files to do some work (e.g. the second file might be a "summary" which needs to be updated to stay consistent with the first file after it changes). 1) The first program opens the first file. 2) The second program sees that the first file is already opened, so it naively opens the second file before the first program can. Voila! The first program waits for the second program to finish up and relinquish the second file, while the second program waits for the first program to finish up and relinquish the first file. Everything is "locked" without any way to proceed.
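
The two-file scenario can be sketched in Python, with a lock standing in for exclusive access to each file. Everything here is an assumption made for illustration: the function names are invented, and a timeout replaces the endless wait so that the demonstration actually finishes. The first function reproduces the deadlock by letting each program grab a different file first. The second shows one well-known remedy, which is having every program open the files in the same order.

```python
import threading


def opposite_order_demo(timeout=0.2):
    """Each program opens a different file first, then wants the other one."""
    file_one, file_two = threading.Lock(), threading.Lock()
    holding_one = threading.Barrier(2)   # both programs hold their first file
    both_tried = threading.Barrier(2)    # both programs have tried the second
    got_second = {}

    def program(name, mine, theirs):
        with mine:
            holding_one.wait()
            # A real deadlock would wait here forever; the timeout gives up.
            got_second[name] = theirs.acquire(timeout=timeout)
            if got_second[name]:
                theirs.release()
            # Keep holding 'mine' until the other program has given up too.
            both_tried.wait()

    threads = [
        threading.Thread(target=program, args=("first", file_one, file_two)),
        threading.Thread(target=program, args=("second", file_two, file_one)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return got_second


def same_order_demo():
    """Both programs open file one before file two, so neither gets stuck."""
    file_one, file_two = threading.Lock(), threading.Lock()
    finished = []

    def program(name):
        with file_one:
            with file_two:
                finished.append(name)

    threads = [threading.Thread(target=program, args=(name,))
               for name in ("first", "second")]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(finished)
```

In the first demo neither program ever obtains its second file, because each is holding the one that the other wants. In the second demo the slower program merely waits its turn at file one, and both finish.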

Back to economics. An economy is a massive set of roughly circular flows. Buyers send money (or liquid credit) to a seller, and the seller sends the desired item back. The seller then (possibly) reuses the money as a buyer, and the buyer then (possibly) reuses the item as a seller. If the buyer obtains money in the labor market, i.e. working a job to earn wages, then that's another flow which connects up to this one. These flows continually recirculate during normal functioning.

However, clearly a stoppage (or slippage) in one flow will also affect other flows. This is the economic form of a deadlock: economic agents that halt and in so doing motivate additional agents to halt. Until flows restart, or a newly created substitute flow starts, nobody progresses. No money or items are moving, so each is facing a shortage. Moreover, without the assurance of complementary flows in action, it's in an agent's selfish interest to wait rather than take risks. Therefore everyone waits for everyone to take the first step. Sounds like a deadlock condition to me.

Examined from a high level, deadlocks are easier to spot. For instance, if the interest paid on a loan for a house is linked to a shifting rate and the rate and the interest both increase, then there could be a lack of funding to cover it. Unpaid interest implies a reduced flow in the money earned by the loan, as well as a corresponding reduced value for the loan itself (a loan without paid interest isn't worth much!). The current and projected reduction in the flow of interest disrupts "downstream" flows that otherwise would've relied on that interest. So the owner of the loan must reallocate money. That reallocated money isn't available for other lending flows. The intended recipients of the other lending flows are left unable to follow their own economic plans. And so forth. Eventually, the original cutoff may come full circle; due to the propagation of effects, larger numbers of loan-payers don't have the flow to pay their interest. The payers can't fulfill the interest payments when lenders have ceased usual risk-taking, and the lenders keep refraining from usual risk-taking when payers can't fulfill interest payments. Money is in deadlock. Thus the economy's assorted flows of items (including jobs), which require the central flow of money (or liquid credit) to temporarily store and exchange value within transactions, are in deadlock too. The trillion-dollar question is which technique is most beneficial to dislodge specific deadlocks of money or to cajole activity in general.

Unlike software, humans are improvisational. Confronted with deadlock, they don't wait forever for the deadlock to break. Instead, they adjust, although it could be uncomfortable. Economic flows that were formerly wide rivers might become brooks. Flows that started out as trickles might become streams. Over a long time period, deadlocks in an evolving economy are temporary. Circumstances change, forcing humans and their trading to change.

Tuesday, September 06, 2011

local minima and maxima

I'm sure it counts as trite to mention that humans aren't great at coping with complexity. (Computers can but only if the complexity doesn't require adaptability or comprehension.) One example is the oversimplified dichotomy between systems: 1) few pieces with highly organized interconnections and controlled variances among the pieces, 2) numerous similar pieces with little oversight that nevertheless mostly act the same and have few decisions to make. An engine is in #1. An ant colony is in #2. A projectile is in #1. A contained cloud of gas particles is in #2. In #1 systems, analysis is rather easy because all the pieces are ordered to accomplish parts or stages of a defined objective. In #2 systems, analysis is rather easy because the actions of all the pieces are generalizable into "overall/average forces". In #1 systems, statistics consist of a series of well-determined numbers. In #2 systems, variances and aggregates are tameable by modeling the population distribution.

The problem is that as useful as these two categories are, reality can often be more complicated. Loosely-connected systems could consist of many unlike pieces. Or the pieces could be alike yet affected in a nonuniform manner by an external disturbance. Or each piece could individually respond to five factors in its immediate neighbor pieces. Or pieces might have ephemeral subgroups which act in ways that loners don't. The possibilities are abundant and stymie attempts to classify the system as #1 or #2.

Consider a minimum or maximum quantity that represents a system. In a #1, that quantity is calculable directly by finding the corresponding quantities for each piece. In a #2, that quantity is an equilibrium that all the pieces yield through collective activity. Either way, the system has one minimum or maximum, and it's reached predictably.

However, this conclusion breaks down when a system is of a "more complicated" kind. Those systems contain pieces that, taken one at a time, are easily understood, but whose final effect is difficult to fathom. As a representation of that system, the minima and maxima could be messy. For instance, a specific constraint is strongest at the lower end of a range but a second constraint is strongest at the higher end. Under such circumstances, the system has more than one minimum or maximum. To the extent that the description works, the "forces" of the system then push toward the local minimum or maximum, whichever is closest.
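
A toy calculation shows the effect. The curve below is an arbitrary example with two valleys, chosen only for illustration, and plain gradient descent (with a step size and step count I picked by hand) plays the part of the system's "forces". Where the descent ends up depends entirely on where it starts.

```python
def height(x):
    # An arbitrary curve with two valleys: a deeper one near x = -1.30
    # and a shallower one near x = 1.13.
    return x**4 - 3 * x**2 + x


def slope(x):
    # Derivative of height(x).
    return 4 * x**3 - 6 * x + 1


def descend(start, step=0.01, steps=2000):
    """Follow the downhill direction in small steps from 'start'."""
    x = start
    for _ in range(steps):
        x -= step * slope(x)
    return x


left_valley = descend(-2.0)    # settles in the deeper (global) minimum
right_valley = descend(2.0)    # settles in the shallower (local) minimum
```

Starting on the right, the descent never discovers the deeper valley on the left, because reaching it would first require climbing over the hump between the two. Each small step only ever goes downhill.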

From the viewpoint of an uninformed observer trying to cram a complex system into #1 or #2, the apparent failure to reach the absolute furthest (global) maximum could be mystifying. If it's caught in the grip of a local maximum, then the failure is more intelligible. The system "rejects" small changes that result in an immediate "worse" outcome regardless of whether or not it's on the path to an ultimately "better" outcome. In short, a wildly intricate system occasionally gets stuck in a pothole of inferiority. And for that system, that state is as natural as anything else.

Hence knowledge of local minima and maxima provides greater nuance to human interpretation. Reasoning about the national economy is a ripe area. The temptation is to reduce discussion into the relative merits of a #1 system, in which the economy is like a tightly-directed machine operated by government, compared to a #2 system, in which the economy is like a spontaneous clump of microscopic participants. This is a discussion about nonexistent options. The economy isn't solely a dangerous beast that needs strict supervision. It isn't solely a genie that showers gifts on anyone who sets it free. It's beyond these metaphors altogether.

An economy that has run aground on local minima or maxima can't be adjusted successfully by treating it as a #1 or a #2 system. "Freeing" it won't erase the slope toward the local minimum. "Ordering" it to stop misbehaving also won't. The economy doesn't always accomplish every desired purpose. On the other hand, government can't completely override the economic transactions of the entire populace (nor should it try). What government can do, potentially, is help nudge the economy out of a local minimum by bending the system. Of course, the attendant risk is that excessive bending by the government might set up a new local minimum in the economic system...

Friday, September 02, 2011

git-svn on Windows and Robocopy

So...git clones of git-svn repositories aren't recommended. (Neither are fetches between git-svn clones. All collaboration should happen through the Subversion server.) Clones don't include the metadata that links the git and Subversion histories. However, unless commits in local branches are backed up elsewhere, work could be lost when catastrophe strikes the lone copy of those commits.

Because git is decentralized version control, its information is self-contained in the ".git" subdirectory of the repository. Thus creating a backup is straightforward: duplicate that subdirectory. But the common copy commands are wasteful and unintelligent. Must one overwrite everything every time? What about data that's no longer necessary and should be deleted in the backup location as well?

Fortunately, there has been a ready command available in Windows: Robocopy. In this case, it's executed with the /MIR switch. Between git's filesystem-based design (i.e. no database dependencies or complicated binary coding) and Robocopy's smarts, incremental changes to the git-svn repository usually result in minimal work performed during subsequent calls to Robocopy.
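
For the curious, the call can be wrapped in a few lines of Python. This is only a sketch under my own assumptions: the function names are invented, any paths passed in are placeholders, and the second function only works on Windows where Robocopy exists. The /MIR switch is the one mentioned above; the check on the exit code relies on Robocopy's documented convention that values of 8 or higher indicate failure.

```python
import subprocess


def robocopy_mirror_command(repo_dir, backup_dir):
    """Build the Robocopy call that mirrors a repository's .git directory."""
    source = repo_dir.rstrip("\\/") + "\\.git"
    return ["robocopy", source, backup_dir, "/MIR"]


def mirror_git_dir(repo_dir, backup_dir):
    """Run the mirror (Windows only) and complain if Robocopy reports failure."""
    result = subprocess.run(robocopy_mirror_command(repo_dir, backup_dir))
    # Robocopy exit codes below 8 mean success, possibly with files copied.
    if result.returncode >= 8:
        raise RuntimeError(
            f"Robocopy failed with exit code {result.returncode}")
    return result.returncode
```

Something like `mirror_git_dir("C:\\work\\project", "D:\\backup\\project.git")` could then be run from a scheduled task at the end of the day; both paths are made up.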

A developer could also mirror the entire contents of the working directory, but the pattern of making many small local commits in the course of a workday means that at any time there are few uncommitted changes in the working directory. At the time that the commits will be pushed to Subversion, an interactive rebase beforehand ensures that the "noise" of the numerous commits won't be preserved in the permanent version control record of the project.

Thursday, September 01, 2011

rule of 8...

...If someone discusses the budget of the federal government of the U.S.A. with fewer than 8 independent numbers or factors or categories, then chances are that the discussion is oversimplified. Figures have been skipped to present a "condensed" viewpoint that's more consistent with a partisan narrative. The 8 or more numbers are "independent" in a statistical (information-theoretical) sense: none of the numbers is a complete mathematical function of the others, such as the third always equaling the difference of the first and second.