Showing posts with label Software development. Show all posts

Monday, September 07, 2015

thank you for not smoking

The previous concept reapplied from software was the black box analysis technique. The technique metaphorically places something inside a black box, which signifies avoidance of direct scrutiny or even identification. The something's effects are examined instead, thereby circumventing the interference or the labor of knowing the something and its inner workings. The analysis proceeds through the factual details of various interactions between the something and its environment.

It's highly relevant to the goal of objective testing, because it avoids prejudices. The act of inspection is entangled in the inspector's slanted perspective, while black box tests compare clear-cut outcomes to uninfluenced expectations. If external outcomes don't satisfy neutral and sensible criteria then the something should be reevaluated, regardless of who/what it is and the characteristics it supposedly has within.
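A minimal sketch of that stance in Python (the unit and the expectations are invented for illustration): the test touches only inputs and outputs, never the internals, so the unit could be rewritten entirely without invalidating the test.

```python
# Black-box test: the "something" is exercised only through its public surface.
# The test never inspects how it works; it only compares outputs to
# expectations written from the outside.

def sort_names(names):
    # Stand-in for the unit under scrutiny; its inner workings are irrelevant
    # to the checks below and could be swapped out entirely.
    return sorted(names, key=str.lower)

def test_black_box():
    assert sort_names([]) == []
    assert sort_names(["b", "A"]) == ["A", "b"]   # case-insensitive order
    assert sort_names(["x"]) == ["x"]             # single element untouched

test_black_box()
print("all black-box expectations met")
```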

Beyond black boxes, the topic of testing software includes another broadly useful concept: smoke tests. These are rapid, shallow, preliminary, unmistakable checks that the software is minimally serviceable. The name comes from the analogy of activating electronic equipment and just seeing if it smokes. A smoke test of software runs the smallest tasks. Can it start? Can it locate the software that it teams up with? Can it load its configuration? Can it produce meaningful output at all?
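Those minimal tasks can be sketched as a small smoke-test helper in Python (the program and config names in the usage comment are hypothetical):

```python
import json
import subprocess

def smoke_test(cmd, config_path):
    """Rapid, shallow, preliminary checks: does the program start, is its
    configuration even loadable, and does it produce any output at all?"""
    # Can it load its configuration? (Here: is the config file parseable?)
    with open(config_path) as f:
        json.load(f)
    # Can it start and emit something? Run it briefly and watch for "smoke".
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    assert result.returncode == 0, "program crashed on startup"
    assert result.stdout.strip(), "program produced no output at all"

# Hypothetical usage:
# smoke_test(["./report_tool", "--version"], "report_tool.json")
```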

No specialized expertise is necessary to notice that smoke tests are vital but also laughably inadequate. Since the software must pass much more rigorous tests, it's logical to question why smoke tests are worthwhile to perform more than once on the same software. However, the bare fact is that software seldom stays the same, especially in the middle of furious development. Thus the worth of smoke tests is more for quickly determining when a recent modification is gravely problematic. A failed smoke test implies the need to reconsider the recent modification and rectify it, probably very soon, to prevent delays in the overall schedule.

The surprise is that smoke tests resemble a mental tactic that shows up in various informal philosophizing. Like software developers who screen their attempts with smoke tests and then promptly fix and retry in the event of failure, a follower of a belief may repeatedly rethink its specifics until it's acceptable according to the equivalent of a smoke test. In essence the follower has a prior commitment to a conclusion, which they purposely reshape so that it at least doesn't "smoke". This tactic greatly differs from carefully proposing a tentative claim after collecting well-founded corroboration. And it differs from the foundation of productive debate: the precondition that the debaters' arguments are like orderly chains from one step to the next, not like lumps of clay that continually transform to evade objections.

As might be expected, the smoke test tactic easily leads to persistent misunderstandings about aims. The unambitious aim of the tactic is a pruned belief that isn't flagrantly off-base, not a pristine belief that's most likely to be accurate. A few smoke tests for beliefs are checks for absurdity, contradiction with solidly established information, violation of common contemporary ethics, and so forth. (The changes might qualify as retcons.) Until the tactic's users show the candor to concede that their aim is a treasured belief that isn't transparently wrong, rather than a novel belief that's plausibly right, they're mired in a loop of mending belief by trial and error.

They may justify the tactic by saying, "Of course I can't profess the most uncomplicated, unswerving variant of my belief. I know that variant can't be correct. It would be too [absurd, barbaric, intolerant, naive, infeasible, bizarre, self-contradictory]. I use my best understanding to strengthen the weak points that ring false. Doesn't everyone? Why's that a reason for criticism?"

This rationale is persuasive; to revise beliefs over time is no shortcoming. The telling difference is that everyone else isn't using the tactic on beliefs portrayed as complete, authoritative, correct, and self-supporting. It presents two issues in that case. First, why would the belief have been communicated in such a way that the recipients need to make fine-grained clarifications for the sake of succeeding at smoke tests—which are exceedingly basic, after all? Second, once someone has begun increasingly reworking the original belief to comply with their sense of reasonableness, when does the belief itself stop being a recognizable, beneficial contributor to the result? Is it not a bad sign when something requires numerous manual interventions, replacement of parts, and gentle handling, or else it swiftly proceeds to belch embarrassing smoke?

Sunday, August 02, 2015

black boxes blocking baseless bias

Considering the proportion of time filled by a full-time career, its thought patterns carve deep grooves. Hence the blog winds up with entries musing on the wider application of software patterns like, say, competing structures. In a software project, diverse structures of data and code could all be part of doable solutions. But the project allows only one solution. Not all of these structures have equal quality, so a competition is appropriate. Meanwhile in the philosophical domain and elsewhere, humans contrive diverse mental structures for the "project" of thinking and acting within their puzzling realities. And it shouldn't be verboten for these structures of uneven quality to fairly compete.

That's the toughest obstacle in practice: defining and applying legitimately equitable standards of comparison. Whenever evaluators have decided beforehand that the structures they endorse will be superior, then their tendency is to choose and distort the standards to assure it. The ones committed to candor readily admit this; even better, if they're confident then they welcome offers of separate reviews that will validate the credibility of their own evaluations.

Luckily, the ceaseless struggle to approach ideas with less partisanship has another pattern back in the technological domain. The common black box technique refers to analyzing something purely via the stuff entering and exiting it. Knowledge about the thing, and its contents, is excluded for whatever reason. Conceptually, the thing is hidden inside a black box with little holes for stuff to pass through. On a diagram, multiple arrows go to and from the box, but nothing is written in the box except its name. As a side note, a representation of a single, huge, unexamined thing containing miscellaneous parts, such as an external computer network, might have been drawn as a bumpy cloud to emphasize its vague "shape" and "size".

The black box analysis is simplified and undeniably easier to manage as a consequence. Sometimes, depending on the task, the thing's innards are mostly off-topic. To smoothly interact with the thing, the more crucial details are what agreed-upon stuff will come out (or occur) after agreed-upon stuff goes in. Without condensed black box abstractions, the modern industrial age of specialized, interchangeable technology would be infeasible. Everyone would need to know an excessive amount about the individually complex pieces merely to construct a functioning whole. This is an equally essential ingredient of software. With published protocols and data formats, software can handle other software as black box peers which accept and emit lucid messages. Broad classes of compliant software can profitably cooperate.
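A minimal sketch of that cooperation in Python (the message format and names are invented): each peer only needs the published shape of the messages, never the other's internals.

```python
import json

# A published, agreed-upon message format (invented for illustration).
request = json.dumps({"op": "lookup", "key": "plate/ABC-1234"})

def black_box_peer(message):
    # Stands in for any compliant software. Only the message shapes matter,
    # so this entire function could be replaced without disturbing callers.
    msg = json.loads(message)
    return json.dumps({"ok": True, "echo": msg["op"]})

reply = json.loads(black_box_peer(request))
assert reply["ok"] and reply["echo"] == "lookup"
print("peers cooperated via the published format alone")
```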

Overall, an extensive black box description is invaluable for something that's largely unknown—by design or by circumstance. In contrast, the value of a black box description for something that's largely known is less intuitive. It hinges on the recognition that too-close familiarity with something might build a deceptive or incomplete impression of its satisfactoriness. When the something is software, it's only logical that its industrious writer is unaware of its oversights, else they wouldn't have written the oversights. At writing time, they may have framed their solution too narrowly to enclose the project's range of subtleties. Later, that same frame prevents them from imagining tests capable of exposing the cramped, inadequate boundaries.

Mistakes of oversight are rivaled by occasionally embarrassing mistakes of "transcription": the writer failed to faithfully encode their original intent. They wanted to read memory location Q but they wrote code that reads J. Once again, it's only logical that such mistakes wouldn't survive if the writer's own firsthand experience caught every gaffe they introduced. They may have been distracted. Depressingly often, disorganization gradually accumulates in the code segment. Or, in a less forgivable offense, it's confusingly expressed from the outset. As a result, although they're staring directly at a mistake, they're preoccupied by the onerous strain of deciphering and tracking the bigger picture.

Less specifically, the sizable value of black box analysis for a largely known something lies in cross-checking the fallible judgments of "insiders" about that something. Placing it in a black box counteracts the hypothetical shortcomings of the insiders' entanglement. It includes putting aside comprehensive information about something's unique identity and full set of characteristics, and putting aside other connections/relationships, and putting aside appeal/repulsiveness. It's the candid, untainted estimation of whether the something's observed "footprints" match levelheaded expectations in pertinent contexts. The writer's admirable pride of craftsmanship doesn't attest that the supposedly finished software unit operates acceptably in all probable cases.

This practice's basic features are visible throughout innumerable domains, though with varied titles. It chimes with the "blinding" of subjects in experiments and surveys of customer tastes. (In an unusually palpable manifestation of the metaphor, part of the blinding procedure might employ nondescript, opaque boxes.) Blinding forces them to assess the sample with the sole attribute they can sense. From their view, the sample's source is in a black box. A second example is the services of an editor. They can approve and/or modify sections of a draft document according to the unprepared reactions it elicits in them. Unlike the submitter, they aren't an "expert" at knowing what it's meant to convey. They don't feel the submitter's strong sentimental attachments. They have a greater chance of encountering the draft itself. Where the editor is concerned, the labor behind it doesn't affect their revisions. The draft came out of a black box.

A third example is in the same theme, albeit more cerebral. It's the strategy of, after a long session of work on a preliminary creation, reserving time away before revisiting it. In the heart of the session, the creation is summarizing a portion of the creator's stream of consciousness. Therefore the contemporaneous brain activity grants them the perfect ability to effortlessly compensate for the creation's ambiguities and awkward aspects. To them in that instant, the creation's "seamless" substance and beauty are impossible to miss. When they return, their brain's state isn't enmeshed with the creation. They take a fresh look at its pluses and minuses in isolation. This is akin to the advice of not transforming a late-night brainstorm into irreversible actions until pondering it next morning. Interestingly, the something in the black box is the past configuration of the brain currently reexperiencing that past configuration's product, i.e. the creation/brainstorm. The critical difference is that the product isn't rubber-stamped due to where it came from—whose brain it rippled out of. No, the caliber of the product discloses the worthiness of whatever produced it, in this instance a past brain configuration. (It might be uncomplimentary. "My brain was really mesmerized by that tangent, but this is unusable nonsense.")

Despite its encouragingly widespread and timeless scope, black box-style thinking is a supplemental tool with inherent limits. It's for temporarily redirecting attention to the external symptoms of something's presence. Its visual counterpart is a sketch of a silhouette. It doesn't capture something's essence. It's not an explanation; on its own, a lengthy historical listing doesn't reliably predict responses to novel situations.

The epitome of an area dominated by these caveats is human conduct. Without question, the brain's convoluted character precludes painless black box analysis for rigorously unraveling how it runs. It exhibits context-dependent overrides of overrides of overrides...or it might not. Trends discovered during good tempers may have little relation to bad tempers. Or mannerisms connected to one social group may have little relation to mannerisms connected to contrasting social groups. Or a stranger switches among several conscious (or unconscious) guises, aimed at selectively steering the verdicts of unacquainted onlookers. The stranger is in a black box to the onlookers. The guises are collections of faked signals chosen to misinform the onlookers' analysis of that black box.

Caveats notwithstanding, entire societies heavily regulate members through black boxes of human conduct. (As a popular song from the early nineties famously didn't proclaim, "Here we are now: in containers.") Members are efficiently pigeonholed by unsophisticated facts about their deeds. In the society, facts of that type serve as decisive announcements of the member's inner nature. So, members who wish to be seen a certain way are obliged to adhere to the linked mandates. No extra particulars about them are accounted for. For this purpose, they're in a black box. It appears callous at first glance, but it exemplifies the earlier statements about the value of shortcuts for working with something that's largely unknown. When societies reach massive scales, it's impossible for members to obtain penetrating awareness of every other member. Like before, black box understandings ease interactions with scarce information about either party, because the pair can foresee what will transpire between them.

Furthermore, black box analysis of human conduct shares the advantages stated earlier for inspecting something that's largely known. The effectiveness is lowered by the caveats of this area but not eliminated altogether. It's more than adequate for imposing sharp, sensible thresholds on other findings. "If I didn't know them as well as I do, and they acted the same as they have in the situations I know of, would there be a disparity in how I esteem them? If there is, do I have a well-founded excuse for it? At some point, my firmest convictions about who they are should be aligning to some degree with their acts...."

Tuesday, July 07, 2015

competing structures

Lately I've been describing examples of ideas that overlap between my software career and my philosophical positions. The foremost consequence is the thorough puncturing of information's abstract mystique. First, the traceable meaningfulness of information is rooted in the corresponding work performed by teams of computers and humans; conversely, the traceable meaning of the work is shown in the corresponding transformations of information that the work achieved. This principle underscores that information is tied to concrete efforts, and it doesn't arise out of nothing or exist independently. Second, when a computer performs information work, the humdrum process backing it is less like mystical transfiguration than like sending water through a dauntingly intricate maze of pipes, as countless synchronized valves rapidly toggle. This principle underscores that neither information nor its changes have nonphysical foundations.

The third example of overlap is the competition among structures to be used in software projects. Projects have more than one hypothetical solution. A solution contains particular structures to represent and store the targeted information, e.g. a short alphanumeric sequence for a license plate. Additionally, the solution has a structure for the code to manipulate the information, a structure which joins together separate actions for differing circumstances (an algorithm governing algorithms). Thus, depending on the analyst's choices, the total solution houses information in varying discrete sets of structures. Each set might be functional and intelligible. Nevertheless, the structures could have serious faults relative to one another: redundant, complicated, circuitous, simplistic, disorganized, bewildering, fragile. What's worse, frequently the problems aren't apparent until more time passes, at which point the structures need to be delicately replaced or reshaped. Not all of the prospective structures that are doable for the project are equally faultless and prudent. And this is reconfirmed once shortsighted structures proceed to collide with challenging realities. ("I wish these modifications had been anticipated before the structure of this code was chosen.")
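A toy sketch of two such competing structures, using the license plate example (the plate format here is invented): both are doable, but they are not equally prudent once challenging realities arrive.

```python
import re
from dataclasses import dataclass

# Structure 1: a bare string. Simple, but every consumer must re-parse it,
# and nothing stops malformed plates from spreading through the system.
plate_v1 = "ABC-1234"

# Structure 2: a small validated type. Slightly more work up front, but the
# format rule lives in exactly one place.
@dataclass(frozen=True)
class Plate:
    letters: str
    digits: str

    @classmethod
    def parse(cls, text):
        m = re.fullmatch(r"([A-Z]{3})-(\d{4})", text)  # invented format
        if not m:
            raise ValueError(f"malformed plate: {text!r}")
        return cls(*m.groups())

print(Plate.parse("ABC-1234"))  # Plate(letters='ABC', digits='1234')
```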

Instructive parallels to the principle of competing structures aren't hard to find outside of software. In so many subtle, open-ended contexts, there isn't a uniquely correct conclusion strictly reachable through systematic steps of deductive logic. As a result, humans end up with widely divergent "mental structures" as they attempt to grasp their confusing experiences. While they don't need to turn those structures into effective software, they do need to apply these structures of interpretation to bring order to their thoughts and acts. If they're considered sane, a lot of their adopted structures probably have at least a little coherency and accuracy. As disparate as the structures might be, obviously each is good enough in the adopter's estimation. The differences might even be superficial on closer examination. After all, as much as possible the entire group of structures should be accommodating constraints that are universal: the crucial details asserted by reliable evidence and/or by other, prior, well-established structures.

Regardless, again like the technological structures in software projects, the potential for numerous candidates does not imply that all have identical quality as judged by every standard. An organizing structure can be possible without attaining a competitive level of plausibility. Although the normal complexities of existence might not dictate an obvious and definitive singular structure, intense critique casts doubt on some candidate structures more than others. For instance, belief structures seem more dubious after calling for repeated drastic revisions, i.e. retcons. So do structures that propose "backstage" causes which happen to be almost completely undetectable by impartial investigators. Structures that avoid claiming unbounded certainty merely earn a ribbon for sincere participation in the competition of realistic ideas; they don't instantly gain as much credibility as the leading structures, which also avoid that glaring flaw.

The criteria for ranking require great care as well. Explanatory structures should compete based on thoughtful neutral guidelines, not on indulgence of the favoritism embedded in preconceptions and preferences. Brisk disregard of a structure's failure to withstand unbiased evaluation is an error-prone strategy. Note that like items placed indifferently on the pans of a balance scale, directly measuring one structure alongside a second shouldn't be construed as close-minded disrespect toward either—provided the method of comparison is in fact fair and not like a tilted scale.

Generally speaking, the principle of competing structures thrives in the commonplace domains that are unsuitable for the two extreme alternatives. These are domains where there isn't one indisputable answer, but at the same time the multitude of answers aren't of uniform worth by any means. Of course, software projects are far from the only case. For an art commission, a dazzling breadth of works would meet the bare specifications...though some might consistently evoke uncomplimentary descriptions such as insipid, garish, disjointed, derivative, slapdash, repellent, etc. Out of all the works that qualified for the commission, who would then foolishly suggest that some couldn't be shoddier than the rest, or that comparative shoddiness doesn't matter?

Saturday, May 30, 2015

journey to the center of the laptop

The last time I described how ideas from my software career shaped my present thinking, the topic was the interdependency between the meanings of data and code. The effective meaning of data was rooted in the details of "information systems" behind it: purposeful sequences of computer code and human labor to methodically record it, construct it, augment it, alter it, mix it with more data, etc. But the same observation could be reversed: the effective meaning (and correctness) of the information system was no more than its demonstrated transformations of data.

This viewpoint appeared to apply in other domains as well. For a wide range of candidate concepts, probing the equivalent of the concept's supporting "information system" usefully sifted its detectable meaning. How did the concept originally arise? How could the concept's definitions, verifications, and interpretations be (or fail to be) repeated and rechecked? Prospective data was discarded if it didn't have satisfactory answers to these questions; should pompous concepts face lower standards?

However, not all the software ideas were at the scale of information systems. Some knowledge illuminated the running of a single laptop. For instance, where does the laptop's computation happen? Where's the site of its mysterious data alchemy? What's the core of its "thinking"—with the precondition that this loaded term is applied purely in the loose, informal, metaphorical sense? (Note that the following will rely on simplified technological generalizations too...) The natural course of investigation is from the outside in.
  • To start with, probably everyone who regularly uses a laptop would say that the thinking takes place inside the unit. The ports around its edges for connecting audio, video, networking, generic devices, etc. are optional. These connections are great for enabling additional options to transport information to and from the laptop, but they don't enable the laptop to think. The exceptions are the battery slot and/or the power jack, which are nonetheless only providers of the raw energy consumed by the laptop's thinking.
  • Similarly, it doesn't require technical training to presume that the laptop's screen, speakers, keyboard, touchpad, camera, etc. aren't where the laptop thinks. The screen may shut off to save power. The speakers may be muted. The keyboard and touchpad are replaceable methods to detect and report the user's motions. Although these accessible inputs and outputs are vital to the user's experience of the laptop, their functions are like translation rather than thinking. Either the user's actions are transported to the laptop's innards as streams of impulses, or the final outcomes of the laptop's thinking are transported back out to the user's senses. 
  • Consequently, the interior is a more promising space to look. Encased in the walls of the laptop, under the keyboard, behind the speakers, is a meticulously assembled collection of incredibly flat and thin parts. Some common kinds of parts are temporary memory (RAM), permanent storage (internal drives), disc drives (CD, DVD, Blu-ray), and wireless networking (WiFi). By design this group receives, holds, and sends information. Information is transported but not thought about. So the thinking must occur in the component that's on the opposite side of this group's diverse attachments: the main board or motherboard.
  • To accommodate and manage the previously mentioned external ports and internal parts, the motherboard is loaded with hierarchical circuitry. It's like a mass of interconnected highways or conveyor belts. Signals travel in from the port or part, reach a hub, proceed to a later hub, and so forth. As a speedy rest stop for long-running work in progress, the temporary memory is a frequent start or end. The intricacy of contemporary device links ensures that motherboards are both busy and sophisticated, yet once more the overall task is unglamorous transportation. There's a further clue for continuing the search for thinking, though. For these transportation requests to be orderly and appropriate, the requests' source has to be the laptop's thinking. That source is the central processing unit (CPU).
  • Analysis of the CPU risks a rapid slide into complexity and the specifics of individual models. At an abstract level, the CPU is divided into separate sections with designated roles. One is loading individual instructions for execution. Another is breaking down those instructions into elemental activities of actual CPU sections. A few out of many categories of these numerous elemental activities are rudimentary mathematical operations, comparisons, copying sets of bits (binary digits, either zeros or ones) among distinct areas in the CPU's working memory, rewriting which instruction is next, and dispatching sets of bits in and out of the CPU. In any case, the sections' productive cooperation consists of transporting bits from section to section at the proper times. Again setting aside mere transporting, the remaining hideout for the laptop's thinking is somewhere inside those specialized CPU sections completing the assigned elemental activities.
  • Also considered at an abstract level, these CPU sections in turn are built from myriad tiny "gates": electronics organized to produce differing results depending on differing combinations of electricity flowing in. For example, an "AND" gate earns its name through emitting an "on" electric current when the gate's first entry point AND the second have "on" currents. Odd as it may sound, various gates ingeniously laid out, end to end and side by side, can perfectly perform the elemental activities of CPU sections. All that's demanded is that the information has consistent binary (bit) representations, which map directly onto the gates' notions of off and on. The elemental activities are performed on the information as the matching electric currents are transported through the gates. And since thinking is vastly more intriguing than dull transportation of information in any form, the hunt through the laptop needs to advance from gates to...um...er...uh... 
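The gate layout in that last step can be sketched concretely. This is a minimal Python model (a standard half adder built from Boolean gates, not any specific CPU's layout):

```python
# Gates as plain functions on bits (0 or 1).
def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a

def XOR(a, b):
    # Built only from the gates above, laid "end to end and side by side".
    return AND(OR(a, b), NOT(AND(a, b)))

def half_adder(a, b):
    """Adds two bits and returns (sum, carry). No step here 'thinks';
    each just transports currents through gates."""
    return XOR(a, b), AND(a, b)

for a in (0, 1):
    for b in (0, 1):
        print(a, "+", b, "=", half_adder(a, b))
```

Chaining full adders out of these half adders is how a CPU section performs the rudimentary mathematical operations mentioned above, which underlines the point: at no stage does a distinct "thinking" part appear.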
This expedition was predictably doomed from the beginning. Peering deeper doesn't uncover a sharp break between "thinking" and conducting bits in complicated intersecting routes. No, the impression of thought is generated via algorithms, which are engineered arrangements of such routes. The spectacular whole isn't discredited by its unremarkable pieces. Valuable qualities can "emerge" from a cluster of pieces that don't have the quality in isolation. In fact, emergent qualities are ubiquitous, unmagical, and important. Singular carbon atoms don't reproduce, but carbon-based life does.

Ultimately, greater comprehension forces the recognition that the laptop's version of thinking is an emergent quality. Information processing isn't the accomplishment of a miraculous segment of it; it's more like the total collaborative effect of its abundant unremarkable segments. An outsider might scoff that "adding enough stupid things together yields something smart", but an insider grasps that the way those stupid things are "added" together makes a huge difference.

Readers can likely guess the conclusion: this understanding prepares someone to contemplate that all versions of thinking could be emergent qualities. Just as the paths in the laptop were the essence of its information processing, what if the paths in creatures' brains were the essence of their information processing? Laptops don't have a particular segment that supplies the "spark" of intelligence, so what if creatures' brains don't either? Admittedly, it's possible to escape by objecting that creatures' brains are, in some unspecified manner, fundamentally unlike everything else made of matter, but that exception seems suspiciously self-serving for a creature to propose... 

Saturday, May 02, 2015

data : code :: concept : verification

I've sometimes mused about whether my eventual embrace of a Pragmatism-esque philosophy was inevitable. The ever-present danger in musings like this is ordinary hindsight bias: concealing the actual complexity after the fact with simple, tempting connections between present and past. I can't plausibly propose that the same connections would impart equal force on everyone else. In general, I can't rashly declare that everyone who shares one set of similarities with me is obligated to share other sets of similarities. Hastily viewing everyone else through the tiny lens of myself is egocentrism, not well-founded extrapolation.

For example, I admit I can't claim that my career in software development played an instrumental role in the switch. I know too many competent colleagues whose beliefs clash with mine. At the same time, a far different past career hasn't stopped individuals in the Clergy Project from eventually reaching congenial beliefs. Nevertheless, I can try to explain how some aspects of my specific career acted as clues that prepared and nudged me. My accustomed thought patterns within the vocational context seeped into my thought patterns within other contexts.

During education and on the job, I encountered the inseparable ties between data and code. Most obviously, the final data was the purpose of running the code (in games the final data was for immediately synthesizing a gameplay experience). Almost as obvious, the code couldn't run without the data flowing into it. Superficially, in a single ideal program, code and data were easily distinguishable collaborators taking turns being perfect. Perhaps a data set went in, and a digest of statistical measurements came out, and the unseen code might have run on a machine on the other side of the internet.

At a more detailed level of comprehension, and in messy and/or faulty projects cobbled together from several prior projects, that rosy view became less sensible. When final data was independently shown to be inaccurate, the initial cause was sometimes difficult to deduce. Along the bumpy journey to the rejected result, data flowed in and out of multiple avenues of code. Fortunately the result retained meaningfulness about the interwoven path of data and code that led to it, regardless of its regrettable lack of meaningfulness in regard to its intended purpose. It authentically represented a problem with that path. Thus its externally checked mistakenness didn't in the least reduce its value for pinpointing and resolving that path's problems.

That wasn't all. The reasoning applied to flawless final data as well, which achieved two kinds of meaningfulness. Its success gave it metaphorical meaningfulness in regard to satisfying the intended purpose. But it too had the same kind of meaningfulness as flawed final data: literal meaningfulness about the path that led to it. It was still the engineered aftereffect of a busy model built out of moving components of data and code—a model ultimately made of highly organized currents of electricity. It was a symbolic record of that model's craftsmanship. Its accurate metaphorical meaning didn't erase its concrete roots.

The next stage of broadening the understanding of models was to incorporate humans as components—exceedingly sophisticated and self-guiding components. They often introduced the starting data or reviewed the ultimate computations. On top of that, they were naturally able to handle the chaotic decisions and exceptions that would require a lot more effort to perform with brittle code. Of course the downside was that their improvisations could derail the data. Occasionally, the core of an error was a human operator's unnoticed carelessness filling in a pivotal element two steps ago. Or a human's assumptions for interpreting the data were inconsistent with the assumptions used to design the code they were operating.

In this sense, humans and code had analogous roles in the model. Each was involved in carrying out cooperative series of orderly procedures on source data and leaving discernible traces in the final data. The quality of the final data could be no better than the quality of the procedures (and the source data). A model this huge was more apt to have labels such as "business process" or "information system", abbreviated IS. Cumulatively, the procedures of the complete IS acted as elaborations, conversions, analyses, summations, etc. of the source data. Not only was the final data meaningful for inferring the procedures behind it, but the procedures in turn produced greater meaningfulness for the source data. Meanwhile, they were futilely empty, motionless, and untested without the presence of data.

Summing up, data and code/procedures were mutually meaningful throughout software development. As mystifying as computers appeared to the uninitiated, data didn't really materialize from nothing. Truth be told, if it ever did so, it would arouse well-justified suspicion about its degree of accuracy. "Where was this figure drawn from?" "Who knows, it was found lying on the doorstep one morning." Long and fruitful exposure to this generalization invited speculation of its limits. What if strict semantic linking between data and procedures weren't confined to the domain of IS concepts?

A possible counterpoint was repeating that these systems were useful but also deliberately limited and refined models of complex realities. Other domains of concepts were too dissimilar. Then...what were those unbridgeable differences, exactly? What were the majority of beneficial concepts, other than useful but also deliberately limited and refined models? What were the majority of the thoughts and actions to verify a concept, other than procedures to detect the characteristic signs of the alleged concept? What were the majority of lines of argument, other than abstract procedures ready to be rerun? What were the majority of secondary cross-checks, other than alternative procedures for obtaining equivalent data? What were the majority of serious criticisms of a concept, other than criticisms of the procedures justifying it? What were the majority of definitions, other than procedures to position and orient a concept among other known concepts?

For all that, it wasn't that rare for these other domains to contain some lofty concepts that were said to be beyond question. These were the kind whose untouchable accuracy was said to spring from a source apart from every last form of human thought and activity. Translated into the IS perspective, these demanded treatment like "constants" or "invariants": small, circular truisms in the style of "September is month 9" and "Clients have one bill per time period". In practice, some constants might need to change from time to time, but those changes weren't generated via the IS. These reliable factors/rules/regularities furnished a self-consistent base for predictable IS behavior.

Ergo, worthwhile constants never received anything but continually contributed. They were unaffected by data and procedures yet were extensively influential anyway. They probably had frequent, notable consequences elsewhere in the IS. Taken as a whole, those system consequences strongly hinted at the constants at work—including tacit constants never recognized by the very makers of the system. Like following trails of breadcrumbs, with enough meticulous observation, the backward bond from the system consequences to the constants could be as certain as the backward bond from data to procedures.

In other words, on the minimal condition that the constants tangibly mattered to the data and procedures of the IS, they yielded accountable expectations for the outcomes and/or the running of the IS. The principle was more profound when it was reversed: total absence of accountable expectations suggested that the correlated constant itself was either absent or at most immaterial. It had no pertinence to the system. Designers wishing to conserve time and effort would be advised to ignore it altogether. It belonged in the routine category "out of system scope". By analogy, if a concept in a domain besides IS resisted the usual methods of reasonable verification, and distinctive effects of it weren't identifiable in the course of reasonably verifying anything else, then it corresponded to neither data nor constants. Its corresponding status was out of system scope; it didn't merit the cost of tracking or integrating it.

As already stated, the analogy was neither undeniable nor unique. It didn't compel anyone with IS expertise to reapply it to miscellaneous domains, and expertise in numerous fields could lead to comparable analogies. There was a theoretical physical case for granting it wide relevance, though. If real things were made of matter (or closely interconnected to things made of matter), then real things could be sufficiently represented with sufficient quantities of the data describing that matter. If matter was sufficiently represented, including the matter around it, then the ensuing changes of the matter were describable with mathematical relationships and thereby calculable through the appropriate procedures. The domain of real things qualified as an IS...an immense IS of unmanageable depth which couldn't be fully modeled, much less duplicated, by a separate IS feasibly constructed by humans.

Thursday, June 28, 2012

Nick Burns the Client Object

In 2009 I compared a lack of data encapsulation to a nudist community. Not long ago another metaphor struck me, but this one applies to the interactions among objects within that anti-pattern: Nick Burns, Your Company's Computer Guy. If translated to the form of skit dialogue, it might be like the following.
  • Nick Burns, Client Object: (impatient) You're the Customer object, right? Tell me the average number of purchases per month, year-to-date.
  • Customer Object: (confused) Er...but...I don't have a method for doing that. I know the customer inside and out, but you're asking for statistics about the orders. I suppose I have the ability to fetch all customer purchases for a time range...
  • Client Object: (annoyed) Don't know much, do you? Give me all your data and MOVE! (The Client Object .gets() all the information from the Customer Object, filters it, and finally calculates.) THERE! Was that so hard?
  • Customer Object: (defensive) Well, if someone had given me the capability to do what you were asking, or if you had gone to the right Object in the first place...
  • Client Object: (dismissive) Don't worry about it. You're just one more do-nothing data container like all the objects in the hierarchy. I'm here to do your thinking for you. (The Client Object starts to return to its caller, then hesitates and turns back to the Customer Object for one last bit of sarcasm.) Oh, by the way, you're welcome!

Thursday, November 03, 2011

to be agile is to adapt

Not too long ago, I read Adapt by Tim Harford. It's an engrossing presentation of a profound idea: beyond a particular bound of complexity, logical or top-down analysis and planning is inferior to creative or bottom-up variations and feedback. Adaptation can be indispensable. Often, humans don't know enough for other approaches to really work. They oversimplify, refuse to abandon failing plans, and force the unique aspects of "fluid" situations into obsolete or inapplicable generalizations. They're too eager to disregard the possible impact of "local" conditions. Biological evolution is the prime example of adaptation, but Harford effectively explores adaptation, or non-adaptation, in economies, armies, companies, environmental regulations, research funding, and more. Although the case studies benefit from adept narration, some go on for longer than I prefer.

Software developers have their own example. Adapting is the quintessence of Agile project management1. As explained in the book, adaptive solutions exploit 1. variation, 2. selection, and 3. survivability. Roughly speaking, variation is attempting differing answers, selection is evaluating and ranking the answers, and survivability is preventing wrong answers from inflicting fatal damage.

Agile projects have variation through refactoring and redesign while iterations proceed. Agile code is rewritten appropriately when the weaknesses of past implementations show up in real usage. Agile developers aren't "wedded" to their initial naive thoughts; they try and try again.

Agile projects have selection through frequent and raw user feedback. Unlike in competing methodologies that excessively separate developers from users, information flows freely. Directly expressed needs drive the direction of the software. The number of irrelevant or confusing features is reduced. Developers don't code whatever they wish or whatever they inaccurately guess about the users.

Agile projects have survivability through small and focused cycles. The software can't result in massive failure or waste because the cost and risk are broken up into manageable sections. Agile coaches repeat a refrain that resembles the book's statements: your analysis and design is probably at least a little bit wrong, so it's better to find out sooner and recover than to compound those inevitable flaws.

1Of course, the priority of people over process is also quintessential.

Sunday, September 11, 2011

peeve no. 265 is users blaming the computer

No, user of the line-of-business program, the computer isn't the trouble-maker. It could be from time to time, if its parts are old or poorly-treated, but problems at that level tend to be much more noticeable than what you're describing. Generally, computers don't make occasional mistakes at random times. Despite what you may think, computers are dogged rather than smart. Computers do as instructed, and by "instructed" I mean nothing more than configuring the electricity to move through integrated circuits in a particular way. Computers can't reject or misunderstand instructions. No "inner presence" exists that could possibly do so.

I understand that placing blame on "the computer" can be a useful metaphor for our communication. But the distinction I'm drawing this time is substantive. To identify the precise cause of the issue that you've reported, a more complete picture is necessary. Your stated complaints about the computer's misdeeds really are complaints about something else. The reason to assign blame properly isn't to offer apologies or excuses. Figuring out the blame is the first step in correcting the issue and also in preventing similar issues.
  • Possibility one is a faulty discussion of the needed behavior for the program, way back before any computer played a role. Maybe the right set of people weren't consulted. Maybe the right people were involved, but they forgot to mention many important details. Maybe the analyst missed asking the relevant questions. Now, since the program was built with this blind spot, the issue that you reported is the eventual result.
  • Possibility two is a faulty translation of the needed behavior into ideas for the program. Maybe the analyst assumed too much instead of asking enough questions. Maybe the analyst underestimated the wide scope of one or more factors. Maybe the analyst was too reluctant to abandon an initial idea and overextended it. Maybe the analyst neglected to consider rare events that are not so rare.
  • Possibility three is faulty writing of the program itself. Maybe the coders overestimated their understanding of their tools and their work. Maybe the coders had comprehensive knowledge but didn't correctly or fully express what they intended. Maybe a fix had unfortunate side effects. Maybe the tests weren't adequate.
  • Possibility four is faulty data. Like blaming the computer, blaming the data is a symptom. Maybe something automated quit abruptly. Maybe manual entry was sloppy. Maybe the data is accurate and nevertheless unexpected. Maybe someone tried to force shortcuts. Maybe management is neither training nor enforcing quality control.  
  • Possibility five is faulty usability, which faulty data might accompany. "Usable" programs ease information processing from the standpoint of the user. Maybe the program isn't clear about what the user can do next. Maybe unknown terminology is everywhere. Maybe needless repetition encourages boredom and mistakes. Maybe, in the worst cases, staff decide to replace or supplement the program with pen marks on papers or fragile spreadsheets containing baroque formulae. Shortfalls in usability may disconnect excellent users from excellent programs.
  • Possibility six is the dreaded faulty organization, in which various units disagree or the decision-makers are ignorant. Maybe definitions are interpreted differently. Maybe the "innovators" are trying to push changes informally. Maybe the realms of each unit's authority are murky and negotiable at best. Maybe units are intentionally pulling in opposite directions. Regardless, the program probably will fail to reconcile the inherent contradictions across the organization.
Often, in the Big Picture of the blunder, the computer is the most innocent of all contributors.

Tuesday, August 23, 2011

git's index is more than a scratchpad for new commits

Someone relatively inexperienced with git could develop a mistaken impression about the index. After referring to the isolated commands on a quick-reference guide or on a "phrasebook" that shows git equivalents to other VCS commands, the learner might, with good reason, start to consider the index as a scratchpad for the next commit. The most common tasks are consistent with that concept.

However, this impression is limiting. More accurately viewed, the index is git's entire "current" view of the filesystem. Commits are just saved git views of the filesystem. Files that the user has added, removed, modified, renamed, etc. aren't included in git's view of the filesystem until the user says so, with "git add" for example. Except before the very first commit, the index is unlikely ever to be empty. It isn't truly a scratchpad, then. When checking out a commit, git changes its current view of the filesystem to match that commit; therefore it changes the index. Through checkouts, history can be used to populate git's view of the filesystem. Through adds, the actual filesystem can be used to populate git's view of the filesystem. Through commits, git's view of the filesystem can be stored for future reference as a descendant of the HEAD.
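A throwaway repository makes the distinction concrete. This is only a sketch, assuming a stock git install; it shows that after a commit, the index still holds a full view of the tracked files even though nothing is "staged":

```shell
#!/bin/sh
# Demonstrate that the index is git's current view of the filesystem,
# not an empty scratchpad.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name Demo

echo first > a.txt
git add a.txt                 # a.txt enters git's view
git commit -qm "add a.txt"

echo second > a.txt           # the working tree changes...
git ls-files --stage          # ...but the index still lists the committed a.txt
git diff --name-only          # index vs. working tree: a.txt differs
git diff --cached --name-only # HEAD vs. index: empty, nothing is "staged",
                              # yet the index itself is not empty (see ls-files)
```

Running `git add a.txt` again would update git's view to match the working tree; checking out another commit would update it to match that commit instead.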

Without this understanding, usage of "git reset" is infamous for causing confusion. With it, the confusion is lessened. A reset command that changes the index, which happens in the default --mixed mode or with option --hard, is like a checkout in that it changes git's view to the passed commit. (Of course the reset also moves the branch ref and HEAD, i.e. the future parent of the next commit.) A reset command that doesn't change the index, which happens with option --soft, keeps git's "view" the same as if it remained at the old commit. A user who wanted to collapse all of the changes on a branch into a single commit could checkout that branch, git reset --soft to the branch ancestor, and then commit. Depending on the desired effect, merge --squash or rebase --interactive might be more appropriate, though.
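That squash recipe can be sketched in a throwaway repository (the ancestor commit is captured in a variable rather than assuming any particular branch name):

```shell
#!/bin/sh
# Collapse a branch's commits into one with "git reset --soft".
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name Demo

echo base > f.txt
git add f.txt && git commit -qm "base"
ancestor=$(git rev-parse HEAD)        # where the branch forks off

git checkout -q -b feature
echo one >> f.txt && git commit -qam "step one"
echo two >> f.txt && git commit -qam "step two"

# --soft moves the branch ref and HEAD back, but leaves the index
# (git's view of the files) exactly as it stood at "step two"...
git reset --soft "$ancestor"
# ...so a single new commit records the combined changes.
git commit -qm "feature, squashed"
git log --oneline
```

The final log shows only two commits: "base" and "feature, squashed", with both steps' edits present in f.txt.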

Post-Script: Since this is aimed at git newcomers, I should mention that before trying to be too fancy with resets, become close friends with "reflog" and "stash".
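In that spirit, here is a tiny sketch (again a throwaway repository) of what those two safety nets do:

```shell
#!/bin/sh
# "reflog" and "stash" as safety nets around reset experiments.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email demo@example.com
git config user.name Demo

echo v1 > f.txt && git add f.txt && git commit -qm "v1"
echo v2 > f.txt && git commit -qam "v2"

# reflog: every movement of HEAD is journaled, so a "lost" commit is findable.
git reset -q --hard HEAD~1       # back to v1; v2 looks gone
git reflog -n 2                  # the journal still records the move
git reset -q --hard 'HEAD@{1}'   # restore v2 from the reflog

# stash: shelve work in progress before trying something risky.
echo wip >> f.txt
git stash -q        # working tree is clean again
git stash pop -q    # the edit comes back
```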

Post-Script The Second: Drat. The Pro Git blog addressed the same general topic, but based more directly around "reset". And with attractive pictures. And a great reference table at the bottom.

Tuesday, August 16, 2011

why it isn't done yet

Modifying decrepit C code ain't like dusting crops, boy! Without precise calculations we could fly right through a star or bounce too close to a supernova, and that'd end your trip real quick, wouldn't it.

Saturday, August 13, 2011

the dash hole principle

The cigarette lighter receptacle has an amusing name. In my automobile and many others that I've seen, the present form isn't actually usable for lighting cigarettes. Now it's a round hole in the dashboard with a cover that's labeled as a power outlet. Over time, cigarette lighter receptacles turned into dash holes. The users of an object emphasized the secondary applications of it until the object itself dropped its primary application. It changed meaning through inventive usage.

Software users can be expected to act the same. Software developers should accept that the users, acting like humans, will adapt by introducing their own concepts and assumptions to a "finished" project. As DDD advises, the key is their language. When they speak about the software, and therefore the underlying design or data model, their words throw attention onto their interpretation of the "problem domain". They might describe data groups/categories and store their evolving understanding with rigid entries, like attaching "special" semantics to product identifiers that start with "Q". They might take several hours to run a series of automated preexisting reports, stuff the conglomerated figures into a spreadsheet, and then generate a chart - additional work which could all be accomplished by a computer in a tenth of the time.

The point is, software in the hands (and brains) of users can easily become a dash hole: an original design that came to be viewed much differently in practice. Developers who don't meet the needs of users will be bypassed manually as time goes on. In some cases, this may be a good approach. Some changes in usage just don't justify substantial software modifications. However, to state the obvious, not everyone is a good software analyst. Ad hoc solutions, enforced not by the software but by scores of unwritten rules, are prone to causing data duplication due to a lack of normalization, chaos due to employee turnover or mere human frailty, and tediousness due to not thinking thoroughly about the whole process.

Dash holes function as adequate power outlets. But imagine if irritating dash holes could've been replaced with something designed to serve that purpose.

Wednesday, March 23, 2011

cognitive load reduction

Debates about how to write better code (i.e. fewer bugs) revolve around increasing maintainability, but not too long ago I recognized a related and perhaps fundamental criterion: cognitive load reduction. The fewer disparate items that a developer must contemplate simultaneously, 1) the lower the chance that a mistake will slip in unnoticed, 2) the greater the amount of attention left for the details of the problem/domain rather than the twists and turns of the code. When code is confusing and demanding to comprehend, the cognitive load is greater, and therefore it's more difficult to write, trace, debug, and modify.

Awareness of impact on cognitive load should change the choices that someone makes. Sure, the first task is to produce code that meets the known requirements. Yet developers shouldn't then neglect the second task of refining the code until it's sensible. Code has two audiences, machine and human. This is a lens for perceiving the usual code debates.
  • Units of code organization with hard boundaries reduce cognitive load by freeing the reader from looking through many peripheral lines to trace execution.
  • Good names reduce cognitive load by freeing the reader from inferring what a variable is for.
  • An easier build process reduces cognitive load by freeing the builder from rehearsing and reciting a series of error-prone manual steps.
  • Version control that meets the team's needs reduces cognitive load by freeing the team from devising complicated workarounds.
  • Domain models that match the way that everyone thinks (according to common agreement) reduce cognitive load by freeing them from continual lossy translation of one another's statements.
  • Frameworks reduce cognitive load by freeing the reader from examining custom-made immature solutions to ordinary incidental problems, e.g. templating, MVC, protocols. On the other hand, obtrusive frameworks may increase cognitive load by overshadowing and complexifying the base code without marginal benefit.
Effective writing in natural human language doesn't place an excessive burden on the reader, who's trying to interpret the message. Similarly, effective writing in programming language doesn't place an excessive burden on the maintainer, who's trying to interpret the code's intent.

Sunday, March 20, 2011

calling a truce on sprocs

For a while, I've mostly been dismissive of database stored procedures or sprocs. The rationale is that databases are for storage ("Really, Capt. Obvious?"). By contrast, calculations, conditions, and data processing in general belong in a separate, dedicated tier; the clear benefit is a much more flexible, capable, reusable, and interoperable platform/language than the typical sproc. In this middle tier the intelligence resides in neatly divided objects that could potentially exploit different "persistence strategies" than the default database of choice. These objects presumably act as better models of the domain than collections of rows and columns. Application development happens on top of this middle tier rather than the database.

The opposite path is integration at the database level. Differing software all use the same "master" database. There may be a recurring import script that populates one or more tables with external data, entry interfaces that quite clearly manipulate rows and columns, and canned reports whose queries become increasingly complicated. Knowledge of which tables to join or which column values to exclude spreads out through everything that performs a similar task. Analysts speak of the database as if it were the domain. Their first implementation question on new projects is "What tables do we need to add?"

Consequently, integration at the master database level can result in fragmentation and duplication. Enter sprocs. Essentially, a thoughtful agglomeration of limited and self-contained sprocs could take the place of a nonexistent middle/domain tier for some purposes. If everyone needs to run the same query all the time, at least putting it in a sproc will consolidate it. A complex calculation that everyone repeatedly makes could be computed in a single sproc. Ugly warts of the database model could have workarounds specified in sprocs.

Storage technology independence is lost with sprocs, but ongoing integration at the database level already makes that impossible. Sproc writing requires some learning, but that cost is offset by the considerable advantage of not having to rewrite the code in multiple clients. IDE support is less than ideal but a sproc shouldn't be too large anyway. Names and calls of sprocs are also rough but are likely to require less extra documentation than the alternative of laboriously touring table relations.

Sprocs: better than nothing.

Thursday, February 24, 2011

reactions to unit test failures

I suspect that deepest dedication to a suite of automated unit tests isn't proven by having a test that touches each public class. Nor by a test for each public method. Nor by a test coverage tool that reports a proportion as close to 100% as is reasonable for that codebase. Nor even by strict test-first methodology.

The deepest dedication is only proven later in the code's life, when changes happen. Code changes of significance should lead to test failure or stark test compilation (or interpreter-parse) failure. That moment is when dedication takes effect. What will be the programmer's reaction? If it's a tiny test that broke, then a rapid and focused adjustment to the test or the code is likely to be the happy outcome. (Sidebar: this argues for not skipping trivial tests that probably are redundant to big tests. A failing trivial test is easier to interpret than a failing big test!) If it's a complex test or a large subset of tests that failed, then the reaction might not be as, er, placid.
  • The worst reaction is to impulsively eliminate the failing test(s). Better but not by much is to turn the test(s) into comments or otherwise force a skip. A disabled/skipped test should only ever be a temporary compromise to reduce mental clutter during the thick of an intense task. It carries an implicit promise to enable the test at the next possible opportunity. Excessive distracting nagging is awful but permanently removing a safety net clearly falls into the "cure worse than disease" category.
  • Assuming the motivation for the code change was a real change in requirements rather than code refactoring and improvement, then direct elimination may be correct. Before doing so, remember that unit tests act like executable specifications for that unit, and ask yourself "Does this test correspond to a code specification that still applies to the changed code but in a different form?" When the answer is "yes", the test should be replaced with a corresponding test for the transformed specification. Consider previous tests that caught corner cases and boundary conditions. If an object previously contained a singular member, but due to changes in the problem domain it now contains a collection, then the test for handling a NullObject singular member might correspond to a replacement test for an empty member collection. 
  • On the other hand, whenever the change's purpose is to improve the code while leaving intact all existing functions/interfaces of importance, elimination or fundamental rewrites aren't the right course. The test stays, regardless of its inconvenience in pointing out the shortcomings of the redesign. The right answer may be to rethink part of the code redesign or in a pinch to add on to it in some small way with some unfortunate adapter code until other modules finish migrating. Sometimes a big fat legitimate test failure is the endpoint and "smoking gun" of an evolutionary mistake of the code, and the professional reaction is to disregard personal/emotional attachment by cutting off or reshaping the naive changes. Never forget that to users the code is a semi-mysterious black box that fills specific needs. Sacrificing its essential features (rather than unused feature bloat) is too high a price for code that's more gorgeous to programmers. Granted, skillful negotiators can counter by pledging sophisticated future features that the redesigned code will support, in which case the pledges must turn out to be more than vaporware for the trick to ever work again.
  • With any luck, the ramifications are not so dire. A confusing unit test failure may not be a subtle lesson for the design; it may be nothing more than a lesson to write more (small) tests and/or test assertions. It seems counterintuitive to throw tests at failing tests, yet it makes a lot of sense given that tests are coded expectations. In effect, confront the failing test by asking "What did I expect?" immediately followed by "Why did I expect that?" Expectations build on simpler expectations. Attack the expectation in top-down step-wise analysis. The expected final outcome was 108, because the expected penultimate outcome was 23, because the expected count was 69, etc. Write tests for those other, lesser expectations. Now the tests narrow down the problem for you at the earliest point of error, like an automatic debugger with predefined breakpoints and watch-expressions.
  • It's a well-known recommendation to write an additional unit test for a bug discovered "in the wild". This test confirms that the bug is fixed and then reconfirms that the bug doesn't resurface, assuming frequent runs of the entire suite. After a few unsuccessful tries at passing this novel test, don't be too rigid in your thought habits to ponder the possibility that the untested test is buggy! In the prior items my encouragement was to not react by blaming the tests, since an unmodified test that passed before a code change and fails afterward logically indicates that what changed, i.e. the code, must be to blame. Philosophically man is the measure of all things and a unit's tests are the measure of the unit. Not so during the introduction of a test. At this special time, the test isn't a fixed ruler for measuring code errors. It's its own work in progress in a co-dependent relationship with the code it measures. Initially the code and the test are in danger of dragging each other down through bugs. A buggy test is a false premise that can lead to a false conclusion of fine code that appears to be buggy or worse buggy code that appears to be fine. Be careful to write tests that are as minimal, unassuming, and straightforward as is practical. Complex tests that check for complex behavior are acceptable (and hugely important!). Complex tests that are intended to check for simple behavior are less justifiable and trustworthy. Tests are miniature software projects. The more convoluted and intricate and lengthy a test becomes, the greater opportunity for bugs to sneak in and set up shop.

Thursday, February 17, 2011

information hiding applied to generic types

Over time, I have a growing preference for the term information hiding over encapsulation. Encapsulation communicates the idea of wrapping data in an object or protecting privileged access through managed methods. But design analysis shouldn't stop there. Information and/or knowledge can leak and spread through a design in more subtle ways. For instance, I previously wrote that the HTML page's DOM is an overlooked example of shared state in Javascript. When many code pieces, each meant to serve separate and independent goals, have accidental dependencies outside themselves, both code reuse and intelligibility suffer. A part can no longer be understood without understanding the whole.

The purpose of information hiding, interpreted in a more general sense than encapsulation, is to thwart the natural tendency for code to not only collaborate but conjoin into the dreaded big ball of mud. Information hiding preserves metaphorical distance between expert objects with well-defined responsibilities. Then whenever a question arises, there's one answer and that one answer comes from the expert. The opposite outcome is several answers, found by wandering through the system, and one can never be quite certain that all the right generalists have been duly consulted and cross-checked.

Generic types may be susceptible to violations of information hiding. For in order to concretely consume a class with a generic type parameter, the type for the parameter must be specified (of course). But the conscientious designer should frankly ask whether the type parameter information belongs in the consuming class. By including it, the consuming class must either be coupled to a type-specific instance of the generic class or itself become a generic class that requires and passes on the same type parameter. The first option makes the consuming class less reusable/flexible while the second option leads to greater implementation complexity and further propagation of the type parameter information.

The third option is to hide the generic type information from the consuming class altogether. Some possibilities:
  • Have the generic class implement a minimal non-generic interface. Then couple the consumer class to this interface. Creating/obtaining new instances of the generic class for the interface would happen through a separate factory/service locator/dependency injector.
  • If the generic class is part of an inheritance hierarchy, then move the non-generic portions into a superclass. It's permissible for a non-generic superclass to have generic subclasses. Now the consuming class can work with instances of the generic subclasses by typing them at the non-generic superclass level.
  • Assuming the generic class doesn't have generically typed state, consider making the class non-generic, with individual methods that are generic only when necessary. In such situations the compiler can probably infer each method's type parameter from the arguments the consuming class passes in, so not even the calls to the remaining generic methods need the complexity of "looking generic".
As with any usage of generic types, consider the trade-offs carefully. Generic types sacrifice some clarity and usability. In particular, when code isn't getting anything back out of a class, the information of how to fill in the generic type parameter tends to be irrelevant, distracting, and potentially too restrictive. Generic types in one part of a project shouldn't cause the entire project to take on needless generic types everywhere.
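The first possibility above can be sketched briefly. In this hypothetical TypeScript fragment (all names invented for illustration), the generic class implements a minimal non-generic interface, and the consumer is coupled only to that interface:

```typescript
// A minimal non-generic interface: the only "face" the consumer sees.
interface Resettable {
  reset(): void;
}

// The generic class implements the interface; its type parameter
// stays hidden from any code that works through Resettable.
class Cache<T> implements Resettable {
  private items = new Map<string, T>();
  put(key: string, value: T): void { this.items.set(key, value); }
  get(key: string): T | undefined { return this.items.get(key); }
  reset(): void { this.items.clear(); }
}

// Coupled only to Resettable, so it's reusable with any Cache<T>
// (or any other Resettable) and needs no type parameter of its own.
class Janitor {
  constructor(private targets: Resettable[]) {}
  cleanUpAll(): void { for (const t of this.targets) t.reset(); }
}
```

A factory or dependency injector would hand the Janitor its Resettable instances, keeping even the construction of Cache&lt;T&gt; out of the consumer's sight.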

Thursday, January 13, 2011

quick tip for using gitextensions with git svn

I imagine some others have already noticed this, but git svn commands can be added as "Scripts" that are available from the history context menu. For instance, for "git svn rebase": Go to Settings > tab Scripts, click Add, enter a Name and be sure to click "Add to revision grid context menu", enter "C:\Program Files\Git\bin\git.exe" for the Command, "svn rebase" for the Arguments, then click Save. Now, with a right-click on the history graph in gitextensions, the new command should show up down at the bottom with the entered Name. Choosing it will bring up the expected output window. Similar steps apply for "git svn dcommit --dry-run" and "git svn dcommit".

It's a small change, yes. And for the full glory of git you still must click on the little terminal icon on the gitextensions toolbar and use the command line. But repetitive tasks should be made as rapid and unobtrusive as possible to conserve the programmer's attention and minimize cognitive load (which is why running one's unit test suite should also be extremely easy and painless). Of course, even the tiniest enhancement to workflow adds up over many repetitions thereafter. Just as a mostly-positive monthly cash flow is the key to long-term financial sustainability, a mostly-frictionless development flow allows programmers to expend their valuable time on stuff that matters, like design.

Tuesday, January 11, 2011

git is a VCS for the imperfect programmer

I just recently had a workday in which I realized how appropriate git can be for imperfect programmers. My department had released a project to the users for the first time, so I was working through the inevitable few tickets and/or change requests that ensue when software leaves a controlled development phase and collides with people.

I'd completed several commits on top of the release when I got a call about a bug related to a highly exceptional set of data. "No problem," I thought. "I can work it in with all these other commits and it will go out with them on the next scheduled full redeployment to the website." So I started working, but after designing a fix I discovered that it had so few systemic dependencies that I could easily push it out on its own without disruption, and enable the user who called to continue her work on the problematic data set. In another VCS this might have been unwieldy, but not with git. I stashed my work, created and checked out a branch at the last-released commit, popped the fix off the stash, and committed it.

Unfortunately it was only after that point that I noticed a possible weakness of the fix (and also the original code). Once again, with git this wasn't cause for alarm. I corrected the fix by amending the commit I'd just made. Finally, I had a file whose only difference from the last release was the fix. I deployed the file.

I should mention that an older edition of Subversion is the official VCS of my team, and we simply don't use branches or tags (it's not my decision). As a small team in a small organization with lots of informal communication, it's mostly sufficient for our needs, although I imagine that we'll need to become more sophisticated in the future. Thus, my fresh branch for the isolated bug-fix had to be a local git branch only. In order to ensure that the next deployment included that commit, I had to incorporate it into the Subversion trunk. With git, a rebase of my branch onto the HEAD of master was easy, and of course the actual merge of it into master was then a fast-forward. After a quick delete of the merged local branch and "git svn dcommit", everything matched up again.

A number of things about the procedure were imperfect. I'm an imperfect individual who made missteps. And I work in an imperfect setup with a decidedly imperfect team VCS. But it turns out that git fits these conditions just fine.

Saturday, December 18, 2010

bow to the gitextensions cow

Recently I tried out gitextensions. A rhapsodic blog post seems to be in order.

There's an installer that includes msysgit and kdiff3. This means I haven't needed to download anything else to get started. The installer asked, up-front, how to handle the *nix/Windows line-ending issue and what to use for my name and email address. The GUI contains an easy way to edit .gitignore entries and it comes with default entries that are relevant to almost all Visual Studio development. It suggests and directly supports integration with the PuTTY tools for SSH authentication. This means I haven't needed to find and edit configuration files or go online to research recommended entries. As someone who considers himself to be at least minimally competent, I'm not phobic of manual configuration or command line usage, but why shouldn't the easy and predictable modifications be even easier?

My intense appreciation continued as I started using it. All the typical functions and their typical options are available. (Long-time git users doubtless prefer to perform the same tasks by rapid rote typing; there's an icon to pop open a "git bash" at any time, which is good to keep in mind.) Creating a branch is just a matter of entering a name when prompted, with a checkbox if you want to also immediately check it out.

The view includes the annotated history graph, the current working directory, and the current branch. Clicking on the branch name brings up a drop-down list of other branches. Choose one, and you check it out. Clicking on a commit in the graph brings up information about it in the bottom part of the screen, such as full commit details and the diff and the file hierarchy (each directory expandable and each file right-button-clickable for file-level commands like individual history). Clicking one commit then CTRL-clicking a second brings up the diff below.

Remember how git newbs tend to have trouble navigating the movements of files between the index and the working directory, especially before git became more friendly and talky? In gitextensions, the commit window simply has separate panes with buttons to move added/modified/deleted files in-between. There's also a button for amending. After the commit, or any other moderately-complicated operations, the git output pops up in a window for review.

Of course, pull, push, merge, rebase, cherry-pick, branch deletion are present, too. All are fairly straightforward assuming the user can follow the on-screen instructions and isn't completely ignorant about git. gitextensions has a manual that contains abundant screen captures, yet I imagine it's more useful as a reference for figuring out where/how in the GUI to accomplish a specific task than as a tutorial. I was pleasantly surprised by the smoothness of my first series of gitextensions conflict resolutions. kdiff3 came up, I chose the chunks and saved, then I clicked a continue button. Despite my later realization that I could've accomplished my goal through a more streamlined procedure, the end result was nevertheless perfect in the sense that I didn't need to apply a "fix-it" commit afterward (the credit likely should be split among git and kdiff3 and gitextensions).

My praise keeps going. gitextensions offers fine interfaces for "gc" and "recover lost objects", although thus far I haven't strictly needed either in my short usage span. It adds right-click items to the Windows file explorer. It adds both a toolbar and a menu to Visual Studio. If it isn't obvious, my personal preference is to keep the gitextensions GUI open all the time, supplemented by git-bash. On occasion, when I'm otherwise manipulating a file in explorer, I might invoke file operations right from there.

The remaining question is: are gitextensions upgrades frictionless? Sooner or later the cow will tire of wearing that Santa hat...

Postlude: Farewell, Mercurial

Uh, this is uncomfortable. I'm sure you've heard this before, but it's not you, it's me. The cause definitely isn't something awful you did. You're still a great VCS that could make other developers very, very happy. I'm just looking for something else. My horizons have broadened a bit since we first met, and we don't think as alike as we did then. There are other options and considerations that I need to take into account. If I stuck with you forever, I worry that I'd become regretful or resentful. Some day, as we both change over time, I may come back to visit. Until then, I genuinely wish you well.

Sunday, November 07, 2010

shared state is not hard to find in Javascript

...and it's called "DOM" (the webpage Document Object Model). State is shared between pieces of code whenever all the pieces may read and write to it, and that applies to the DOM. No declarations are necessary; neither reads nor writes require express permission or coordination. In this freewheeling land known as the "document", anything is up for grabs.

Yet why does it matter, given that dynamic changes to the DOM are a huge part of Javascript's appeal? It matters because any shared state, by its nature, can lead to problems for the unwary, especially as size and complexity grow.
  • If various Javascript functions touch the DOM, then those functions may cease working properly whenever the page changes structure. One change, no matter how trivial, has the potential to mess up code in several places at once. For instance, say that two tabs switch order...
  • Similarly, whenever there's an algorithmic adjustment, all the functions that affect relevant individual parts of the DOM must be spotted and redone. Rounding and displaying four decimal places, but only in inputs for numerical data of a particular category, means code changes for everywhere that those input values are set or read.
  • My impression from some blogs is that people are feeling skeptical about the actual prospect of code reuse in many circumstances, but it's still a worthy ideal. As always shared state is a hindrance to reusability simply because it isn't parameterized. A function that includes an instruction to remove a style class from the element of id "last_name" is pretty difficult to reuse elsewhere.
  • On the other hand, shared state opens up the possibility of no-hassle collaboration. Function A (assuming a better name!) can take on the responsibility of setting up the shared state in some way. Then function B can do some other task to the shared state, any time after A has recently run. But function C can run directly after either A or B. So function A must leave the shared state in acceptable configurations for either B or C, and B must also account for C. Of course, if there's a new special data value in the shared state that affects what C must do, one must be careful to modify both A and B to set it accordingly. Hence, although shared state makes it highly convenient to intertwine the operation of many pieces of code, the intertwining also greatly reduces readability! It's much easier to analyze separate pieces of code with well-defined connection points.
  • Furthermore, implicit shared-state dependencies don't combine well with asynchronous execution. Javascript doesn't have concurrent execution (i.e. multithreading), but it's certainly possible for separate pieces of code, such as callbacks for clicks and timers and network requests, to execute in an unpredictable order. So while there won't be deadlocks in the traditional sense, unwitting callbacks that fire in an unintended sequence could leave the DOM in a useless or incorrect state after the dust clears. 
Many techniques can mitigate the shared state known as the DOM. For information storage, rely on variables residing in purposeful scopes rather than on DOM elements and/or attributes. Treat the DOM as a final, rather than an intermediate, data format. Isolate the code that handles the DOM from the code that processes data. Collect DOM modifications that always happen together into a single function with an appropriate abstract/semantic name.
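Those techniques can be sketched in a few lines. In this hypothetical TypeScript fragment (all names invented), the formatting rule lives in one pure function, and a single semantically named function performs the DOM-like write:

```typescript
// Pure data logic: the rounding rule exists in exactly one place,
// so an algorithmic adjustment touches one spot instead of many.
function roundForDisplay(value: number): string {
  return value.toFixed(4);
}

// Stand-in for a DOM element; a real page would pass an HTMLElement.
interface DisplayTarget { textContent: string }

// The lone function that mutates the shared state. Callers work at
// the semantic level ("show a measurement"), not the DOM level.
function showMeasurement(el: DisplayTarget, raw: number): void {
  el.textContent = roundForDisplay(raw);
}
```

Because roundForDisplay never touches the DOM, it can be reused and tested anywhere, and a page restructuring only affects showMeasurement.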

Not all of the shared state in an application consists of variables. Whether log file, database, or DOM, access logic should be carefully considered to avoid maintenance headaches.

Wednesday, July 14, 2010

persistence of private by the Nucleus pattern

The encapsulation of data by an object's methods is one of the foremost goals of effective OOP. Restricting exposure of the object's private information prevents other code from accessing it. The inaccessibility ensures that the other code can't depend upon or otherwise share responsibility for the private information. Each object has a single responsibility: a sovereign private realm of information and expertise.

However, this ideal conflicts with the need to make objects persistent, since most programs require some form of data storage. And the required interaction with the storage mechanism clearly isn't the responsibility of the objects that happen to correspond to the data. Yet how can the objects that are responsible, often known as repositories or data mappers, mediate between external storage and other objects while obeying encapsulation? How can information be both private and persistent without the object itself assuming data storage responsibility?

The "Nucleus" design pattern, very similar to an Active Record, addresses this issue. According to the pattern, a persistent object, similar to a eukaryotic cell, contains a private inner object that acts as its "nucleus". The nucleus object's responsibilities are to hold and facilitate access to the persistent data of the object. Therefore its methods likely consist of nothing more than public "getters and setters" for the data properties (and possibly other methods that merely make the getters and setters more convenient), and one of its constructors has no parameters. It's a DTO or VO. It isn't normally present outside of its containing object since it has no meaningful behavior. Since the nucleus object is private, outside objects affect it only indirectly through the execution of the containing object's set of appropriate information-encapsulating methods. The containing object essentially uses the nucleus object as its own data storage mechanism. The nucleus is the "seed" of the object that contains no more and no less than all the data necessary to exactly replicate the object.  

Naturally, this increase in complexity affects the factory object responsible for assembly. It must initialize the nucleus object, whether based on defaults in the case of a new entity, or an external query performed by the storage-handling object in the case of a continuing entity. Then it must pass the nucleus object to the containing object's constructor. Finally, it takes a pair of weak references to the containing object and nucleus object and "registers" them with the relevant stateful storage-handling object that's embedded in the execution context.

The object pair registration is important. Later, when any code requests the storage-handling object to transfer the state of the containing object to external storage, the storage-handling object can refer to the registration list to match the containing object up to the nucleus object and call the public property methods on the nucleus object to determine what data values to really transfer.
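To make the moving parts concrete, here's a compressed TypeScript sketch of the pattern as described above. All names are hypothetical, and for brevity the registry holds ordinary references where the text prescribes weak ones:

```typescript
// The nucleus: a DTO holding nothing but the persistent data,
// with a usable no-argument constructor.
class PersonNucleus {
  constructor(public name = "", public visits = 0) {}
}

// The containing object: behavior only. The nucleus stays private,
// so no public getters/setters expose the persistent data directly.
class Person {
  constructor(private nucleus: PersonNucleus) {}
  greet(): string { return `Hello, ${this.nucleus.name}`; }
  recordVisit(): void { this.nucleus.visits += 1; }
}

// The storage handler keeps the (containing, nucleus) registrations
// and reads persistent values through the nucleus, never through Person.
class PersonStore {
  private registry: Array<[Person, PersonNucleus]> = [];
  register(p: Person, n: PersonNucleus): void { this.registry.push([p, n]); }
  snapshot(p: Person): { name: string; visits: number } | undefined {
    const entry = this.registry.find(([obj]) => obj === p);
    return entry ? { name: entry[1].name, visits: entry[1].visits } : undefined;
  }
}
```

A factory would tie these together: build the nucleus (from defaults or a storage query), pass it to the Person constructor, and register the pair with the store.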

Pro:
  • The containing object doesn't contain public methods to get or set any private data of its responsibility.
  • The containing object has no responsibility for interactions with external storage. It only handles the nucleus object.
  • Since the nucleus object's responsibility is a bridge between external storage and the containing object, design compromises for the sake of the external storage implementation (e.g. a specific superclass?) are easier to accommodate without muddying the design and publicly-accessible "face" of the containing object.
Con:
  • The nucleus object is one additional object/class for each persistent original object/class that uses the pattern. It's closely tied to the containing object, its factory object, and its storage-handling object.
  • The original object must replace persistent data variable members with a private nucleus object member, and the containing object's methods must instead access persistent data values through the nucleus object's properties.
  • The containing object's constructors must have a nucleus object parameter.
  • The factory must construct the nucleus object, pass it to the containing object's constructor, and pass along weak references to the storage-handling object.
  • The storage-handling object must maintain one or more lists of pairs of weak references to containing objects and nucleus objects. It also must use these lists whenever any code requests a storage task.
  • The code in the storage-handling object must change to handle the nucleus object instead of the original object.