Rippling Brainwaves

Thursday, January 13, 2011

quick tip for using gitextensions with git svn

I imagine some others have already noticed this, but git svn commands can be added as "Scripts" that are available from the history context menu. For instance, for "git svn rebase": Go to Settings > tab Scripts, click Add, enter a Name and be sure to click "Add to revision grid context menu", enter "C:\Program Files\Git\bin\git.exe" for the Command, "svn rebase" for the Arguments, then click Save. Now, with a right-click on the history graph in gitextensions, the new command should show up down at the bottom with the entered Name. Choosing it will bring up the expected output window. Similar steps apply for "git svn dcommit --dry-run" and "git svn dcommit".

It's a small change, yes. And for the full glory of git you still must click on the little terminal icon on the gitextensions toolbar and use the command line. But repetitive tasks should be made as rapid and unobtrusive as possible to conserve the programmer's cognitive load (which is why running one's unit test suite should also be extremely easy and painless). Of course, even the tiniest enhancement to workflow adds up over many times thereafter. Just as a mostly-positive monthly cash flow is the key to long-term financial sustainability, a mostly-frictionless development flow allows programmers to expend their valuable time on stuff that matters, like design.

Tuesday, January 11, 2011

git is a VCS for the imperfect programmer

I just recently had a workday in which I realized how appropriate git can be for imperfect programmers. My department had released a project to the users for the first time, so I was working through the inevitable few tickets and/or change requests that ensue when software leaves a controlled development phase and collides with people.

I'd completed several commits on top of the release when I got a call about a bug related to a highly exceptional set of data. "No problem," I thought. "I can work it in with all these other commits and it will go out with them on the next scheduled full redeployment to the website." So I started working, but after designing a fix I discovered that it had so few systemic dependencies that I could easily push it out on its own without disruption, and enable the user who called to continue her work on the problematic data set. In another VCS this might have been unwieldy, but not with git. I stashed my work, created and checked out a branch at the last-released commit, popped the fix off the stash, and committed it.

Unfortunately it was only after that point that I noticed a possible weakness of the fix (and also the original code). Once again, with git this wasn't cause for alarm. I corrected the fix by amending the commit I'd just made. Finally, I had a file whose only difference from the last release was the fix. I deployed the file.

I should mention that an older edition of Subversion is the official VCS of my team, and we simply don't use branches or tags (it's not my decision). As a small team in a small organization with lots of informal communication, it's mostly sufficient for our needs, although I imagine that we'll need to become more sophisticated in the future. Thus, my fresh branch for the isolated bug-fix had to be a local git branch only. In order to ensure that the next deployment included that commit, I had to incorporate it into the Subversion trunk. With git, a rebase of my branch onto the HEAD of master was easy, and of course the actual merge of it into master was then a fast-forward. After a quick delete of the merged local branch and "git svn dcommit", everything matched up again.

A number of things about the procedure were imperfect. I'm an imperfect individual who made missteps. And I work in an imperfect setup with a decidedly imperfect team VCS. But it turns out that git fits these conditions just fine.

Friday, December 31, 2010

hidden conceptual connectors

The least controversial statement possible about the brain's functioning is that it's enigmatic on large scales. Despite knowledge of input and output in any given situation, the complex path in-between can't be interpolated. It's a fuzzy tangle of inherent unpredictability as it continually adapts with each new environmental shift in the flood of external information. Small wonder that some scientists have insisted instead on restricting analysis to the most predictable of tangible, measurable behaviors.

But a statement that contradicts this one is also uncontroversial: from a first-person perspective, the brain's work isn't quite as enigmatic. It yields ephemeral subjective/mental/semantic "sensations" under the catch-all name, the conceptual stream of consciousness. Examination of the stream leads to psychological speculation about its supposed causes, e.g. the dream proves that you have a death-wish. Extraction and conversion of the stream to the form of an ordered chain of logical reasoning is a vital step to publishing polished arguments for others to appreciate and check.

It seems to me that a persistent frailty of these post-hoc explanations of the first-person experience is omission of essential steps. Sometimes it's crystal-clear that something must be missing from the person's fragmentary or abbreviated account. Sometimes the person may not have realized that they skipped a step in the telling; when questioned they say it was too "obvious" to be noticed. Sometimes one or more steps are purposely excised to avert dull boredom of the audience. Hence, although the underlying physical motions in the brain that produced the output are undeniably inscrutable, the perceived conceptual motions in the first-person consciousness or "mind" are also filled with gaps! Pure reason, the foremost ally of philosophers, is reliant on hidden connectors between the manifest thoughts.

Inventive parts of my brain must be whirring away without my express order or consent. How else can I explain the noteworthy clue that my consciousness can play its own soundtrack in the absence of corresponding sound waves? The first guess is that the music filling the interstices of my attention is randomized firings, but the selection is too often apropos to be due to coincidence. The meaning of the lyrics aligns with my focus. More than a few times, I had an anticipatory song. I read toward the top of the page, heard a song in the "mind's ears", and further down I read a textual header whose content matched up more or less directly to the already-in-progress song (most obviously when the header's a pun on well-known song lyrics). As my eyes searched the page as I read, my brain subliminally picked up the text of the below header in passing and that snippet then seeded a highly meaningful background tune. I mention the "smart jukebox in the head" as a prosaic illustration of the currents which are active yet not consciously initiated.
The next indication is visual. Abstract topics under extended consideration by me may become entangled in a particular unrelated location from my memories. When I return to the topic, a view (always from the same approximate "orientation") of the associated location "shows up". But unlike the inner jukebox music, the choice of stage has no connection to the topic; it's "just there" as the main sequence of my thought is engrossed in the topic. My theory is that when I'm deeply sunk ("transported") into the topic, my usual contextual awareness of my actual location loses force and allows an arbitrary remembered location to emerge. As I ponder the topic, it intertwines with the secondary setting, and from then on the two are joined. Like the apt background music, this occurrence isn't perceivably helpful to the main process, but it suggests that more is happening than the stream of consciousness (would we call the rest of the activity "brooks"?).
By contrast, heuristics can be wonderful aids. A few good mental shortcuts reduce the time and effort to obtain an answer, as long as the error rate is tolerable. What's striking is the frequency at which heuristics develop without deliberation. When a person performs repetitious tasks, generalizations crystallize. If a computer program always asks for confirmation and "Yes" is always clicked in response, there's no need to read the words every time thereafter. Sooner or later users may find it so routine and mindless that they barely can recall the clicking, either. Long exposure and training build an intricate web of heuristic reactions. People who perform identical tasks at their jobs for years appear "dead" to onlookers but they're really more like robots. They do A for case X, B for case Y, C for case Z. Eventually, solely the most exceptional cases manage to evade the heuristics. "In all my years working here, I've never seen..." The ability to leap instantly to the correct course of action isn't as remarkable when the solver is hastily reapplying a hidden, practiced heuristic.
While sneaky street-smart heuristics receive less credit than forthright book-smart education, the latter depends on unrevealed links too. Verbal learning happens as teachers lay new definitions upon previous definitions. As they become better acquainted with the new ideas, the learners likely won't need to refer to the various mediators that first facilitated their initial understanding. After the information is embedded in the learners, the comparative "glue" employed by the teachers during the lessons is unnecessary. Still, the educated person must acknowledge that these clumsy intermediaries were formerly invaluable for achieving minimal comprehension. The teacher's formula "___ is like a ___" was a bridge that could've been forgotten afterward. Thinking of atoms or other particles as microscopic balls or points is deficient to the reality in several respects, but the image is close enough for an introduction.
A skeptic could cast doubt on the necessity of the stepping stones laid during learning. Couldn't the instructor plainly state the base facts rather than fumbling with imperfect comparisons? Perhaps, but ultimately it's unavoidable because all communication is the learning process in miniature. Each word shapes the interpretation of its successors. The importance of context shouldn't be underestimated. When someone proclaims a love for "bouncing", the information intended by the proclamation is unlearnt until the arrival of more details. If according to grammar the object of the bouncing is "pogo stick", then the listener's uncertainty decreases, provided he or she knows about pogo sticks. Similar illumination would happen were the object of the bouncing "rubber ball", "check", "unwelcome guests". The point is that the word "bounce" is isomorphic to multiple stuff, a word like "check" that constrains "bounce" could also be isomorphic to multiple stuff, and the final overall sentence isomorphism is an intersection or conjunction of the possible/candidate isomorphisms for the individual words. It's a fundamentally parallel process that harnesses the ambiguity of isomorphisms to permit efficient and creative messages; word choices need not be exact, and continual recombinations use limited word sets to cover an astronomical range of thoughts real and unreal.
The sheer power of language for logical reasoning, fused to the uncanny human capacity and motivation for inventing explanations, can give the misleading impression that the form of nonverbal reasoning is no more than unverbalized logic. However, countless counterexamples are everywhere. A familiar category is visual estimation and evaluation such as whether a floor lamp can fit inside a box. Skill at such questions is independent of skill at logical/mathematical questions. Moreover, translating a question into a different format, viz. visualizations, can greatly affect its perceived difficulty. Solutions can arise through subtle analogies. Many anecdotes show a startling similarity in the structure of major breakthroughs: the discoverers were confounded until they adjusted their underlying approach or otherwise became inspired by knowledge from seemingly unrelated domains. In the more mysterious accounts, they arrived at the answer partly by interpreting material from dreams or dream-like meditative states in which frothy free association ended up mentally attaching and attacking the problem in the forefront. Metaphors may be simultaneously ridiculous and indispensable. Asking "what if?" might lead to a fruitless destination. On the other hand, it might lead to the key insight. A nonsensical whole can act as a temporary frame between important contradictions before the thinker determines the plausible interconnection.
At another level of remove, seasoned experts of a specific mental domain describe ideas about the ideas and assumptions about the assumptions. This "feel" is largely what qualifies the expert as an expert. A novice would pursue tempting and irrelevant side details or surface illusions where an expert would employ the well-honed knack for striking at the most beneficial edge. They speak of a "nose" for the right path or a "sense" for the likeliest "shape" of the right conclusion. For them the domain approximates a familiar environment in which even abstract concepts are almost like manipulable objects. Upon prompting they try to describe the domain's "aesthetics", which probably involve symmetry and complementarity. Their patterns of analysis enable them to quickly decompose a challenge and make it appear easy by a lack of wrong turns. Once again the end result doesn't fully expose the hidden concepts that somehow participated in its birth.
Lastly, and most mystifying of all, are the opaque reports. "I don't know." No matter how they are pestered, they claim to be oblivious to the origin of their highly-confident outcome. In short they know but not how. Naturally, they could actually be wrong, but for a time at least they express certainty. "It just is." No work is remembered. No argument left memorable traces. It's as if verification is possible but not reverse-engineering. There's no clearer proof for the uninformed nature of our reckoning of our own reckonings.

Saturday, December 18, 2010

bow to the gitextensions cow

Recently I tried out gitextensions. A rhapsodic blog post seems to be in order.

There's an installer that includes msysgit and kdiff3. This means I haven't needed to download anything else to get started. The installer asked, up-front, how to handle the *nix/Windows line-ending issue and what to use for my name and email address. The GUI contains an easy way to edit .gitignore entries and it comes with default entries that are relevant to almost all Visual Studio development. It suggests and directly supports integration with the PuTTY tools for SSH authentication. This means I haven't needed to find and edit configuration files or go online to research recommended entries. As someone who considers himself to be at least minimally competent, I'm not phobic of manual configuration or command line usage, but why shouldn't the easy and predictable modifications be even easier?

My intense appreciation continued as I started using it. All the typical functions and their typical options are available. (Long-time git users doubtless prefer to perform the same tasks by rapid rote typing; there's an icon to pop open a "git bash" at any time, which is good to keep in mind.) Creating a branch is just a matter of entering a name when prompted, with a checkbox if you want to also immediately check it out.

The view includes the annotated history graph, the current working directory, and the current branch. Clicking on the branch name brings up a drop-down list of other branches. Choose one, and you check it out. Clicking on a commit in the graph brings up information about it in the bottom part of the screen, such as full commit details and the diff and the file hierarchy (each directory expandable and each file right-button-clickable for file-level commands like individual history). Clicking one commit then CTRL-clicking a second brings up the diff below.

Remember how git newbs tend to have trouble navigating the movements of files between the index and the working directory, especially before git became more friendly and talky? In gitextensions, the commit window simply has separate panes with buttons to move added/modified/deleted files in-between. There's also a button for amending. After the commit, or any other moderately-complicated operations, the git output pops up in a window for review.

Of course, pull, push, merge, rebase, cherry-pick, branch deletion are present, too. All are fairly straightforward assuming the user can follow the on-screen instructions and isn't completely ignorant about git. gitextensions has a manual that contains abundant screen captures, yet I imagine it's more useful as a reference for figuring out where/how in the GUI to accomplish a specific task than as a tutorial. I was pleasantly surprised by the smoothness of my first series of gitextensions conflict resolutions. kdiff3 came up, I chose the chunks and saved, then I clicked a continue button. Despite my later realization that I could've accomplished my goal through a more streamlined procedure, the end result was nevertheless perfect in the sense that I didn't need to apply a "fix-it" commit afterward (the credit likely should be split among git and kdiff3 and gitextensions).

My praise keeps going. gitextensions offers fine interfaces for "gc" and "recover lost objects", although thus far I haven't strictly needed either in my short usage span. It adds right-click items to the Windows file explorer. It adds both a toolbar and a menu to Visual Studio. If it isn't obvious, my personal preference is to keep the gitextensions GUI open all the time, supplemented by git-bash. On occasion, when I'm otherwise manipulating a file in explorer, I might invoke file operations right from there.

The remaining question is: are gitextension upgrades frictionless? Sooner or later the cow will tire of wearing that Santa hat...

Postlude: Farewell, Mercurial

Uh, this is uncomfortable. I'm sure you've heard this before, but it's not you, it's me. The cause definitely isn't something awful you did. You're still a great VCS that could make other developers very, very happy. I'm just looking for something else. My horizons have broadened a bit since we first met, and we don't think as alike as we did then. There are other options and considerations that I need to take into account. If I stuck with you forever, I worry that I'd become regretful or resentful. Some day, as we both change over time, I may come back to visit. Until then, I genuinely wish you well.

Sunday, December 05, 2010

impossible flawless imitation

The inherent limitations to the analysis of a Turing Machine (TM) include a consequence that deserves more attention. A TM cannot be imitated perfectly through a generalized algorithm. Stated differently, a TM's full "description" cannot be inferred from an arbitrary amount of its output ("tape"). No empirical method can be all-sufficient for determining the exact source TM. You can't deduce a TM by its "footprints".

The reasoning is similar to other TM arguments, e.g. the inability to compute when/if a TM will halt. Suppose we know that some TM "S" produces a specific sequence of finite bits, and thereby we contrive our own TM "I" that produces the same sequence. Well, what about the next bit after that sequence (or the next erasure, etc.)? "I", according to its present configuration, produces either a 1 or 0. But given that the output is only assured of matching up to the last bit, there's no theoretical reason why "S" couldn't be identical to "I" in every way except for the configuration for the next bit, in which case our otherwise flawless imitation is ruined. For example, even if "S" has steadily produced alternating bits, it still might produce "00" at position 30, or 35, or 1,462.

Moreover, the situation could be quite possibly worse in a multitude of creative ways. Perhaps "S" is really a complex specimen of Universal Turing Machine with two "source" TM that are much different (analogous to a "piece-wise" mathematical function). "S" executes source TM "A" and then increments a count. When the count exceeds a predetermined value, "S" branches to a configuration that executes the other source TM "B". One may elaborate on "S" with further counts, branching, and source TMs. The point is to reiterate that although we can invent an "I" to imitate "S", we can never conclude that "I" and "S" are exact matches, and in fact we can't figure the ultimate similarity at all as execution stretches to infinity. Failure of "I" is due to missing information about the "S" TM, but worse is the more subtle problem: there's no way to know how much information is missing at any time!

So...

Generally speaking, clean-room or black-box re-implementations of software can't be guaranteed to succeed in every detail. For instance, exceptional conditions or highly specific combinations of input could trigger diverging outcomes. The imitation of the previous software's bugs can be particularly troublesome.
Whenever we compute/emulate anything that requires TM-level processing, we can't be absolutely sure whether we ever achieve total fidelity. This doesn't imply that a conceptual TM is the most convenient theoretical model of a phenomenon (it usually isn't!), but merely that the chosen model can somehow run on a "computer". Or in more formal terms, the model outlines an explicit procedure that yields predictions of one or more modeled quantities. In some open-and-shut cases, the model's error is persistently trivial under all experiments and we therefore have no justification for doubting its effectiveness. Yet history is filled with times in which a serviceable first model (circular planetary orbits with epicycles) is later replaced by one with still greater nuance and realism (elliptical).
Third-person human brains (and behavior) undoubtedly fall under these conclusions. That is, any verbalized or intuitive model of a specific brain cannot be completely understood. Sympathy has its limits. While careful observation combined with genuine insight yields an impressive array of knowledge about "what makes his or her brain tick", especially among species evolved to be highly social, the application of the knowledge often proves how superficial it truly is. Biographers can detect what they call formative experiences, but tasking the biographer to reliably predict what a living subject shall do next will illustrate that the analysis works best retroactively. Of course, if someone theorizes that the human brain performs computations beyond the power of a TM, then the argument is correspondingly strengthened. Two such brains surely cannot be synchronized when two lowly TMs cannot. The proposed mechanism of brain-based super-computation, e.g. quantum effects of exotic pieces/states of matter, would increase not decrease the difficulty.
More surprisingly, first-person human brains (and behavior) are also prone. There's no "imitation" of oneself, but there's a self-concept. The very process of answering the self-evident question, "Why did I think or do that?", necessitates the mental model of self. The brain attempts to develop a "TM" whose workings imitate the actual and mysterious workings of itself. Unfortunately, it's sabotaged by bias toward a pleasing answer! Transparency and self-knowledge are notoriously plagued by mistakes. Countless fictional works have relied on the humor if not the tragedy of self-delusion.

Thursday, December 02, 2010

escaping infinity

I've previously mentioned a chain of reasoning similar to this: 1) everything that is real is material, i.e. subject to the usual scientific "laws"; 2) whenever a substance "means" another substance, that relationship is an isomorphism recognized/superimposed by a material brain; 3) there is no ultimate theoretical barrier to the possibility of inventing a sufficiently advanced computer/program that could process "meaningful content" (i.e. isomorphisms) as well as a brain.

Many varying objections apply. One category pertains to an insinuated reduction and comparison of exceedingly-complicated consciousness to the current generation of inflexible, literal-minded computers and programs. And although it's likelier than not that the computers and programs that would successfully imitate the brain would be laid-out much differently than ours, these objections still make some valid points: a computer with more or faster components is "merely" another computer and a program with rather chaotic and evolutionary qualities, broken into segments running in parallel, is "merely" another program.

For instance, assume (if you're an objector, purely for argument's sake) that in principle a brain could be adequately simulated by the appropriate computer and program named Q. Brains, at least after suitable training, can simulate Turing Machines, so Q can also. We know that a sufficiently clever/defective program can end up running for infinite time (nonterminating loops) regardless of how "smart" its host is. Despite its sophistication, Q remains susceptible to this threat. But if Q is susceptible, then by assumption the brain that it adequately simulates is susceptible. How absurd! Everyone knows that people don't suffer from infinite thought repetitions. Thus, the original assumption has led to a false conclusion and there must not be a Q.

My list of responses fit a standard akin to natural selection. The brain's, and by extension Q's, safeguards against infinity aren't perfect (that's impossible), but are good enough.

Distraction. Brains are exquisitely tuned to react to change. Whether an alarming signal arrives through the senses or a long-gestating answer to a dilemma bursts into flower, the typical course of thought is continuous transformation in unanticipated directions. In the face of the avalanche, perhaps the better question is how there can ever be a mental impression of a unitary and uninterrupted flow of logic. Healthy and normal brains display a moderate weakness, if not desire, for distraction. Meditation is difficult. For some, stopping their feet from tapping and their knees from bouncing is difficult!
Memoization. The importance of context shouldn't be underestimated. It's fueled by short-term memory, the "scratchpad", which contains a temporary record of recent brain work. Moreover, since retrieval strengthens a memory, any cycle in the brain work will tend to self-promote. Hence the brain is primed to store previous trips through the loop. The other ingredient is pattern-matching. Each trip, something in the end leads back directly to the start. It's not much of a leap for the brain to construct an isomorphism among these remembered trips, thereby detecting the similarity and extracting the common pieces. Finally, these pieces allow for a short-circuit or "unrolling" of the loop because the brain now knows that the start always leads eventually to the start once more. There's no more need to carry out the (infinite) loop; staying at the start has the same ultimate effect. The execution of the loop has been optimized out of existence or "memoized". Clearly, memoization works best for short or easily-generalized loops. Lengthy or subtle loops could evade notice perhaps indefinitely. Consider cases of asymptotic progress, in which someone is fooled into the belief that the destination is reachable because it grows closer.
Simulation. The power of simulation or imagination allows a brain to contend instead with a comparatively toothless and "imaginary" version of the infinite loop. At all times, the brain can maintain its metaphorical distance, its suspension of disbelief. Through simulation, the loop is closer to an object under manipulation than a dictator reciting irresistible commands. The loop can be stopped, resumed, replayed backward, etc. The brain halts momentarily after every operation to review and reconsider the result, so the danger of runaway computation is nonexistent. If the whole enterprise is carried out with tools such as writing implements or another suitable device (e.g. smart phone?), then the halt is accomplished by nothing more intricate than resting hands. In short, the vital difference in simulation is that the brain may treat the loop "code" as data. Strictly speaking, it never really switches between modes of perception and simulation. It perceives and simulates and perceives and simulates. Dynamic feedback is built-in.
Ruination. Undeniably, real instances of infinite loops don't run forever on devices. Assuming the loop accomplishes something cumulative, its effects are likely to start causing noticeable trouble, even to the point of a forced halt when memory or disk is full. Alternatively, the physical parts of the system will quit functioning eventually since that is what continuous wear and tear does. Earthy considerations also affect brains. Moreover, brains are notoriously prone to failure. Fatigue, hunger, boredom, illness, etc. will cause a loop-stuck brain to produce mistakes in processing the loop, and the mistakes could incorrectly terminate it (e.g. your ComSci classmates who played Halo all night fail to find the right outcome during a "paper trace" question on the final exam). Once again, the greater the complexity of the loop, the more this factor kicks in. As a system shaped by evolutionary forces, a brain is better at survival than flawless mental calculation. Robust error recovery and correction is a more realistic strategy.

It may appear silly to ponder why biological beings don't end up in infinite algorithmic loops. However, infinite behavioral loops surely aren't nearly as ridiculous. We usually call them habits.

Sunday, November 07, 2010

shared state is not hard to find in Javascript

...and it's called "DOM" (the webpage Document Object Model). State is shared between pieces of code whenever all the pieces may read and write to it, and that applies to the DOM. No declarations are necessary; neither reads nor writes require express permission or coordination. In this freewheeling land known as the "document", anything is up for grabs.

Yet why does it matter, given that dynamic changes to the DOM are a huge part of Javascript's appeal? It matters because any shared state, by its nature, can lead to problems for the unwary, especially as size and complexity grow.

If various Javascript functions touch the DOM, then those functions may cease working properly whenever the page changes structure. One change, no matter how trivial, has the potential to mess up code in several places at once. For instance, say that two tabs switch order...
Similarly, whenever there's an algorithmic adjustment, all the functions that affect relevant individual parts of the DOM must be spotted and redone. Rounding and displaying four decimal places, but only in inputs for numerical data of a particular category, means code changes for everywhere that those input values are set or read.
My impression from some blogs is that people are feeling skeptical about the actual prospect of code reuse in many circumstances, but it's still a worthy ideal. As always shared state is a hindrance to reusability simply because it isn't parameterized. A function that includes an instruction to remove a style class from the element of id "last_name" is pretty difficult to reuse elsewhere.
On the other hand, shared state opens up the possibility of no-hassle collaboration. Function A (assuming a better name!) can take on the responsibility of setting up the shared state in some way. Then function B can do some other task to the shared state, any time after A has recently run. But function C can run directly after either A or B. So function A must leave the shared state in acceptable configurations for either B or C, and B must also account for C. Of course, if there's a new special data value in the shared state that affects what C must do, one must be careful to modify both A and B to set it accordingly. Hence, although shared state makes it highly convenient to intertwine the operation of many pieces of code, the intertwining also greatly reduces readability! It's much easier to analyze separate pieces of code with well-defined connection points.
Furthermore, implicit shared-state dependencies don't combine well with asynchronous execution. Javascript doesn't have concurrent execution (i.e. multithreading), but it's certainly possible for separate pieces of code, such as callbacks for clicks and timers and network requests, to execute in an unpredictable order. So while there won't be deadlocks in the traditional sense, unwitting callbacks that fire in an unintended sequence could leave the DOM in a useless or false form after the dust clears.

Many techniques could apply to mitigation of the shared state known as DOM. For information storage, rely on variables residing in purposeful scopes rather than DOM elements and/or attributes. Treat the DOM as an end, instead of intermediate, data format. Isolate the code that handles the DOM from the code that processes data. DOM modifications that always happen together should be collected into a single function with an appropriate abstract/semantic name.

Not all of the shared state in an application consists of variables. Whether log file, database, or DOM, access logic should be carefully considered to avoid maintenance headaches.

Monday, November 01, 2010

a bad statistics joke

Since I recently posted something with a statistical lean...

I recently encountered the statement: "Why be normal? No one else is."

And all I could think in response was: "Why be within one standard deviation of the mean? Only 68% of the population is, assuming it to be normal."

Saturday, October 30, 2010

statistical inference and racism

Up-front disclaimer: None of the following should be construed as supporting racism. If anything, my point is that racism is as shaky intellectually as it is morally. It's simply wrong to unfairly treat an individual based on indirect inferences from their similarities to a particular race (or ethnicity, gender, religion, etc.). I also apologize up-front for offending statisticians.

It occurred to me, as it doubtless has to others, that racism has some connections to faulty statistical inferences. Of course, I don't mean that the typical racist is computing actual statistical regressions, but that part of their error consists of an "intuitive" misunderstanding that would appear unsound if it was quantified and written out. And naturally, the inability to estimate correct conclusions about probability and statistics is present in any untrained person, not just the racist one.

the instrumental variable

I've heard people aver that, since racial discrimination has continued to decrease, race should no longer be considered a determining factor in people's lives. While this is a happy thought, it naively misses a crucial aspect: (direct) discrimination is not the only way that racism acts.

Assume a statistical model or equation in which race and discrimination aren't present. It could be for any personal outcome like educational attainment, as long as race isn't included in the model's variables. Essentially, the mathematical equation expresses a complete lack of obvious relevance and causation between race and the outcome. Discrimination doesn't even enter the equation, as we posited.

Now run a thought experiment on our model. Suppose we apply the model to randomly selected people, i.e. we grab the info for each person and then calculate (not measure) the outcome with the model. Further suppose that we also record each person's race. Then we find, as in reality, that when we partition the calculated outcomes by race, the distribution is drastically and consistently different. How can this be? Race wasn't one of our variables, because without outright discrimination it had no mechanism of causation.

The answer is that, in this hypothetical model and thought experiment, race could still act as an instrumental variable. Roughly speaking, race is correlated with the outcome through the model variables. For instance, in a model of personal income, it's highly reasonable to have the parents' income as a variable. Although the model may leave out "race", race can nevertheless affect personal income (the outcome) as long as it affects the parents' income (the modeled variables). Thus race can be a cause of an outcome despite not being a "cause".

Moreover, race is likely to be an instrumental variable for an array of models, and all the models are likely to be related too. Bad (or missing) education tends to lead to low-paying employment or unemployment, which tends to lead to poverty, which tends to lead to crime, which tends to lead to low property values and taxes, which tends to lead to low school funding, which tends to lead to bad education. Regardless of whether there's a blatant racial barrier presently at work, the long-lasting "fingerprints" of racial differences may be active (e.g., systematic segregation sometime in the past that restricted access to the same set of opportunities).

the overeager Bayesian

Stereotypes are simplifications by nature. Sometimes a stereotype can be a helpful mental shortcut, but often it doesn't reflect the most accurate concept of a "typical" sample from a population. This is notoriously true of racial stereotypes that highlight the worst representatives.

For compared to a proper Bayesian analysis, "reasoning" by a stereotype is overeager to apply shaky conditional probabilities. Consider a very hypothetical population with only two races that are each 50% of the population. 10% of the entire population has an undesirable characteristic (left unnamed). Of the people in the entire population who have the undesirable characteristic, 75% are of race A. Whenever someone in this population is of race A, what is the probability of having the undesirable characteristic?

When it comes to a stereotype for the undesirable characteristic, chances are the stereotype is of race A due to prevalence within that subgroup. So the stereotype would tend to override the right Bayesian answer, 7.5%, with a much greater probability that overlooks the facts that 1) only 10% of the entire population has the undesirable characteristic at all and 2) therefore 85% of race A does not have it.

Friday, October 29, 2010

peeve no. 261 is Obi and programming language mentality

The Force can have a strong influence on the weak-minded. (--Obi-Wan)

Similarly, a programming language can have a strong influence on weak-minded programmers. If your programming language "changed your life", "made you a better coder", "expanded your mind", etc., then I suggest that you may be going about this programming thing the wrong way. If a lack of language support for OO results in you producing a tangled web of dependencies among procedural functions, or easy access to global/file scope results in you spraying program state everywhere, then the programming language is your crutch. It's the guide-rail for your brain.

I'm glad that you found a programming language that you enjoy, but consider me unimpressed by rather impractical claims about the ways that it alters your consciousness.