Friday, September 28, 2007

human brain as VM

By "human brain as VM" I don't mean running bytecode on wetware. (That would give a sinister twist to the phrase "write once, run anywhere", eh?.) I mean the mental process of understanding the operation of lines of source code, sometimes called tracing within and without debuggers. I have assumed that other developers use the same strategy I do when confronted with unfamiliar code: mentally simulating what and how the computer will execute. That is, building up a mental model of a language and the various scopes of a particular program, then applying that mental model to the actual instructions to produce a projected result.

But just a few days ago I attended a training presentation in which the source code on the slide had no chance at all of accomplishing what the presenter said it did. I'm pretty sure the presenter wasn't stupid or deliberately careless. My conclusion was that he had composed the code by copying it incompletely from elsewhere or by incorrectly translating an abstract algorithm into its concrete counterpart. Either way, he had most certainly not parsed and traced the program like a "computer" (in quotes because every medium- or high-level language is technically an abstraction over the actual computer's execution of its instruction set and memory).

How common is it for developers to read, write, and otherwise think about code without putting it through the mental lexer, the mental parser, or even the mental Turing machine as a last resort? I ask because I'm genuinely curious. I don't feel that I understand code unless I can visualize how the individual character elements fit together (syntax) and how the data flows through the program as it runs (semantics); is this not the case for others? Is this another instance of my rapport with languages and grammar and logic? Am I wrong to read through not only the prose of a training book, but also the code examples, and then not continue until I can mentally process that code in my mental VM without tossing compile-time syntax exceptions?

I know some (ahem, unnamed) people whose primary mode of development is tracking down examples, copying them, and modifying them as little as possible to get them to work. Example code is an excellent aid to understanding as well as an excellent shortcut in writing one's first program; I'm not disputing that. Patterns and techniques and algorithms should be publicized and implemented as needed; I'm not disputing that either. And I understand that the scary corners and edges of a programming language (static initializers in Java?) are probably better left alone until really needed, to avoid excessive hassles and complications.

However, if someone wants to save development time and effort, especially over the long run (small programs tend to morph into HUGE programs), isn't it better to take the time to learn and understand a language or API, and to reuse code through a real library mechanism, rather than copying code around like a blob of text and forfeiting one's responsibility to, well, know what's going on? Don't forget that by their very nature copy-and-paste code examples fall out of date obscenely fast. Often, it seems like people write code one way because they soaked it up one day and now they cling to it like a life preserver. Multiline strings built by concatenating individual lines with "\n", in a language that has multiline string literals as a built-in feature, never cease to amaze me. As a last plea, keep in mind that as good APIs evolve, common use cases may become "embedded" into the API. Congratulations, your obsolete copy-and-paste code example written against version OldAndBusted is both inferior to and harder to maintain than the one- or two-line API call in version NewHotness. Try to keep up.
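
Python illustrates the multiline-string peeve nicely (the query text here is just a hypothetical stand-in):

    # The pattern that baffles me: gluing lines together by hand...
    query = ("SELECT name, rate\n" +
             "FROM customer\n" +
             "WHERE state = 'MN'\n")

    # ...in a language that has had multiline string literals all along
    # (the extra leading whitespace is harmless in SQL):
    query = """
        SELECT name, rate
        FROM customer
        WHERE state = 'MN'
    """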

Thursday, September 27, 2007

Schedules Direct is my hero

Please observe a moment of gratitude for the Schedules Direct service, which has not only met its goals of a $20/yr price and a payment method other than PayPal, but has also been extending the subscription period for us early adopters accordingly.

...thank you.

Friday, September 21, 2007

good-bye Alex

I was emotionally affected by this more than I would have expected: Alex the intelligent parrot has died. Similarly trained apes and dolphins can perform the same mental tasks, if not more, but it was always amusing to experience a real "talking animal", quite a common occurrence in fiction. Debates about the true nature of how Alex's abilities operated strike me as hollow, but then again I lost interest in scientific psychology years ago. One of the firmest conclusions about intelligence to come out of AI is that "common-sense" reasoning is the end result of many data items and training sessions, so the boundary between "thinking by rote" and "higher-level cognitive functions" may be mushier than we humans care to admit.

Sunday, September 09, 2007

R5F27 release of knoppmyth

Torrent link. This is the release with MythTV 0.20.2 (built-in Schedules Direct support).

In other breaking news: impersonating a deity should be against Roger Federer's programming. Eh, there's always next year.

Friday, September 07, 2007

peeve no. 250 is unnecessary surrogate keys

The Wikipedia definition of surrogate key is here. In my database experience, it takes the form of an "autonumber" column, also referred to as "serial" or "identity". Whatever the database product or nomenclature, each new row gets a guaranteed-unique, generated value in that column. Surrogate key columns are so common that they verge on mundane.
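
For concreteness, here's the shape of the thing as a minimal sketch (SQLite spells it AUTOINCREMENT; the customer table and its columns are hypothetical):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # "id" is the surrogate key: a generated value with no meaning
    # outside of the database itself. Other products spell this
    # SERIAL or IDENTITY.
    conn.execute("""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            apt  TEXT NOT NULL
        )
    """)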

As with the other peeves, my irritation has subjective and objective aspects. Objectively, a surrogate key has the downside of being fairly useless in queries for multiple rows--which also makes its index a fairly useless optimization. This is tempered by the syntactic ease of retrieving one known row, but that case is really more of a "lookup" than a "query", don't you think? (I would contrast the selection and projection of tuples, but why bother...)
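
Using the same hypothetical customer table, the contrast looks like this:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, apt TEXT)")

    # The one thing the surrogate key serves well: a single-row "lookup".
    conn.execute("SELECT name, apt FROM customer WHERE id = 42")

    # Set-oriented queries -- arguably the point of a relational
    # database -- get no help from the surrogate key or its index:
    conn.execute("SELECT name, apt FROM customer WHERE apt LIKE 'B%'")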

Another objective danger with unnecessary surrogate keys, perhaps the worst, is the duplication of data which shouldn't be duplicated. By definition the surrogate key has no relationship with the rest of the columns in the row, so it isn't a proper primary key for the purpose of database normalization. If the cable company's database identifies me by an auto-generated number, and I sign up again as a new customer paying promotional rates after I move, the database won't complain at all. In practice, of course, the entry program should aid the user in flagging possible duplicates, but clearly that lies outside the database and the data model. Admittedly, judging whether "Joe Caruthers in Apt B3" is a duplicate of "Joseph Caruthers in Building B Apartment #3" is a decision best left up to an ape descendant rather than the database.
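
A toy demonstration, reusing the hypothetical customer table from above:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            apt  TEXT NOT NULL
        )
    """)
    # The "same" customer signs up twice. Each row is unique, but only
    # trivially so -- the surrogate key has vouched for nothing.
    conn.execute("INSERT INTO customer (name, apt) VALUES ('Joe Caruthers', 'B3')")
    conn.execute("INSERT INTO customer (name, apt) VALUES ('Joe Caruthers', 'B3')")
    print(conn.execute("SELECT * FROM customer").fetchall())
    # [(1, 'Joe Caruthers', 'B3'), (2, 'Joe Caruthers', 'B3')]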

The objective tradeoffs of a surrogate key are worthy of consideration, but my subjective disdain for unnecessary surrogate keys goes beyond that. I feel violated on behalf of the data model. Its relational purity has been sullied. Lumping an unrelated number (or value) in with a row of real data columns feels out of place, like adding a complex "correction factor" to a theoretically-derived equation just so it fits reality. Except that the unnecessary surrogate key works in the opposite direction: it makes the table resemble what it models less, not more. An unnecessary surrogate key leads someone to wonder whether the table was the victim of a careless and/or thoughtless individual who went around haphazardly numbering entities to shut the database up about primary keys. Normalization, what's that?

I've seen enough to realize the convenience of a surrogate key, especially for semi-automated software like ORMs. It's also clear that sometimes a surrogate key is the only feasible way to uniquely identify a set of entities, particularly when those entities are the "central hubs" of the data model and the corresponding primary keys must therefore act as foreign keys for a large number of other tables. The technical downsides of a surrogate key can be mitigated by also declaring the table's real key--as a unique constraint on the natural columns--alongside it.
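
A sketch of that mitigation, again with the hypothetical customer columns:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Keep the surrogate key for the ORM's sake, but declare the real
    # key too, so the database can still reject duplicates.
    conn.execute("""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            apt  TEXT NOT NULL,
            UNIQUE (name, apt)
        )
    """)
    conn.execute("INSERT INTO customer (name, apt) VALUES ('Joe Caruthers', 'B3')")
    try:
        conn.execute("INSERT INTO customer (name, apt) VALUES ('Joe Caruthers', 'B3')")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)  # rejected: UNIQUE constraint failed: ...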

If the overall design doesn't make the choice of each table's primary key obvious, that's a clue the design should be reevaluated. (You did design the tables before creating them, right?) Don't forget that sometimes the primary key consists of every column. The "connecting" or "mediating" table that implements a many-to-many relationship is a good example, because it's essentially a table of valid combinations.
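
For instance, a hypothetical lineup table sketched in the same style as above -- each row is nothing but a valid (provider, channel) pairing, so all of the columns together form the primary key:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # A "connecting" table for a many-to-many relationship: the primary
    # key is every column, because each row simply records one valid
    # combination.
    conn.execute("""
        CREATE TABLE channel_lineup (
            provider_id INTEGER NOT NULL,
            channel_id  INTEGER NOT NULL,
            PRIMARY KEY (provider_id, channel_id)
        )
    """)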