Sunday, May 15, 2011

a datum is never alone

The proposition that meaning is isomorphism has a partner: fundamentally, information is only informative in collaboration with other information. Thus, the importance of context shouldn't be underestimated. Examples abound. Pronouns need referents. Sentences need dictionaries, mental or physical, stated or inferred. Amounts need units of measurement. Hobbit heights need people nearby for visual comparison. No datum (singular of data) is useful in isolation. No datum is a "loner".

a bit is "this not that"

In a primal sense, communication is "meta-": it has connections outside itself. Since direct telepathy is neither desirable nor physically possible (the mapping from one brain to another is too complicated because brains differ in their neural connections), thoughts cannot be sent "raw". Every element of a message can only be symbolic. For the message's symbols to be understood, the sender and recipient must share common knowledge: they must already be aware of both the symbols actually in the message and the symbols' source set (i.e. the vocabulary). No symbol can be independent of the participants' prior knowledge.

Furthermore, a comprehensible message is a series of selections from the symbols in the set. To choose one symbol is to reject the others. Part of a chosen symbol's meaning resides in its distinctions from the rest of the symbols in the set. The communication of each symbol must be unambiguous, to prevent confusing one symbol for another, and conventional, to avoid needing exotic media or equipment.

The most obvious tactic is to count the symbols in the set. The first symbol counted will be represented by communicating "1", the second "2", and so forth. In the resulting number sequence, it's always clear which number represents which symbol, and the message itself doesn't require unusual communication capabilities. The set of symbols then takes the form of a number line (like a ruler), with each symbol as a numbered point.
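For readers who think in code, here is a minimal sketch of the counting tactic, assuming a hypothetical twelve-letter source set (`SYMBOLS`) that both parties already share. Decoding works only because that list is common knowledge. The sketch also hints at a weakness: once the numbers are written as digits and run together, different symbol sequences can collide, so some spacing between numbers is needed.

```python
# Hypothetical source set shared in advance by sender and recipient.
SYMBOLS = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"]

def encode_numbered(message):
    """Represent each symbol by its 1-based count in SYMBOLS."""
    return [SYMBOLS.index(s) + 1 for s in message]

def decode_numbered(numbers):
    """Look each count back up in the shared set."""
    return "".join(SYMBOLS[n - 1] for n in numbers)

print(encode_numbered("kab"))                     # [11, 1, 2]
print(decode_numbered([11, 1, 2]))                # kab
# Run together as digits, "kab" and "akb" both become "1112",
# so the numbers must be separated when sent.
print("".join(map(str, encode_numbered("akb"))))  # 1112
```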

However, there's a better option. The set could take the form of a "road" on which the symbols are "houses". To identify/reach a symbol is to travel down this road according to directions, and since there's only one road, the directions are nothing more than a series of answers to the single repeated question "Stop at the next house?" Instead of a unique number, each symbol's representation is a unique sequence of yes/no answers.

This technique has considerable advantages. First, the directions for symbols have distinct stops, so no spacing is needed between adjacent symbols in a message: the last answer in a symbol's directions is always "yes", so the answer after it belongs to the next symbol's directions. Second, since the "closer houses" have shorter directions, putting the most probable symbols or "houses" first makes it more likely that the whole message, i.e. a sample of symbols, will be shorter. Third, the demands on the medium are even lighter than for sending (base 10) numbers. Now just "0" and "1" are enough. Two possibilities, repeated. Bits.
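The "road" directions correspond to what information theory calls a unary code. Below is a minimal sketch, assuming a hypothetical four-symbol set already sorted from most to least probable; `build_directions`, `encode`, `decode`, and `expected_length` are illustrative names, and the probabilities are invented for the example.

```python
def build_directions(symbols_by_probability):
    """House k (counting from 0) gets k "no" answers (0) followed by one "yes" (1)."""
    return {s: "0" * k + "1" for k, s in enumerate(symbols_by_probability)}

def encode(message, directions):
    # No separators needed: every symbol's directions end with its only "1".
    return "".join(directions[s] for s in message)

def decode(bits, symbols_by_probability):
    symbols, passed = [], 0
    for bit in bits:
        if bit == "1":                 # "Yes, stop at this house."
            symbols.append(symbols_by_probability[passed])
            passed = 0                 # the next bit starts the next symbol
        else:                          # "No, keep going."
            passed += 1
    return "".join(symbols)

def expected_length(probabilities):
    """Average bits per symbol when house k (from 0) costs k + 1 bits."""
    return sum(p * (k + 1) for k, p in enumerate(probabilities))

order = ["e", "t", "a", "q"]        # hypothetical: most probable first
probs = [0.5, 0.25, 0.15, 0.10]     # hypothetical probabilities, same order
directions = build_directions(order)

bits = encode("tea", directions)
print(bits)                                    # 011001
print(decode(bits, order))                     # tea
print(round(expected_length(probs), 2))        # 1.85
print(round(expected_length(probs[::-1]), 2))  # 3.15
```

Decoding needs no separators because a "1" always closes a symbol. The last two lines illustrate the second advantage: the same probabilities cost fewer bits per symbol when the likely symbols sit at the closest houses than when they sit at the farthest.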

Does a bit have meaning? On its own, not much. But along with many companion bits, enough to meet any need.

a set is "what you get by following procedure"

Communicators can send piles of bits. Taken together, those bits signify a specific chain of symbols. Those symbols in turn signify atomic members of a source set (or alphabet). Everything depends on something else, with one remaining exception at the end of the dangling thread. Doesn't it appear that the source set is the true wellspring of meaning, needing nothing else?

Not remotely! Being a collection of symbols, the source set is itself symbolic. No matter how self-defining it seems, it's still about other information. Consider, for example, a digital spin on the ever-popular philosophical discussion of color words.

Over the years, digital images have represented color through a range of strategies, depending on the relative priority of conserving space. (This detail admittedly feels more and more quaint to the oblivious user who nowadays routinely dumps a multitude of "RAW" exposures from the camera into the computer's permanent storage.) One of the trade-offs is to reduce the number of distinct colors used in the image, a set called the palette. With this reduction, fewer bits are needed to identify a color, since a smaller range requires smaller numbers or indexes for the colors. Especially for an untrained eye and images like a logo or icon, 256 colors often work well enough and may even be overkill.
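To make the trade-off concrete: a palette of 256 entries can be indexed with 8 bits per pixel, compared with 24 bits for a full red-green-blue triple at 8 bits per channel. The sketch below assumes a tiny hypothetical four-color palette and maps each pixel to its nearest entry by squared RGB distance, which is only one simple choice among many ways of picking and applying a palette.

```python
import math

def bits_per_index(palette_size):
    """Fewest whole bits that can give every palette entry its own index."""
    return max(1, math.ceil(math.log2(palette_size)))

def nearest_index(palette, color):
    """Index of the palette color closest to `color` (squared RGB distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(palette[i], color)),
    )

def quantize(pixels, palette):
    """Replace each (r, g, b) pixel by the index of its nearest palette entry."""
    return [nearest_index(palette, p) for p in pixels]

# Hypothetical palette and pixels, for illustration only.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]
pixels = [(10, 10, 10), (250, 240, 245), (200, 30, 20), (20, 20, 200)]

print(bits_per_index(256))        # 8
print(bits_per_index(2 ** 24))    # 24
print(quantize(pixels, palette))  # [0, 1, 2, 3]
```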

The digital image's color palette is certainly a close analogy for a symbol source set; really, it's a direct application of the concept. Assuming the raw image had a much wider array of colors, the palette is a purposely limited sample of that original array. It's an abstraction of it, an approximation, a pigeonholing, a filtering. Moreover, for exceedingly colorful and detailed images that contain many gradations, the choice of palette becomes uncertain and debatable. In general, symbol sets aren't unambiguously derived from the symbolized information. Symbolization starts with categorization, and categorization is an action.

For a set of symbols to be accurately reproducible, the steps to produce it should be spelled out with clarity and precision. Consequently, the definition of a set is a procedure. It has the two stages named above: categorization, then assignment of symbols to the categories. Sometimes the procedure can be straightforward (by referencing another set), e.g. "integers greater than 10 and less than 30, each integer symbolized by the usual numerals". Sometimes the procedure can be complicated and even surprising in its results, e.g. the Mandelbrot set. Sometimes the procedure can be an exhaustive (finite) listing of all symbols with a per-symbol definition. At the extreme, a stream of symbols continuously produced through a "procedure" of pure randomness (not pseudo-randomness!) could be the basis for a perfectly mysterious and unbreakable representation of a message; what usually makes it impractical is that the stream has to be as long as the message, and the recipient requires full copies of both the message and the stream to be assured of correct decoding...
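The random-stream scheme is usually called a one-time pad. A minimal sketch follows, assuming byte-oriented messages combined with the pad by XOR; `make_pad` and `xor_with_pad` are illustrative names. One caveat: Python's `secrets` module draws from the operating system's cryptographically strong generator, which is a practical stand-in rather than the pure randomness the scheme ideally demands.

```python
import secrets

def make_pad(length):
    """Random key stream; `secrets` is a stand-in for truly random bits."""
    return secrets.token_bytes(length)

def xor_with_pad(data, pad):
    """Combine message and pad byte by byte; the same call encodes and decodes."""
    if len(pad) < len(data):
        raise ValueError("the pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"meet at noon"              # hypothetical message
pad = make_pad(len(message))           # must also reach the recipient, in full
ciphertext = xor_with_pad(message, pad)
print(xor_with_pad(ciphertext, pad))   # b'meet at noon'
```

Applying the same pad twice restores the message, which is why the recipient needs a complete copy of the stream and why the stream can be no shorter than the message.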

Indeed, the random set of symbols is an ideal illustration of a valid objection to the last few paragraphs: "It's fine that communication depends on bits, and bits depend on symbols, and symbols depend on a predefined set, but how can the set depend on a 'procedure'? How do you communicate a procedure? By your own reasoning, how can it be possible to communicate anything, let alone a procedure, before the set of communication symbols is known?"

Of course, in keeping with the theme "a datum is never alone", the predictable reply to that objection is "Communication of the set-defining procedure happens by reliance on still other symbols". Those other symbols are individual processing instructions. By assumption, the receiver already knows and can perform a set of elemental and easy processing actions, each of which has a fixed symbol. The procedure to construct the new set of symbols is one more message whose symbols happen to represent requests for action by the receiver.
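As a toy illustration of a procedure sent as a message of action symbols, here is a minimal sketch of a receiver that already knows six elemental actions. The action names (`set`, `add`, `jump`, `jump_unless_less`, `jump_unless_greater`, `emit`) and the tuple format are hypothetical, not any real instruction set. The message itself builds the set "integers greater than 10 and less than 30, each symbolized by the usual numerals".

```python
def run(program):
    """Carry out a list of (action, *arguments) tuples; return the emitted symbols.

    The six actions are hypothetical elemental steps the receiver is assumed to know.
    """
    memory, emitted, step = {}, [], 0
    while step < len(program):
        action, *args = program[step]
        step += 1
        if action == "set":                    # store a constant under a name
            name, value = args
            memory[name] = value
        elif action == "add":                  # add a constant to a stored value
            name, amount = args
            memory[name] += amount
        elif action == "jump":                 # continue at another step
            step = args[0]
        elif action == "jump_unless_less":     # check "less than"; jump if it fails
            name, limit, target = args
            if not memory[name] < limit:
                step = target
        elif action == "jump_unless_greater":  # check "greater than"; jump if it fails
            name, limit, target = args
            if not memory[name] > limit:
                step = target
        elif action == "emit":                 # symbolize with the usual numeral
            emitted.append(str(memory[args[0]]))
        else:
            raise ValueError(f"unknown action: {action!r}")
    return emitted

# "Integers greater than 10 and less than 30", written as a message of actions.
procedure = [
    ("set", "n", 0),                      # step 0
    ("add", "n", 1),                      # step 1: next candidate
    ("jump_unless_less", "n", 30, 6),     # step 2: at 30, jump past the end
    ("jump_unless_greater", "n", 10, 1),  # step 3: skip candidates up to 10
    ("emit", "n"),                        # step 4
    ("jump", 1),                          # step 5
]
symbols = run(procedure)
print(symbols[0], symbols[-1], len(symbols))  # 11 29 19
```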

For the symbols for integers greater than 10 and less than 30, some of the steps are checking that an integer is greater than 10 and checking that it is less than 30. For the Mandelbrot set, a step is feeding the result of a mathematical function back into that function. For the image color palette, it's storing the 256 colors one by one. In a computerized procedure, each action is literally minuscule: store information, load information, add two numbers, check whether a number is less than zero. These actions, and therefore their symbols (i.e. processor instructions), seem quite pointless out of context. Yet in the wake of a huge organized group of actions, a highly meaningful procedure takes shape and yields products.
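The Mandelbrot set's "feeding the result back" step can be sketched using the standard definition: a complex number c belongs to the set when repeatedly applying z → z² + c, starting from z = 0, never sends |z| beyond 2. The iteration cap `max_iter` is an assumption; any finite cap only approximates membership near the set's boundary.

```python
def in_mandelbrot(c, max_iter=100):
    """Approximate test: does z -> z*z + c, started at z = 0, stay bounded?"""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c      # feed the result back into the same function
        if abs(z) > 2:     # beyond radius 2 the sequence is certain to escape
            return False
    return True            # no escape within max_iter steps: counted as a member

# Coarse text picture of the region -2 <= re <= 0.5, -1 <= im <= 1.
for im in (1.0, 0.5, 0.0, -0.5, -1.0):
    print("".join("#" if in_mandelbrot(complex(re / 10, im)) else "."
                  for re in range(-20, 6)))
```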

a segment is "interest in raw experience"

Strictly speaking, a symbolic list of generalized actions isn't sufficient to communicate and obey an actual procedure. Like other information, actions aren't "loners". An action is only significant insofar as external information has effects on it and it has effects on external information. The exchange of effects may often be slight and subtle, producing a temporary result in preparation for succeeding actions. In any case, ultimately actions are changes to information: copies, creations, modifications, extensions, adaptations.

Such manipulations are the essence of the meaning of the generated symbol set. From a surface perspective, symbolism is substitution, i.e. convenient shorthand names for potentially lengthy manipulations of information. "Let me introduce 'Martin', whose parents are ___, who lives in ___..." Decoding the symbols is reversing the procedure. "I'm sure that you remember Martin." "The Martin whose parents are ____, who lives in ____?"

In practice, procedures for frequent symbols are rarely repeated in full. Instead, the human brain's excellent pattern-matching powers quickly notice that a symbol's procedure always terminates at the same external information. Once the pattern is seen, the brain devises a shortcut for the procedure by pairing its start directly with its end: a direct line, a conditioned "reflex" of information. It happens with little effort. Eventually, a human can have trouble consciously recalling the procedure at all. The profound intertwining of a symbol and its external information leaves a familiar whole, with the illusion that the symbol isn't separate.
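In software terms, the shortcut loosely resembles memoization: once a procedure has run for a given input, the start and end are stored as a pair, and later lookups skip the procedure entirely. A minimal sketch, assuming a hypothetical `describe` procedure standing in for a lengthy decoding and a hypothetical `Recognizer` class that keeps the pairs (Python's standard `functools.lru_cache` decorator offers the same idea ready-made):

```python
def describe(name):
    """Hypothetical stand-in for a lengthy procedure that decodes a symbol."""
    facts = {"Martin": ["parents are ___", "lives in ___"]}
    return f"{name}, whose " + ", who ".join(facts[name])

class Recognizer:
    """Decodes symbols, remembering each start -> end pair after the first full run."""

    def __init__(self, procedure):
        self.procedure = procedure
        self.shortcuts = {}   # symbol -> result: the "direct line"
        self.full_runs = 0    # how many times the whole procedure actually ran

    def decode(self, symbol):
        if symbol not in self.shortcuts:
            self.full_runs += 1
            self.shortcuts[symbol] = self.procedure(symbol)
        return self.shortcuts[symbol]

recognizer = Recognizer(describe)
for _ in range(3):
    print(recognizer.decode("Martin"))  # Martin, whose parents are ___, who lives in ___
print(recognizer.full_runs)             # 1
```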

Symbols denote procedural bridges between information. The procedure probably manipulates information via previously known symbols, leading to a novel "macro" set of symbols whose meaning becomes a procedure of procedures of procedures. Presumably, no matter how tall the overall edifice of labor that underpins all the symbols and their interconnections, there is irreducible information far down below, unmediated and unprocessed. Otherwise the complex of symbols would be circular. The massive loop could still be intricate and interesting, perhaps even grammatically correct by some standard, but to humans it wouldn't necessarily be any more informative than frimjun tollywobbits bersing shugbovs. Irreducible information that precedes and fuels symbols shall be named "segments".
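The demand for an irreducible bottom can be sketched as recursive expansion. Assuming a hypothetical `definitions` table that maps each symbol to the symbols defining it, anything without an entry is treated as an irreducible segment, and a chain that returns to a symbol already being expanded is reported as circular. The tables and the bracketed "segments" below are invented for illustration.

```python
def expand(symbol, definitions, path=()):
    """Expand a symbol, recursively, into the irreducible segments beneath it.

    `definitions` maps a symbol to the symbols that define it; a symbol with
    no entry is treated as an irreducible segment.
    """
    if symbol in path:
        raise ValueError("circular: " + " -> ".join(path + (symbol,)))
    if symbol not in definitions:
        return [symbol]                     # bottomed out at a segment
    segments = []
    for part in definitions[symbol]:
        segments.extend(expand(part, definitions, path + (symbol,)))
    return segments

# Hypothetical definition tables, for illustration only.
grounded = {
    "cold water": ["cold", "water"],
    "cold": ["<feeling of chill>"],
    "water": ["<sight of clear liquid>", "<feeling of wetness>"],
}
circular = {
    "frimjun": ["tollywobbits"],
    "tollywobbits": ["bersing"],
    "bersing": ["frimjun"],
}

print(expand("cold water", grounded))
# ['<feeling of chill>', '<sight of clear liquid>', '<feeling of wetness>']
try:
    expand("frimjun", circular)
except ValueError as err:
    print(err)  # circular: frimjun -> tollywobbits -> bersing -> frimjun
```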

Segments seem like the rawest, most standalone information possible, but that impression too is mistaken. As the name suggests, segments are selective portions of "raw experience". A human's raw experience is the sum of everything that affects the human, causing shifts in the activity of nervous-system cells, which propagate across axons and synapses. It's everything heard, everything seen, everything felt, everything smelt. A camera or microphone picks up some of the same physical phenomena, albeit through differing mechanisms.

Physical information of raw experience is the least subjective, the least egocentric. Well-known measurement and calculation limitations notwithstanding (e.g. intrinsic relativity and quantum uncertainty), information about the positions and movements of particles is the bottom line. Mythical complete information of a set of particles is the Axiom of that set; there's nothing further to speak of. Laying aside the theoretical and practical impossibilities, other messages about that set are derivatives of that Axiom. Hypothetically, "cold glass of water" is backed by untold quantities of "physic"-al information on the movements of molecules of dihydrogen monoxide. And an irrepressibly boring communicator who sends more than that many bits of (present-tense) information directly about just the water is being redundant.

Given the horde of physical information, arguably a brain's ingenuity lies in what it elects to discard. Focus is indispensable. Survival is closely tied to a creature's ability to attend and respond to information of interest. Brains transform raw experience into segments because a lot of information is quite irrelevant and monotonous. It's "background" worthy only of "peripheral" monitoring. Segmentation is a primary component of perception.

Visual points of similar color, moving in the same direction at the same speed, are a segment of raw experience that draws the attention of most creatures. So do segments of loud and sudden noises or tasty odors. Pains discourage and pleasures encourage. The common factor again is interest. Human brains are eminently trainable, but inborn interests offer the first opinion and impose the first constraints on the innocent tide of information from raw experience. Perceptual unity is useless for answering the foremost questions of a competitive, viable organism. Segments simplify reactions that craft beneficial answers.
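Grouping points by shared color and shared motion (the motion part is what Gestalt psychologists call "common fate") can be sketched crudely. This assumes hypothetical points given as `(x, y, gray, vx, vy)` tuples and simply bins them by coarse gray level and rounded velocity; `segment`, `gray_step`, and `speed_step` are illustrative names, and real perceptual grouping is far more sophisticated.

```python
from collections import defaultdict

def segment(points, gray_step=32, speed_step=1.0):
    """Group points whose gray level and velocity fall into the same coarse bins.

    Each point is a hypothetical (x, y, gray, vx, vy) tuple.
    """
    groups = defaultdict(list)
    for x, y, gray, vx, vy in points:
        key = (gray // gray_step, round(vx / speed_step), round(vy / speed_step))
        groups[key].append((x, y))
    return dict(groups)

# Hypothetical scene: three bright points moving right, two dark still ones.
scene = [
    (0, 0, 200, 3.0, 0.0), (1, 0, 205, 3.0, 0.0), (0, 1, 198, 3.0, 0.0),
    (5, 5, 40, 0.0, 0.0), (6, 5, 45, 0.0, 0.0),
]
for key, members in segment(scene).items():
    print(key, members)
# (6, 3, 0) [(0, 0), (1, 0), (0, 1)]
# (1, 0, 0) [(5, 5), (6, 5)]
```

Coarse bins keep the sketch short but can split points that sit just either side of a bin edge; a clustering method would avoid that at the cost of more code.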

Thereafter, years of human social interaction and mimicry build on the "natural" segments. Learners pick up verbal symbols for the most prominent segments: kin, food, body. Simultaneously they learn to learn, and to seek feedback by trying out symbols. At first, teaching proceeds by gestures and demonstrations for simple segments, the kind named by nouns, verbs, adjectives, and prepositions, within immediate shared perception. The actions that define this set of symbols are literal acts: the motions of the teachers enacted on segments that already interest the learner. Sooner or later, the repetitious, well-established actions allow for teaching the symbols for segments that have weak to nonexistent natural interest; at that point, humans' untiring curiosity and desire to please are great motivational aids. The ability to identify with the speaker's perspective, to imagine what the speaker intends to say, is a vital skill.

After a "critical mass" of symbols and grammar, the actions for learning new sets of symbols become the previously mentioned mental manipulations of information. Description, contrast, and metaphor are typical symbol-defining actions. Explosive growth of information ensues. Symbols trigger computations and then symbolize the outcomes. The cycle continues.
