One sign of a highly technical theory's influence and profundity is the improbable attention it receives in seemingly unrelated, if not inappropriate, contexts. Information theory certainly qualifies. Unfortunately, one of these peripheral contexts is abstract arguments against evolutionary biology.
The argument based on information theory follows the template of a more popular argument based on the Second Law of Thermodynamics: "Biology cannot increase in orderly complexity because the Second Law states that disorder increases over time." The variant using information theory is "Biology cannot increase in orderly complexity because information theory states that the result of random modifications of an orderly message is nothing more than an unintelligible message. These random modifications are the noise of the corresponding information channel, and the noise reduces its optimal information rate."
However, the intuitive appeal of either argument relies on ignorance of the extraordinary scale of biology's abundant populations and prolonged time periods. No matter how slim an event's probability, a sufficiently massive number of trials makes its occurrence more likely than not. In merely 13 die rolls, the probability of never rolling a 6 is less than 1 in 10. Perhaps many biological changes are destructive, like the metaphors of thermodynamic disorder or noisy information, and the candidates that experience those changes are worse off. Yet the multitude of candidate organisms throughout epochs can still yield the event of a comparatively rare constructive change, and thereafter the change spreads through reproduction of that singular beneficiary. (Although, as the record shows, occasionally the constructive events have indeed occurred too rarely to compensate for rapid changes in circumstances: mass extinctions.)
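The die-roll figure above is easy to verify directly (a quick check in Python):

```python
# Probability of never rolling a 6 in 13 independent rolls of a fair die:
# each roll avoids the 6 with probability 5/6.
p_never_six = (5 / 6) ** 13
print(round(p_never_six, 4))  # about 0.0935, i.e. less than 1 in 10
```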
Furthermore, this application of information theory prompts more questions. If it were accurate to analyze biology as an information channel in which all organisms were messages, and all modifications to organisms were noise, then how would the first noise-free organisms originate? Presumably from "The Great Communicator"...but the usual name for theoretical message senders is "Alice". Assuming all the first organisms were perfectly communicated messages from Alice, then later organisms could only have become more noisy or less perfect. Isn't this implied distinction between "message" and "noise" so narrow that it verges on nonsensical? Given all the actual differences between the first organisms and organisms in the present, is it apt for all these differences to qualify as increases in noise, i.e. always relative degradation? Is it accurate to suppose that every grandparent was a more precise expression of Alice's original message than every grandchild? Are the implications of this proposition consistent with human interactions with reality? In other words, is it true in the pragmatic sense?
In contrast, real biology's ambiguity of "message" and "noise" is closer to something else in information theory. Extreme information ambiguity is a defining feature of the unbreakable encryption strategy of a one time pad. The "pad" itself contains a long encryption key that's strongly random: no portion of the key/pad can be calculated from any other portion. Then sequential portions of this key are used to encrypt sequential small portions of a message. In effect, the key from the pad acts like the worst possible kind of communication noise. The key's "noise" affects each small portion of the message and the "noise" is always unpredictably different for each. It's like sending numerous tiny encrypted "micro-messages" and using a separate independent key for each micro-message. This is secrecy by brute-force. Hence the strength of this strategy is also its weakness. The large quantity of random and therefore incompressible key information must itself be exchanged over a sufficiently secure and efficient channel. But if a superior channel meets these requirements then it should communicate the message instead! (The strategy could still be appropriate if the superior channel for the key, e.g. handing over a literal pad in-person in the past, differs from the inferior channel for the secret message, e.g. series of short radio broadcasts in the future.)
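A minimal sketch of the scheme, assuming the conventional XOR construction (the function name here is illustrative, not from any particular library):

```python
import secrets

def one_time_pad(data: bytes, key: bytes) -> bytes:
    # XOR each portion of the message with an independent portion of the key;
    # applying the same key a second time reverses the operation.
    assert len(key) >= len(data), "the pad must be at least as long as the message"
    return bytes(d ^ k for d, k in zip(data, key))

message = b"The meeting is at 3:30"
pad = secrets.token_bytes(len(message))  # strongly random: no portion predicts another
ciphertext = one_time_pad(message, pad)
assert one_time_pad(ciphertext, pad) == message
```

Because each key byte is independent and uniformly random, each ciphertext byte is too, which is exactly the "worst possible kind of communication noise" described above.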
Consequently, depending on how closely biological information matches the metaphor of a one time pad, nobody should be surprised by the difficulty of disentangling its "messages" from its "noise". Inquisitive humans are the interceptors of the channel. The recipient of the channel is the organism, and the sender of the channel is the organism's ancestor(s). The biological information has flowed over a staggering number of channels or generations. In doing so, it has absorbed noise coming from an ever-changing key on the one time pad commonly known as the universe.
At the same time, the environment of the organism is ever-changing. This means that the definition of sensible biological information is also ever-changing, since biological information is sensible insofar as it corresponds successfully to an environment—oxygen-breathing organisms aren't sensible in an oxygen-deprived environment. Unlike the phantasmal perfection of first organisms communicated by a non-biological Alice, this concept of environmental sensibility is inescapably relative and limited. Metaphorically speaking, the "message" consists of biological information with environmental sensibility, and the rest is "noise". Due to environmental changes, the same bits of biological information can change from message to noise and back. Correct answers cease to be correct when the questions transform.
The inherent uncertainty of a one time pad forces an ignorant interceptor to admit that any possible message could result in any possible encrypted message. The one time pad causes the sender's input message to diverge into a random output message. In the context of restrictive human communication, the recipient is displeased by receiving any message other than the sender's. But in the context of biology, a descendant organism that receives innovative information could thereby surpass the ancestor in the broad criterion of environmental sensibility, if only by a little. To use an overstretched analogy, this biological case is more like a sender who sent the message "The meeting is at 3:30," and then the message changed along the way to "The meeting is at 3:00"...while the meeting was rescheduled to 3:15 anyway.
Thursday, March 28, 2013
Friday, April 22, 2011
I started reading an information theory textbook, and...
...I've concluded that in an optimal, entropy-approaching code for the textbook, the bits for sigma Σ would be minimal. Ba-dum, ching!
Wednesday, February 23, 2011
when 255 is too little and 65535 is too much
Before I begin, yes, I do realize that low-level bit twiddling is often not worth the time any more. When storage and memory are so cheap, and the sheer scale of processing has progressed so far even on a "smart" phone, little micro-optimizations may...just...not...matter. Switching to an alternative high-level algorithm/strategy better suited to the specific scenario is generally a much more promising route. However, network transmission and mass data record layout are two prominent exceptions to this rule. The fewer bytes that must be sent between computers on potentially high-latency and congested networks, the better; and the fewer average bytes per record, multiplied by a potential quantity of thousands up to millions of records, the better. So there is still some life yet in bit twiddling.
One unsigned byte represents nonnegative integers up to 255 (including 0). Two unsigned bytes represent nonnegative integers up to 65535. The storage doubles, but the representational range squares. In basic-arithmetic-speak the range is "256 times its original amount". In science-speak the difference might be called "an approximate difference of 2.408 on a base-10 logarithmic scale". In marketing-speak the difference might be called "an increase of 25,500%". The upshot is that this exponential jump could very well be overkill in particular cases. For instance, if data values are never less than zero or greater than 4000, 12 bits or one and a half bytes is sufficient (2^12 - 1 = 4095).
Yet reading and writing an arbitrary number of bits like 12 isn't well supported, for valid performance reasons. Manipulating 12 bits in practice implies manipulating the next-greater whole number of bytes: 2 bytes or 16 bits. And that implies 4 wasted bits of "spacing" or "alignment". Why not use these bits? Two of these 12-bit unsigned integers sum to 24 bits or 3 bytes. When each of the two 12-bit numbers is stored in 16 bits, the fourth overall byte, consisting of the 4 unused top bits from the first number and the 4 unused top bits from the second, is waste. Avoiding that fourth byte reduces total storage needs by 25%. Whether this substantial relative space savings amounts to a substantial absolute/raw space savings depends on the application and other factors. I'd venture that the majority of the time it doesn't. But when it does...
Each 12-bit integer is a byte and a half. For two, just combine their two half-bytes (4 bits each) into byte number three (8 bits). The conversion procedure from two nonnegative 12-bit integers, originally stored as two 16-bit integers (i.e. unsigned short) or 4 total bytes:
- 1) Get the first output byte by truncating/coercing/casting the first 16-bit integer. This grabs the 8 low bits of the first number, leaving the remaining 4 high bits to be stored separately.
- 2) In the same way as #1, get the second output byte by truncating/coercing/casting the second 16-bit integer.
- 3) Get the remaining 4 high bits of the first integer by right-shifting it by 8 and then coercing to a byte. The result is the top 4 bits of the first integer stored in a byte as-is. All the lower 8 bits of the first integer, that were previously stored in #1 and became the first output byte, are right-shifted out completely into oblivion and zeroes take the place of the shifted bits. This byte is now the "top byte" of the two bytes in the original 16-bit storage of the integer.
- 4) In the same way as #3, get the remaining 4 high bits of the second integer by right-shifting by 8 and then coercing to a byte.
- 5) The third and last output byte will be a concatenation of the two integers' top 4 bits, which after #3 and #4 are now themselves stored as bytes in which the top 4 bits are waste. The lower 4 bits of the output byte will come from the #3 byte, in which the correct bits are already at the correct lower position. But the higher 4 bits of the output byte will come from the #4 byte, in which the correct bits are also in the lower position, which is incorrect for those 4 bits. Therefore, left-shift #4 byte by 4 to get the bits in final position.
- 6) Finally the two parts of the third byte, the #3 byte and #5 byte, are ready, with correct bits in correct positions and zeroes everywhere else. Thus each bit in each byte must "override" the zeroed-out bit in the other. If corresponding bits are both 0, then 0. If corresponding bits are 0 and 1, then 1. Since this is the definition of a bitwise OR (|), perform that operation on the #3 byte and #5 byte to get the third byte.
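The six steps above can be sketched as a short function (Python for brevity; the name `pack12` and the use of a `bytes` return value are my choices, not part of the original):

```python
def pack12(first: int, second: int) -> bytes:
    """Pack two 12-bit unsigned integers into 3 bytes instead of 4."""
    assert 0 <= first <= 0xFFF and 0 <= second <= 0xFFF
    byte1 = first & 0xFF          # step 1: low 8 bits of the first integer
    byte2 = second & 0xFF         # step 2: low 8 bits of the second integer
    high1 = first >> 8            # step 3: top 4 bits, now in a byte's low position
    high2 = second >> 8           # step 4: same for the second integer
    byte3 = high1 | (high2 << 4)  # steps 5-6: shift and OR the two nibbles together
    return bytes([byte1, byte2, byte3])

print(pack12(0x123, 0x456).hex())  # → "235641"
```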
The reverse procedure converts the three packed bytes back into two 16-bit integers:
- 1) The lower 8 bits of the first integer are the first input byte. Expand/coerce/cast the first input byte to 16-bit storage.
- 2) The lower 8 bits of the second integer is the second input byte. Also expand/coerce/cast to 16 bits.
- 3) The third byte has the top 4 bits of the first integer in lower position, and the top 4 bits of the second integer in higher position. To get the top 4 bits of the first integer as a byte all to itself, just the top 4 bits of the third byte must be replaced with zeroes. For each of those top 4 bits, 0 must result in 0 and 1 must result in 0. For each of the lower bits, the bits that must remain undisturbed, 0 must result in 0 and 1 must result in 1. A bitwise operation that sometimes turns the first bit into 0 and sometimes into 1, depending on the second bit, is AND (&). Any time the second bit of an AND is 0, the result will be 0. Any time the second bit of an AND is 1, the result will equal the first bit. Bitwise AND fits the purpose, provided all the bits in the second byte are set accordingly. This second byte for the AND operation, known as a bitmask because the zeroes in it will "mask" or hide bits while the ones in it will "pass through" bits, must simply have zeroes in the 4 higher bits and ones in the 4 lower bits. Expressed as a byte, the bitmask is "0000 1111" in binary or "0F" in hexadecimal or "15" in decimal. In any case, the result of the AND operation between the third input byte and the bitmask is the upper 4 bits of the original first 12-bit integer.
- 4) The higher 4 bits of the third input byte are the top 4 bits of the original second 12-bit integer. To get these 4 bits as a byte all to themselves, right-shift the third input byte by 4. This pushes out the lower 4 bits of the third input byte, which were handled in #3, and leaves the higher 4 bits (with replacement zeroes in the higher positions).
- 5) The #3 byte, in which the higher 4 bits of the original first 12-bit integer sit in lower position within one byte, equals the original higher byte of the 16-bit storage of the original first 12-bit integer. All that's necessary is to put the #3 byte into correct 16-bit position, which is precisely one byte or 8 bits higher. Expand/coerce/cast the #3 byte into 16-bit storage and then left-shift by 8.
- 6) In the same way as #5, expand/coerce/cast the #4 byte into 16-bit storage and then left shift by 8.
- 7) #1 and #5 are 16-bit unsigned integers whose bits are all in the right locations for the original first 12-bit integer stored in two bytes. And any bits in #1 and #5 that don't match are zero. This means that a bitwise OR between the two will "override" the placeholder zeroes with the real bits in the other and get the original two bytes of the first 12-bit integer.
- 8) In the same way as #7, #2 and #6 after bitwise OR will be the original two bytes of the second 12-bit integer.
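Mirroring the packing sketch earlier, the unpacking steps can be written as follows (again Python with hypothetical names; the byte layout is the one described above):

```python
def unpack12(packed: bytes):
    """Recover two 12-bit unsigned integers from the 3-byte packed form."""
    byte1, byte2, byte3 = packed
    high1 = byte3 & 0x0F           # step 3: the bitmask keeps the lower nibble
    high2 = byte3 >> 4             # step 4: shift out the lower nibble
    first = byte1 | (high1 << 8)   # steps 1, 5, 7: recombine low byte and top nibble
    second = byte2 | (high2 << 8)  # steps 2, 6, 8: same for the second integer
    return first, second

print(unpack12(bytes([0x23, 0x56, 0x41])))  # → (291, 1110), i.e. (0x123, 0x456)
```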
Sunday, December 05, 2010
impossible flawless imitation
The inherent limitations to the analysis of a Turing Machine (TM) include a consequence that deserves more attention. A TM cannot be imitated perfectly through a generalized algorithm. Stated differently, a TM's full "description" cannot be inferred from an arbitrary amount of its output ("tape"). No empirical method can be all-sufficient for determining the exact source TM. You can't deduce a TM by its "footprints".
The reasoning is similar to other TM arguments, e.g. the inability to compute when/if a TM will halt. Suppose we know that some TM "S" produces a specific finite sequence of bits, and thereby we contrive our own TM "I" that produces the same sequence. Well, what about the next bit after that sequence (or the next erasure, etc.)? "I", according to its present configuration, produces either a 1 or a 0. But given that the output is only assured of matching up to the last bit, there's no theoretical reason why "S" couldn't be identical to "I" in every way except for the configuration for the next bit, in which case our otherwise flawless imitation is ruined. For example, even if "S" has steadily produced alternating bits, it still might produce "00" at position 30, or 35, or 1,462.
Moreover, the situation could quite possibly be worse in a multitude of creative ways. Perhaps "S" is really a complex specimen of Universal Turing Machine with two "source" TMs that are much different (analogous to a "piece-wise" mathematical function). "S" executes source TM "A" and then increments a count. When the count exceeds a predetermined value, "S" branches to a configuration that executes the other source TM "B". One may elaborate on "S" with further counts, branching, and source TMs. The point is to reiterate that although we can invent an "I" to imitate "S", we can never conclude that "I" and "S" are exact matches, and in fact we can't gauge the ultimate similarity at all as execution stretches to infinity. The failure of "I" is due to missing information about the "S" TM, but worse is the more subtle problem: there's no way to know how much information is missing at any time!
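The count-and-branch construction can be illustrated with a pair of generators (a loose Python analogy of my own, not actual TMs; the switch point of 101 is arbitrary):

```python
def machine_s(switch_at):
    """Behaves like source machine A (alternating bits), then branches to B (all zeroes)."""
    count = 0
    while True:
        yield count % 2 if count < switch_at else 0
        count += 1

def machine_i():
    """Our imitation, contrived from the observed output: alternating bits forever."""
    count = 0
    while True:
        yield count % 2
        count += 1

s, i = machine_s(101), machine_i()
prefix_matches = all(next(s) == next(i) for _ in range(101))  # they agree so far...
next_bit_differs = next(s) != next(i)                         # ...then "S" diverges
```

No finite prefix of matching output rules out a branch lurking just past the observed positions.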
So...
- Generally speaking, clean-room or black-box re-implementations of software can't be guaranteed to succeed in every detail. For instance, exceptional conditions or highly specific combinations of input could trigger diverging outcomes. The imitation of the previous software's bugs can be particularly troublesome.
- Whenever we compute/emulate anything that requires TM-level processing, we can't be absolutely sure whether we ever achieve total fidelity. This doesn't imply that a conceptual TM is the most convenient theoretical model of a phenomenon (it usually isn't!), but merely that the chosen model can somehow run on a "computer". Or in more formal terms, the model outlines an explicit procedure that yields predictions of one or more modeled quantities. In some open-and-shut cases, the model's error is persistently trivial under all experiments and we therefore have no justification for doubting its effectiveness. Yet history is filled with times in which a serviceable first model (circular planetary orbits with epicycles) is later replaced by one with still greater nuance and realism (elliptical).
- Third-person human brains (and behavior) undoubtedly fall under these conclusions. That is, no verbalized or intuitive model of a specific brain can be complete. Sympathy has its limits. While careful observation combined with genuine insight yields an impressive array of knowledge about "what makes his or her brain tick", especially among species evolved to be highly social, the application of the knowledge often proves how superficial it truly is. Biographers can detect what they call formative experiences, but tasking the biographer to reliably predict what a living subject shall do next will illustrate that the analysis works best retroactively. Of course, if someone theorizes that the human brain performs computations beyond the power of a TM, then the argument is correspondingly strengthened. Two such brains surely cannot be synchronized when two lowly TMs cannot. The proposed mechanism of brain-based super-computation, e.g. quantum effects of exotic pieces/states of matter, would increase rather than decrease the difficulty.
- More surprisingly, first-person human brains (and behavior) are also susceptible. There's no "imitation" of oneself, but there is a self-concept. The very process of answering the self-evident question, "Why did I think or do that?", necessitates a mental model of self. The brain attempts to develop a "TM" whose workings imitate the actual and mysterious workings of itself. Unfortunately, it's sabotaged by bias toward a pleasing answer! Transparency and self-knowledge are notoriously plagued by mistakes. Countless fictional works have relied on the humor if not the tragedy of self-delusion.
Wednesday, July 14, 2010
persistence of private by the Nucleus pattern
The encapsulation of data by an object's methods is one of the foremost goals of effective OOP. Restricting exposure of the object's private information prevents other code from accessing it. The inaccessibility ensures that the other code can't depend upon or otherwise share responsibility for the private information. Each object has a single responsibility: a sovereign private realm of information and expertise.
However, this ideal conflicts with the need to make objects persistent, because most programs require data storage in some form. And the required interaction with the storage mechanism clearly isn't the responsibility of the objects that happen to correspond to the data. Yet how can the objects that are responsible, often known as repositories or data mappers, mediate between external storage and other objects while obeying encapsulation? How can information be both private and persistent without the object itself assuming data storage responsibility?
The "Nucleus" design pattern, very similar to an Active Record, addresses this issue. According to the pattern, a persistent object, similar to a eukaryotic cell, contains a private inner object that acts as its "nucleus". The nucleus object's responsibilities are to hold and facilitate access to the persistent data of the object. Therefore its methods likely consist of nothing more than public "getters and setters" for the data properties (and possibly other methods that merely make the getters and setters more convenient), and one of its constructors has no parameters. It's a DTO (data transfer object) or VO (value object). It isn't normally passed around outside of its containing object since it has no meaningful behavior. Since the nucleus object is private, outside objects affect it only indirectly through the execution of the containing object's set of appropriate information-encapsulating methods. The containing object essentially uses the nucleus object as its own data storage mechanism. The nucleus is the "seed" of the object that contains no more and no less than all the data necessary to exactly replicate the object.
Naturally, this increase in complexity affects the factory object responsible for assembly. It must initialize the nucleus object, whether based on defaults in the case of a new entity, or an external query performed by the storage-handling object in the case of a continuing entity. Then it must pass the nucleus object to the containing object's constructor. Finally, it takes a pair of weak references to the containing object and nucleus object and "registers" them with the relevant stateful storage-handling object that's embedded in the execution context.
The object pair registration is important. Later, when any code requests the storage-handling object to transfer the state of the containing object to external storage, the storage-handling object can refer to the registration list to match the containing object up to the nucleus object and call the public property methods on the nucleus object to determine what data values to really transfer.
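A minimal Python sketch of the pattern (the `Account`/`AccountNucleus` names and the dictionary standing in for external storage are illustrative assumptions, not part of the pattern):

```python
import weakref

class AccountNucleus:
    """DTO holding the persistent data: nothing but data properties."""
    def __init__(self, balance=0):
        self.balance = balance

class Account:
    """Containing object: all behavior, no storage responsibility."""
    def __init__(self, nucleus):
        self._nucleus = nucleus          # private; never exposed to callers
    def deposit(self, amount):
        self._nucleus.balance += amount  # behavior expressed through the nucleus

class StorageHandler:
    """Matches containing objects to their nuclei via registered weak-reference pairs."""
    def __init__(self):
        self._registry = []
    def register(self, obj, nucleus):
        self._registry.append((weakref.ref(obj), weakref.ref(nucleus)))
    def save(self, obj):
        for obj_ref, nuc_ref in self._registry:
            if obj_ref() is obj:
                nucleus = nuc_ref()
                return {"balance": nucleus.balance}  # stand-in for a real external write

def account_factory(storage):
    nucleus = AccountNucleus()           # defaults, for a new entity
    account = Account(nucleus)
    storage.register(account, nucleus)   # the weak-reference pair registration
    return account

storage = StorageHandler()
account = account_factory(storage)
account.deposit(10)
assert storage.save(account) == {"balance": 10}
```

Note that `Account` exposes no getters or setters for its balance, yet `StorageHandler` can still persist it by consulting the registered nucleus.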
Pro:
- The containing object doesn't contain public methods to get or set any private data of its responsibility.
- The containing object has no responsibility for interactions with external storage. It only handles the nucleus object.
- Since the nucleus object's responsibility is a bridge between external storage and the containing object, design compromises for the sake of the external storage implementation (e.g. a specific superclass?) are easier to accommodate without muddying the design and publicly-accessible "face" of the containing object.
Con:
- The nucleus object is one additional object/class for each persistent original object/class that uses the pattern. It's closely tied to the containing object, its factory object, and its storage-handling object.
- The original object must replace persistent data variable members with a private nucleus object member, and the containing object's methods must instead access persistent data values through the nucleus object's properties.
- The containing object's constructors must have a nucleus object parameter.
- The factory must construct the nucleus object, pass it to the containing object's constructor, and pass along weak references to the storage-handling object.
- The storage-handling object must maintain one or more lists of pairs of weak references to containing objects and nucleus objects. It also must use these lists whenever any code requests a storage task.
- The code in the storage-handling object must change to handle the nucleus object instead of the original object.
Friday, July 18, 2008
the Woven cipher
Like the previous two ciphers, the Woven cipher isn't great at encrypting, but nevertheless it was an engrossing hobby project with a few fascinating wrinkles (is it any wonder I sometimes have trouble explaining to other people how I spend my time?). The basic idea behind this cipher is to convert all the characters of the plaintext into binary digits (bits), combine the bits of adjacent characters by alternately taking n bits from one and then the other, and then iteratively mix the resulting adjacent bit sequences in the same manner until only one remains. The key is the set of n values to use, which the algorithm cycles through. Forming an intact whole out of little parts through an alternating process is analogous to weaving, hence the name "Woven". A second possible name, perhaps more accurate but less appealing, is "the Multiplexed cipher", where the multiplexing is done using fixed data lengths, as opposed to fixed time periods for instance.
In the encryption and decryption, the role of each bit sequence is filled by a generator. This includes the special-case generators at the start and finish, described in the code as "Initial" or "Final". These generators are somewhat similar to the rest but perform additional conversion tasks and/or obtain data directly from arrays. As a result the program's high-level design is also functional: first it constructs one or more function(s) that will produce the answer (bit by bit) and then it repeatedly calls the function(s). By contrast, the imperative design would be to manipulate actual bit sequences, not generators, by using one set of sequences to create the next set of sequences, etc. The loops are virtually the same in the imperative and functional designs, but the functional equivalent has generators in place of bit sequences--each set of generators is the input for the next set of generators. And each generator (a closure) only knows about the generators that were explicitly placed into its scope at creation.
Encrypt generators merge the output of two encrypt generators: they have two parents and one child. Decrypt generators do the reverse, requesting solely "half" of the output from one source decrypt generator and then dividing it up so that it can return the "left" half to one destination decrypt generator and the "right" half to another destination decrypt generator: they have one (-half?) parent and two children. Encrypt generators act like multiplexers. Decrypt generators act like demultiplexers. The individual generators know and do very little:
My Javascript implementation of the Woven cipher is part of the blog layout, down at the bottom on the right. In the page source, search for 'wovencipher' (one word). If you use it or download it I am not responsible for any awful consequences. I don't advise you to use it for encrypting anything important or unimportant. Legal disclaimer, etc. It's purely for entertainment, educational, and/or edutainment purposes.
To work, the algorithm can't be quite that simple...
- Most of the time, when the numbers don't match up exactly right, a source sequence won't have enough bits. When that happens, the strategy is to use an extra 0 instead. Since the "weaving" happens from right to left, least-significant to most-significant, these are leading 0s, so the numerical value of the original sequence is unchanged. Of course, the intermediate sequences aren't numbers, so the leading 0s matter very much. And if the very last sequence, or each chunk (7 bits long?) that the rather long final sequence is broken into, must be interpreted as a fully-representative binary numeral, there would need to be an additional convention like always prepending an extra 1. Extra 0s are needed if the sequences are different lengths, like a space's 6 compared to a letter's 7, and/or a sequence's length isn't evenly divisible by n. These filler 0s cause the encrypted (i.e. transposed?) message to be longer than the original. Unorganized testing shows that the amount of expansion for short sentences is commonly 100% or more, probably an exponential-like effect of extra 0s requiring still more extra 0s in successive iterations.
- Another extremely common complication is an odd quantity of sequences to be woven. The coping technique I eventually settled on was to spontaneously create a "null" or "empty" sequence of length 0 to pair up with the loner. Loner or not, the quantity of data sequences to be woven is half the previous quantity of sequences, rounded up. For instance, 10, 5, 3, 2, 1. Or 8, 4, 2, 1. The rounding--throwing away information--implies that when decryption happens, the encrypted message and the key aren't enough to reconstruct the original. This is one reason why the third decryption input, the original number of characters, is necessary upfront (the other reason is to let the decryption algorithm know for certain when to quit unraveling sequences).
In the encryption and decryption, the role of each bit sequence is filled by a generator. This includes the special-case generators at the start and finish, described in the code as "Initial" or "Final". These generators are somewhat similar to the rest but perform additional conversion tasks and/or obtain data directly from arrays. As a result the program's high-level design is also functional: first it constructs one or more function(s) that will produce the answer (bit by bit) and then it repeatedly calls the function(s). By contrast, the imperative design would be to manipulate actual bit sequences, not generators, by using one set of sequences to create the next set of sequences, etc. The loops are virtually the same in the imperative and functional designs, but the functional equivalent has generators in place of bit sequences--each set of generators is the input for the next set of generators. And each generator (a closure) only knows about the generators that were explicitly placed into its scope at creation.
Encrypt generators merge the output of two encrypt generators: they have two parents and one child. Decrypt generators do the reverse, requesting solely "half" of the output from one source decrypt generator and then dividing it up so that it can return the "left" half to one destination decrypt generator and the "right" half to another destination decrypt generator: they have one (-half?) parent and two children. Encrypt generators act like multiplexers. Decrypt generators act like demultiplexers. The individual generators know and do very little:
- When an encrypt generator is created, it's passed a 'left' source generator, a 'right' source generator, and the number of bits to read from a generator before switching to the other--this number is one of the digits from the key, so only digits 2-9 are admissible in the key. The encrypt generator keeps a running count of how many bits it has processed, whether it is currently reading from the left or right, and whether either of the generators has run out of bits. If both generators are empty (i.e., return '') then the generator returns '', the empty string. If the generator set to return the next bit isn't empty, then return that bit. If the generator set to return the next bit is empty but the other generator is not empty, return a '0' for filler. Encrypt generators take one 'op' parameter. If the op is 'peek' then the generator skips updating any variables (and it also passes 'peek' if it needs to ask a source generator for the next bit), otherwise it increments the bit counter and then switches which generator to read if the counter modulo the key value is 0.
- When a decrypt generator is created, it's passed one source generator, which "side" of bits the decrypt generator will always ask for from its source generator (left or right), and the number of bits to process before switching the side that should receive the next bit coming from the source generator--once again, this number is a key digit. Decrypt generators take one parameter, the "side" that is requesting its next bit. The decrypt generator keeps a running count of how many bits it has processed and whether the next bit it reads from the source generator should be returned as a 'left' bit or 'right' bit. It also keeps short queues for left bits and right bits. If it's asked for a bit for a side whose queue isn't empty, then it returns from the queue. Else, it checks which side should receive the source generator's next bit. If the side matches, get a bit and return it. If the side doesn't match, fill up the queue for the opposite side with bits from the source generator, then return the next bit that does apply to the requested side. Naturally, at any time the source generator sends a bit, update the bit counter and toggle the "side" variable if counter modulo n is 0. When the requested side is "all", the decrypt generator does nothing more than return the next bit it receives from its source generator (which is always just from one "side"). The "all" case is purely for the last stage of decryption, when no more "unraveling" is necessary.
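Following that description, an encrypt generator can be sketched as a closure like this (the names and details are my own guesses at a minimal version, not the actual widget code; source generators return one bit per call, or '' when exhausted):

```javascript
// An encrypt generator: multiplexes two source generators, switching
// sides every n bits, emitting '0' filler when the current side is dry.
function makeEncryptGen(leftSrc, rightSrc, n) {
  let count = 0;       // bits produced so far
  let readLeft = true; // which source is currently being read
  return function (op) {
    const src = readLeft ? leftSrc : rightSrc;
    const other = readLeft ? rightSrc : leftSrc;
    let bit = src(op);
    if (bit === '') {
      if (other('peek') === '') return ''; // both sources are exhausted
      bit = '0';                           // filler bit for a dry source
    }
    if (op !== 'peek') {
      count++;
      if (count % n === 0) readLeft = !readLeft; // switch sides every n bits
    }
    return bit;
  };
}

// An "Initial"-style generator wrapping a plain bit string.
function makeInitialGen(bits) {
  let pos = 0;
  return function (op) {
    if (pos >= bits.length) return '';
    const bit = bits[pos];
    if (op !== 'peek') pos++; // a 'peek' doesn't consume the bit
    return bit;
  };
}
```

With n = 2, weaving '10' and '11' this way yields '1011': two bits from the left source, then two from the right.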
My Javascript implementation of the Woven cipher is part of the blog layout, down at the bottom on the right. In the page source, search for 'wovencipher' (one word). If you use it or download it I am not responsible for any awful consequences. I don't advise you to use it for encrypting anything important or unimportant. Legal disclaimer, etc. It's purely for entertainment, educational, and/or edutainment purposes.
Wednesday, June 25, 2008
the Autonomous cipher
The basic outline of a cipher is that it combines a message and a key to produce unreadable text. At some level, then, the key is like a set of instructions that informs the encryption algorithm about what to do with the message. Compare the process to that of assembling an attractive grill. The message is like the parts, the key is like the sheet of directions, and the finished grill is the encrypted message (presuming the directions' language isn't French--"Le grille?...What the hell is that?!"). Why not have the key actually be a set of encryption instructions?
The Autonomous cipher uses this concept. In this cipher, each digit of the key represents two operations (the key "self-directs" the encryption to a large degree, hence the name Autonomous). The operations' instructions assume a "current cipher index" variable, i, of an array of numbers representing the message. For that matter, the message array could be visualized as a long "tape" and i could be visualized as a "read/write head" that shifts position along the tape:
- "0" increases by 3 the array value whose index is i and subtracts 3 from i; corresponding operations of the remaining digits will be displayed as pairs
- "1" is (3,2)
- "2" is (1,2)
- "3" is (3,-1)
- "4" is (1,-3)
- "5" is (2,3)
- "6" is (1,-1)
- "7" is (3,4)
- "8" is (2,-2)
- "9" is (1,4)
- Convert the message to a numerical representation as an array of numbers. Duh.
- Set both i and the reading position of the key (key index) to 0. Perform the following two steps until the resulting value of i is 0.
- Use the key index to determine which instruction to execute, then execute it. This means i and the message array are updated accordingly. When i "falls off" either end of the array, merely wrap it around to the other end.
- Increment the key index. Wrap around to the start of the key if the end is reached.
- Undo the last increment of the key index; it wasn't actually used. Then append this last value of the key index to the end of the message array.
- Any given key may not really work for any given plaintext, for two definitions of "not work": 1) the key starts with a sequence like '18' (i + 2 followed by i - 2) and therefore halts before barely encrypting at all, 2) the key consists of instructions that will jump around but never return to 0 in a reasonable time for the message text. The most extreme example of case 2 is a key that starts with '9' (i + 4) then continues in infinite repetitions of '583' (i + 3, i - 2, i - 1). However, all finite keys will repeat eventually, and with enough repetitions all keys must return to 0. When the net effect of a key is to go from 0 to "x" on a message of length "m", there exist a number of repetitions "r" and a multiple of m "n" such that m*n = r*x. (Any multiple of m works for returning to index 0 because it's identical--congruent--to repeatedly moving from 0 to 0, 0 = (0 + m) % m.) Rearranging terms yields m/x = r/n where m and x are constants. One solution among many to this two-variable equation is r = m and n = x. Stated in practical language, no matter what a finite key does and no matter how long the message is, the index will certainly return to 0 after cycling through the entire key a number of times equal to the message length. For long keys and even longer messages, this absolute upper bound on the number of key repetitions is hardly restrictive. In any case, the key may return to 0 much sooner anyway, of course.
- Although infinite keys may cause problems, any finite key may be arbitrarily extended to produce a longer key, a key that requires more cycles to return to 0. This extension can be accomplished with a couple techniques, but I suppose there are many more possibilities. 1) Replace one or more digit(s) with other digits that have the same net effect. The key thereby will take more iterations to return to 0, as long as none of the "intermediate" steps of the replacement lands on 0. Two '1' or '2' digits (+2 each) can take the place of one '9' digit (+4). A '76' (+4 then -1) can take the place of a '5' (+3). A more complicated replacement is '2' (+2) by '2233' (+2 then +2 then -1 then -1). And each '2' in '2233' can then be replaced by '2233', and so on--but each of these specific replacements only succeed when the value of i before the replacement point, "j", solves none of the equations (j + 4) % arraylength = 0, (j + 3) % arraylength = 0, (j + 2) % arraylength = 0. 2) At some point in the key when i is not 0, i.e not at the start or end, insert another (full) key. By definition, the net effect of the inserted key will be 0. However, as before, the inserted key itself must not incidentally hit 0 as it runs. Assuming the value of i in the enclosing key is "j" at the point before insertion, the inserted key must not ever attain the i index "message length - j" when it runs on its own. For instance, consider two boring keys for a simple message length of 8. Key A is '99' (+4 then +4). It has only one intermediate value, 4. Key B is '2143' (+2 then +2 then -3 then -1). To make B longer by inserting A, A could be inserted before '3' to produce '214993' because one of A's intermediate i values isn't 8 - 1 = 7. A couldn't be inserted before '4' to produce '219943' because (the only) one of its intermediate values is 8 - 4 = 4. The key would end up being '219', one digit smaller than the original! 
Notice that B could also be inserted into B, '2(2143)143', since B never assigns i the index 8 - 2 = 6.
- Looking at the list of ten operations, some interrelationships stick out. Five decrease i (03468), five increase i (12579). For each particular change in i, one or two operations perform it. -3 has 0 and 4, -2 has 8, -1 has 3 and 6, +2 has 1 and 2, +3 has 5, +4 has 7 and 9. Complementary pairs of digits that undo each other's effects, and so are by definition keys, are (0 or 4) and 5, (1 or 2) and 8. Complementary triples of digits that have a net effect of 0, and so are by definition keys, are (1 or 2) and (3 or 6) and (3 or 6), (3 or 6) and 8 and 5, (7 or 9) and 8 and 8, (7 or 9) and (0 or 4) and (3 or 6). Remember that order doesn't matter in these keys. '385' is a key so '538' is also a key.
- The digits of irrational numbers might not be as promising for keys as one would think. pi isn't: '3141', one of the possible four-digit keys, is where pi stops. The square root of 2 isn't much better. It stops at '14142', five digits long. '141' having a net effect of +1 is a bit of a show-stopper.
- A direct crack of a ciphertext, as opposed to indirect methods like brute-force and related-key, begins by looking for clues that help indicate how the ciphertext could have been generated. Metaphorically, the destination ciphertext aids in reconstructing and reversing the encryption journey. But for this cipher, the origin and the destination are the same in one aspect: i's value. The tracing of i hinges on examining the array values; the question is how each final array value differs from what it was before one or more visits from i. Unfortunately, those differences are difficult to compute, because determining the unknown starting value of each array element is precisely the goal of the entire cracking process. Nevertheless, since the starting values are in a known range and each individual increase is in the range 1-3, someone could calculate "ballpark estimates" for the number of "i visits" that possibly resulted in a particular number. The final number of the encrypted message array, an index into the key, also establishes a lower bound for key size. Additionally, a key that repeated before returning to 0 would probably show more telling patterns than a key that didn't. On the other hand, larger keys are more bothersome to work with.
- Using alternative bases for the key digits is a possible variation that would enable a greater quantity of symbolized operations. The operations could become as complex as desired, too, although all operations must be exactly reversible, i.e. have inverses. A key in one base could be equivalent (isomorphic) to a key in a second base as long as the set of symbolized operations for the keys was equivalent. The key in the larger base would then be a "short version" of the key in the smaller base, similar to how programmers use hexadecimal.
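The steps and observations above can be pulled together into a short Javascript sketch (the function and variable names are mine, and this is a rough reconstruction rather than the blog widget's actual code):

```javascript
// Operation table from the list above: [amount added to the value, head movement].
const OPS = {
  '0': [3, -3], '1': [3, 2], '2': [1, 2], '3': [3, -1], '4': [1, -3],
  '5': [2, 3], '6': [1, -1], '7': [3, 4], '8': [2, -2], '9': [1, 4],
};

// Encrypts an array of numbers in place, appending the final key index.
// Beware: a key that never brings i back to 0 loops forever.
function autonomousEncrypt(msg, key) {
  const m = msg.length;
  let i = 0, k = 0;
  do {
    const [add, move] = OPS[key[k]];
    msg[i] += add;                        // modify the value under the "head"
    i = ((i + move) % m + m) % m;         // move the head, wrapping both ways
    k = (k + 1) % key.length;             // cycle through the key
  } while (i !== 0);
  k = (k - 1 + key.length) % key.length;  // undo the unused increment
  msg.push(k);
  return msg;
}

// For key analysis: how many instructions run before i returns to 0?
// (Same caveat: an unsuitable key for this message length never returns.)
function opsUsed(key, m) {
  let i = 0, k = 0, count = 0;
  do {
    i = ((i + OPS[key[k]][1]) % m + m) % m;
    k = (k + 1) % key.length;
    count++;
  } while (i !== 0);
  return count;
}
```

This reproduces the examples above: opsUsed('2143', 8) is 4, opsUsed('214993', 8) is 6, while opsUsed('219943', 8) halts after only 3 instructions. And a key starting with '18' halts almost immediately, barely encrypting anything.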
Monday, June 23, 2008
the Pointed cipher
Sometimes a leisure-time project can take on a life of its own. In this case, it started with an inspiration I had. I lay much of the blame on my recent reading. What if someone used a cellular automaton (like Game of Life) to encrypt a message? Specifically, Alice would encrypt by encoding a message as an initial configuration of the automaton and using the key to determine how board positions map to characters. My motivation was that a message "becoming alive and growing" into unexpected patterns would be quite hard to crack. Imagine if the information content of a message repeatedly folded over onto itself, thereby exhibiting strange chaotic order sensitive to initial conditions. Let a codebreaker try to analyze that.
The problem, as some readers have already guessed, is that the more interesting examples of automata are just not guaranteed to be reversible. A message's encoded information wouldn't always survive the journey. Two different messages could end up with the same outcome, for instance, or a message could merely evaporate after a short number of iterations. The original signal would be so entangled with the encryption-introduced noise that it wouldn't be salvageable...by anyone. Grudgingly, I adopted the restriction that none of my automata rules would include creation or destruction.
This restriction, along with the need to assign axes to the automata cells for the character mapping, just left me with a set of Cartesian coordinates whose movements somehow affected one another. Then I made several observations: 1) per common practice, the number of dimensions can be set arbitrarily; 2) each character can be encoded as the distance along a particular axis for a particular point; 3) interactions among multiple points at once, especially given the number of points needed to encode a message, are often too impractical to calculate.
After experimentation, I've settled on this impractical encryption algorithm. The idea is to divide up a message into tuples or "points", and perform pairwise-defined substitution and transposition on each point for a number of cycles. Track the values of the three parameters that control the operation of the substitution and/or transposition, and increase them (with wrap-around back to the start value) after each cycle. The three parameters are 1) the axes of the two points to manipulate, 2) the index gap between the two points (later points will wrap around the current "block" of points to find a partner and only if the later point wasn't already the partner for that earlier point), 3) the amount to manipulate with. The key has two parts, 1) the positive integer number of cycles "n" and 2) an integer array of axis initializations "d". The dimension "D" is the number of axes, so it's the length of d. D must be one of 2, 3, 5, 6, 10, 15--kudos to anyone who recognizes the definition of this set of numbers before the end of this paragraph. The recommended n is at least 210. However, the size of n must be carefully counterweighted with message length because either one has a strong tendency to drop time-performance (i.e., complexity) like a brick. Analytically speaking, I believe the relationship is O(n), but I only have rough empirical confirmation of that.
- Divide the plaintext into blocks of length 30, padding out the last block with blanks if necessary. After each conversion from character to number happens, add the corresponding element from d (plaintext index % dimension works fine). Perform the following steps on each block and concatenate all results. Actually, my implementation modifies the original array in-place, block by block, by passing indexes around.
- Start the axis parameter value at 0 (the first axis), the gap at 2*D (pair the current point to the point after the next point), and the amount at 1. Perform the following steps n times.
- Loop over the block point by point (this just means increasing the index by D each time), performing the following two steps.
- Use the current values of the axis and gap parameters to determine what two indexes to work on. If the referenced values are equal, multiply both values by -1. Otherwise, adjust each referenced value such that the greater increases and the lesser decreases by the value of the amount parameter.
- Swap the value of the first axis of the current point with the value of whichever axis of the next point (again, wrapping around the block if necessary) is indicated by the axis parameter.
- After looping over the points, adjust the three parameters for the next cycle, starting over at 0 if exceeding maximums. The axis value increases by 1 up to D - 1, the gap value increases by D up to 29 (remember that the gap calculation wraps around the block of points), and the amount value increases by 1 up to 6.
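The substitution rule in the middle step above amounts to this little function (a sketch with my own names):

```javascript
// Apply the pair rule to two referenced values in the block: equal values
// both flip sign; otherwise the greater grows and the lesser shrinks by
// the amount parameter. Since the gap between unequal values only widens,
// the two cases can't be confused, which keeps the operation reversible.
function pairOp(block, a, b, amount) {
  if (block[a] === block[b]) {
    block[a] *= -1;
    block[b] *= -1;
  } else if (block[a] > block[b]) {
    block[a] += amount;
    block[b] -= amount;
  } else {
    block[a] -= amount;
    block[b] += amount;
  }
}
```

For example, applying it to [5, 3] with amount 2 gives [7, 1], while [4, 4] becomes [-4, -4].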
One characteristic of this cipher that it shares with many other ciphers is its cyclic nature. In earlier tries at an algorithm, when the block size was the length of the text and no transposition step was integrated, specific plaintext lengths and dimension choices would result in characters that only flipped sign, no matter the number of cycles! These were "degenerate" cases when two indexes always paired up with one another; therefore, each gap parameter value was always paired up with the same axis parameter value (although the converse was usually not true--a specific axis parameter might or might not always have been paired up with the same gap parameter).
In a simple example, a message length (padded if necessary) of 32 and a dimension of 4 results in 32/4 = 8 "points" and consequently 8 possible gap values between indexes (0,4...24,28). The gap parameter value cycles through its 8 possibilities as the axis parameter value simultaneously cycles through its 4 possibilities (0,1,2,3), so now a gap of 4 always applies to an axis of 1, a gap of 8 always applies to an axis of 2, etc. The pattern is clear: a "degenerate" case corresponds to when the number of axes is a divisor of the number of possible gaps, for it is those circumstances when the cycle of axis parameter values maps nicely onto the cycle of gap parameter values. What we have here is an application of the fundamental theorem of cyclic groups (the description "maps nicely onto" is an informal way of stating that one cycle or cyclic group is a subgroup of another cycle or cyclic group).
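The degenerate case is easy to confirm numerically (a quick check of my own):

```javascript
// Step the gap cycle (mod G) and axis cycle (mod D) in lockstep and count
// how many distinct (gap, axis) pairs ever occur. The pairs repeat with
// period lcm(G, D), so when D divides G only G pairs appear--each gap is
// locked to a single axis.
function distinctPairings(G, D) {
  const seen = new Set();
  for (let step = 0; step < G * D; step++) {
    seen.add(`${step % G}:${step % D}`);
  }
  return seen.size;
}
```

With 8 gaps and 4 axes only 8 of the 32 possible pairs occur, whereas 8 gaps and 3 axes produce all 24.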
The complication of avoiding these degenerate cases motivated me to switch both to a fixed block length and to a fixed collection of dimension choices, where the dimension choices 1) are divisors of the block length and 2) are not divisors of the associated quotient. Put another way, the block length is the least common multiple of the dimension and the block length divided by the dimension. In addition, I preferred smaller block lengths and dimension choices but a larger number of dimension choices. Check the handy Wikipedia table yourself if you like. As previously mentioned, I opted for a block length of 30. The recommended number of cycles, 210, is the least common multiple of the gap parameter possibilities, the axis parameter possibilities, and the amount parameter possibilities (7).
Key strength is a question I'm not sure how to answer. The key's "n", the number of cycles to run on the set of message points, isn't much of an obstacle to brute-force cracking. If x number of cycles hasn't yielded even a partial message, trying x + 1 cycles is no problem at all! The key's "d", the array of axis initializations, is more promising. Assuming 2000 possible values for an axis (-999 to 1000 inclusive), for a choice of 2 dimensions that would be a mere 2000^2 = 4,000,000. But a choice of 15 dimensions would be 2000^15 = 3.2768 * 10^49 according to my calculator, which is greater than 2^128. In any case, my limited testing indicated that the initialization value for a chosen axis affects a proper subset of message characters, not all--partial keys are useful, but to verify the correctness of the partial key one must have the number of cycles right.
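The keyspace arithmetic is easy to double-check with BigInt (no calculator rounding involved):

```javascript
// 2000 possible initialization values per axis (-999 to 1000 inclusive).
const keys2D = 2000n ** 2n;           // 4,000,000 keys for 2 dimensions
const keys15D = 2000n ** 15n;         // 2^15 * 10^45 = 3.2768e49 for 15 dimensions
const past128 = keys15D > 2n ** 128n; // comfortably past a 128-bit keyspace
```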
My Javascript implementation of the Pointed cipher is part of the blog layout, down at the bottom on the right. In the page source, search for 'pointedcipher' (one word). If you use it or download it I am not responsible for any awful consequences. I don't advise you to use it for encrypting anything important or unimportant. Legal disclaimer, etc. It's purely for entertainment, educational, and/or edutainment purposes.
Monday, October 15, 2007
advanced playlists in amarok
Lately I've been making greater use of smart and dynamic playlists in amarok. I felt it would be criminally callous not to share my experiences and appreciation.
A dynamic playlist is really a configurable playback mode that automatically enqueues random tracks for playing and removes played tracks from view as each track ends. Each dynamic playlist uses one or more playlists, including smart playlists, as the source of tracks. (The built-in dynamic playlist "Random Mix" just uses the entire collection.) The user can add or remove or reorder tracks while the dynamic playlist is running, but the dynamic playlist will only add or remove tracks as it needs to, according to its settings. The aim of this feature is to keep the resource use of the player down, of course, and the visible list of tracks to a manageable level. It reminds me of the "stream" or "lazy sequence" concept--if one is only processing one or a small subset of members of the input at once, it's more efficient to sequentially retrieve or produce no more than that input portion for the algorithm, as it proceeds. In actuality, the input stream may be internally buffered for efficiency, though this doesn't matter to the consuming algorithm.
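In miniature, that lazy-producer idea looks something like this (my own Javascript sketch, nothing to do with amarok's actual internals):

```javascript
// A "dynamic playlist" as a lazy producer: tracks are only picked as
// playback asks for them, never materialized as a whole list up front.
function makeRandomMix(collection) {
  return () => collection[Math.floor(Math.random() * collection.length)];
}

const nextTrack = makeRandomMix(['trackA', 'trackB', 'trackC']);
const upNext = [nextTrack(), nextTrack()]; // only two tracks ever produced
```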
I'm more impressed by smart playlists, which aptly provide another demonstration of the value in storing audio metadata in a database. Smart playlists are defined by setting desired track characteristics. The smart playlist then automatically includes tracks with those characteristics. This is a straightforward idea for relieving users of the daunting alternative task: managing a custom playlist's contents through repetitive interface manipulations (for small playlists and collections, this is a chore, for larger playlists and collections, this is nigh-unworkable). What turns smart playlists from a helpful aid to a swoon-inducing beloved companion is the comprehensiveness of the create/edit window. The query-building capability is dreamy. Its power is intoxicating. A competing Ajax application (is there one?) would need to work hard to convince a user to stray.
The set of characteristics to query on includes the usual standbys, Title and Artist and Album and Genre. The additional options, which make one go weak in the knees, are Length, Play Count, Year, Comment, First Play and Last Play date, Score, Rating, Mount Point, BPM, Bitrate, and Label. Labels, also known as tags in other contexts, are words associated with tracks in a many-to-many relationship. As usual, the value lies in the ability to create ad-hoc groups using the labels/tags. One smart playlist could contain all tracks with one label, another smart playlist could contain all tracks with another label, etc. This way of creating playlists by label may not seem superior to the technique of creating custom playlists. However, when a new track should be in five different custom playlists, it's easier to drag the five corresponding previously-used labels onto the new track and let the playlists do the grouping work.
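The label-based grouping can be pictured as a simple filter over the many-to-many relationship (a sketch with made-up track objects, not amarok's database schema):

```javascript
// each track carries its labels; a label-based smart playlist is just a filter
var tracks = [
  { title: "One",   labels: ["workout", "favorites"] },
  { title: "Two",   labels: ["workout"] },
  { title: "Three", labels: ["mellow", "favorites"] }
];

function smartPlaylistByLabel(allTracks, label) {
  return allTracks.filter(function(t) {
    return t.labels.indexOf(label) !== -1;
  });
}

var workoutList = smartPlaylistByLabel(tracks, "workout"); // "One" and "Two"
```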
To make the process of entering a filtered characteristic still more alluring, after choosing a particular characteristic from the drop-down list, a set of controls to the right of the list changes to only offer the choices that make sense for that characteristic. For instance, a numerical characteristic leads to a drop-down list of connectives such as "is", "is not", "is greater than", "is less than", followed by an input box for numbers (and little increment/decrement arrows to change the value by clicking, if you swing that way). A textual characteristic switches the list of connectives to regexp-like choices such as "contains", "starts with", "does not start with", followed by a combobox whose assisting drop-down portion contains all the actual examples of the chosen characteristic in the collection. After choosing Genre, it might contain Reggae. This is more than "user-friendliness". This is admirable consideration in the partnership between interface and user. The fewer mistakes the user is likely to make, the less aggravation is involved in getting what he (or she) really really wants.
So the depth of each filtering characteristic reaches areas of need the user wasn't aware were there, but further enamoring is the possible breadth. On the far right of each characteristic are two buttons labeled + and -. Clicking + adds a new, identical row of controls for filtering based on an additional characteristic. As you might expect, clicking - on a row of controls removes it and its associated characteristic filter. A group of any number of filtering characteristics, one that can grow or shrink on command (I've never craved more than three), is fine, but it lacks some flexibility: it assumes that all the filters must be satisfied simultaneously. What about when a user merely wants one or more of the filters to be satisfied by any one track? The filters the user wants to treat this way are separately grouped on screen under a header that starts with the words "Match Any" (the other grouping's header starts with the words "Match All").
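In code terms, "Match All" and "Match Any" correspond to combining the filter predicates with logical AND versus OR--a sketch of the semantics, not amarok's internals:

```javascript
// each filter row is a predicate; the two groupings differ only in the combinator
function matchAll(predicates) {
  return function(track) {
    return predicates.every(function(p) { return p(track); });
  };
}
function matchAny(predicates) {
  return function(track) {
    return predicates.some(function(p) { return p(track); });
  };
}

var shortTrack = function(t) { return t.length < 120; }; // length in seconds
var highRating = function(t) { return t.rating >= 4; };
var track = { length: 100, rating: 3 };
var all = matchAll([shortTrack, highRating])(track); // false: rating fails
var any = matchAny([shortTrack, highRating])(track); // true: length passes
```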
Underneath, the window has some options for organizing the resulting list of tracks. The user may order the tracks according to a particular characteristic in ascending or descending order. At the time the playlist is loaded by the player, the tracks can be resorted easily, and a dynamic playlist based on the smart playlist would play tracks randomly in any case, so the order option isn't of much interest apart from its use in conjunction with the limit option. In the same way, using the limit option to keep a playlist small is not necessary if playback proceeds in a dynamic playlist. But put together, the order and limit options could enable the creation of a playlist of the "twenty top-rated tracks less than two minutes long", for instance. (Ratings and scores are different things in amarok, but that's a whole 'nother blog entry.) The "expand" option selects a characteristic to automatically group tracks into subsidiary playlists, in each of which all the tracks have that same characteristic in common. In the playlist choosing pane, the subsidiary playlists appear as "tree children" of the smart playlist--expanding the smart playlist like a folder shows all of its subsidiary playlists indented underneath it as separate playlists. Choosing an expand option is probably not an appreciable gain for most smart playlists, but for huge lists it's valuable.
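"Twenty top-rated tracks less than two minutes long" is just filter, then order, then limit, chained together (a sketch; the field names here are my own):

```javascript
// filter + sort (descending rating) + limit, in one pipeline
function topShortTracks(allTracks) {
  return allTracks
    .filter(function(t) { return t.length < 120; })        // under two minutes
    .sort(function(a, b) { return b.rating - a.rating; })  // top-rated first
    .slice(0, 20);                                         // limit to twenty
}

var sample = [
  { title: "Long",   length: 200, rating: 5 },
  { title: "Quick",  length: 90,  rating: 4 },
  { title: "Filler", length: 100, rating: 2 }
];
var best = topShortTracks(sample); // "Quick" then "Filler"; "Long" is filtered out
```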
Did you know that clicking the middle button (or possibly the scroll wheel, depending on the mouse) on the amarok tray icon pauses and resumes? If the devil's in the details, then amarok is my Dark Lord.
Saturday, August 04, 2007
callbacks within callbacks
Oh, the travails of an Ajax amateur...in the beginning the requirement was straightforward. Grab some data asynchronously by passing along an "update-display" callback function (a function with a 'data' parameter). The callback function could even be anonymous, or refer to variables in enclosing function scopes at the time of definition--nothing out of the ordinary.
Then someone has the nerve to change his mind about the interface, so now one display needs to show what is actually the patched-together result of several invocations of the same data call, all at once. At first glance, this is still easy--just fire off all the asynchronous requests required and fuhgeddaboutit, because each one has a callback to update only its part of the display.
Wrong! In this case, the parts of the display are ordered and the necessary extra data calls to fill in "missing" points aren't known until some of the data has been gathered, meaning the part of the display generated from call C must be finished after the part of the display generated from call B is finished, and so on. But all the calls are asynchronous, so there is no guarantee whatsoever of the time or order in which individual calls will return.
Flashbacks of semaphores and mutexes and Dining Philosophers dance before your eyes. No--that way lies madness. Since the display is ordered, better to just get all the data pieces, assemble them, then display the whole as if it had been from a single data call. Unfortunately, the "update-display" function can't immediately run after firing off the asynchronous calls, of course, because then it might run before all the data was gathered! So the "update-display" function must be in a callback of some kind. Yet if the calls are asynchronous, which callback should contain the overall "update-display" function? The last one to finish, of course. But which one will that be?
Puzzle it till your puzzler is sore, and one answer, not necessarily the best, may emerge. The time spent studying the techniques of functional programming may generate (yield?) a payoff. Consider the following almost-real code, in which a callback function takes its own function parameter.
function baseCallback(data, updateDisplayFunction) {
  var afterFunction = function(dataParm) {
    updateDisplayFunction(dataParm);
  };
  function addAnotherAfterFunction(currentAfterFunction, patchingInfo) {
    return function(dataParm) {
      var innerCallback = function(innerData) {
        // process innerData...
        // patch dataParm using patchingInfo
        // (data patch location passed by reference)
        currentAfterFunction(dataParm);
      };
      dataCallInvoke(paramsComputedFromPatchingInfo, innerCallback);
    };
  }
  // process data in a loop, and within the loop,
  // when there's a need to make another data call...
  afterFunction = addAnotherAfterFunction(afterFunction, dataPatchingInformation);
  // after the data processing loop
  afterFunction(data);
}

function dataCallInvoke(originalParams, updDispFn) {
  var normalCallback = function(dataParam) {
    baseCallback(dataParam, updDispFn);
  };
  callWithCallback(originalParams, normalCallback);
}

dataCallInvoke(origArguments, overallDisplayFunc);
The idea is that addAnotherAfterFunction repeatedly operates, as needed, on the function in the afterFunction variable. One callback becomes embedded inside another callback inside another callback. When the outermost callback (the final contents of the afterFunction variable) runs, it makes a data call and passes in the next-inner callback. The data call runs, then executes the next-inner callback that was passed to it. The next-inner callback perhaps makes its own data call...and so on, until the "true original" callback, the "update-display" function, takes the fully-constructed data and does what it will.
One downside is that the invocations of the data calls can't run in parallel (although since order explicitly matters so much in this situation, parallel execution won't work anyway). Also, tossing around so many functions with their own individual closed-over scopes surely eats up some memory (although I wonder if some "copy-on-write" implementation could be a workaround). Honestly, I'm not sure what terminology describes this technique. Um, mumble, mumble, continuation-passing-style, mumble, combinator, mumble, monadic bind, er...
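To see the composition in isolation, here is a stripped-down, self-contained toy version. The "data calls" here are synchronous so the example runs top to bottom (the real calls are asynchronous, but the wrapping works identically), and all the names are mine:

```javascript
// a fake "data call": fetches one chunk, then hands it to the callback
// (a synchronous stand-in for an asynchronous request)
function fetchChunk(chunkId, callback) {
  callback("[" + chunkId + "]");
}

// wrap the current continuation in a new one that fetches an extra chunk first
function addAnotherAfter(currentAfter, chunkId) {
  return function(assembled) {
    fetchChunk(chunkId, function(chunk) {
      currentAfter(assembled + chunk); // patch in the new piece, then continue
    });
  };
}

var result = null;
var after = function(assembled) { result = assembled; }; // the "update-display"
// wraps added later run earlier, so wrap in reverse of the desired fetch order
after = addAnotherAfter(after, "C"); // innermost: fetched last
after = addAnotherAfter(after, "B"); // outermost: fetched first
after("[A]"); // kick off with the data already in hand
// result is now "[A][B][C]"
```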
Saturday, April 21, 2007
making widgets from data in javascript
Here is a convoluted, extra-long tale of real-world software evolution. I'm one of those odd programmers who doesn't mind writing documentation because he likes to expound on his own code in fits of self-aggrandizement. I was working on (rewriting an existing app as) an Ajax application when I soon realized that the task of generating a widget from data was generally applicable to many parts of the interface. I imagine many others have thought the same. I created a function with these parameters: a parent node of the widget-to-be, a 'mapping object', a 'data object', and an optional 'context object' (the context object parameter wasn't in the initial design). The function interpreted the properties of the mapping object to create a widget out of the properties of the data object, and then it added the widget as a child of the passed parent node.
The mapping object had a reserved set of "special" properties with specific meanings to the widget-maker function, like 'widgetType' to specify which widget to create or 'nodeposition' to override the default append so the widget could be inserted at any point in the parent node's existing children. Any properties of the mapping object other than these "special" properties became properties of the widget. If the mapping property's value was a string, the widget-maker looked up that string as a property name of the data object, and then set a widget property with the same name as the mapping property to the property value of the data object - the mapping object's property served as a connecting link or data lookup-index ("to assign this widget property, look at this data property"). I thought this would work fine, for a few minutes. Then I realized I would want to set a widget caption to, say, data property A concatenated with data property B. So I set up an alternate behavior for mapping object properties. If the value was a function rather than a string (checked for using typeof), then the widget-maker evaluated the function with the data object as the parameter, and the return value became the property value on the widget. I abstracted the string-vs-function behavior into a separate function that let me write the equivalent of "I don't care if the mapping object property is a string or function, just do what you need to do for this mapping object property and give me a result to assign".
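The string-vs-function dispatch described above boils down to one small helper--a reconstruction of the idea in the text, with names of my own invention:

```javascript
// resolve one mapping property against the data object:
// a string is a data property name to look up; a function computes the value
function resolveMapping(mappingValue, dataObject) {
  if (typeof mappingValue === "function") {
    return mappingValue(dataObject);
  }
  return dataObject[mappingValue];
}

var data = { first: "Ada", last: "Lovelace" };
var simple = resolveMapping("first", data); // "Ada"
var computed = resolveMapping(
  function(d) { return d.first + " " + d.last; }, data); // "Ada Lovelace"
```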
Was that enough? Not nearly. Even the best of widgets may not be complete as is. What if I had to add more text to the widget, or stick an icon on it, etc.? Generally speaking, a widget may need to have children of its own. I don't mean child widgets, which I'll explain the solution for later, but child content. I expanded the widget-maker to interpret a new special property, childNodes, which was a mixed array of strings or functions. Each element would be evaluated and then concatenated into the widget's innerHTML. (I started out by adding a text node for each element, but discovered that prevented me from having markup in the strings - the markup would actually become literal text in the text nodes, not HTML tags.) In practice, I often ended up just writing one big function, which meant childNodes was a one-element array, and that one element was a function! Eh, hindsight.
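The childNodes handling amounts to walking that mixed array and joining the pieces into markup--a sketch, assuming (per the text) that strings are literal markup fragments and functions compute a fragment from the data:

```javascript
// build the widget's innerHTML from a mixed array of strings and functions
function buildChildContent(childNodes, dataObject) {
  var html = "";
  for (var i = 0; i < childNodes.length; i++) {
    var item = childNodes[i];
    // strings pass through as markup; functions are evaluated against the data
    html += (typeof item === "function") ? item(dataObject) : item;
  }
  return html;
}

var content = buildChildContent(
  ["<b>Genre:</b> ", function(d) { return d.name; }],
  { name: "Reggae" }); // "<b>Genre:</b> Reggae"
```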
In addition to childNodes, some other noteworthy "complex" properties that I added were a domAttrs property for specifying a series of properties to apply to the widget's primary DOM node (not used much except for specifying a data-dependent node ID), a styles property for specifying style rules to apply, and a postCreation function for miscellaneous widget-related code that had to be run immediately after widget creation. The postCreation function might connect up some callbacks, for instance (this couldn't happen until after the widget was created). The callbacks created in the postCreation function could use any variables within the postCreation function, although by the time the callback ran the postCreation function would of course be long finished - closures! At first the postCreation function received the data object and the primary DOM node of the widget. Later there were two more parameters: a context object and the widget itself. The postCreation function had to be passed the widget because I found cases in which treating a widget as a child of a DOM node was incorrect - for everything to work properly, the widget had to be passed to another widget. To defeat the widget-maker's default behavior for those cases, I added yet another special property to the mapping object, manualAppend.
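The closure point deserves a tiny illustration (generic Javascript, nothing widget-specific): variables in the postCreation scope stay alive for callbacks that fire long after the function itself returns.

```javascript
// a stand-in postCreation: its local state outlives it, via the closure
function postCreation(widgetName) {
  var clickCount = 0; // local to this invocation...
  return function onClick() {
    clickCount++;     // ...but still alive whenever the callback fires
    return widgetName + " clicked " + clickCount + " time(s)";
  };
}

var handler = postCreation("saveButton");
handler(); // "saveButton clicked 1 time(s)"
handler(); // "saveButton clicked 2 time(s)"
```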
If making one widget from one data object is a common operation, then so is making one widget for each member object of an array. I made another function, an array-to-widgets function, that called the widget-making function as it walked the array. And if making an array of widgets from a data array is a common operation, then so is making a tree of widgets! After assuming that the data array was preordered by the relevant groups (meaning if any two elements are in the same group or subgroup then those two elements are contiguous in the array), the array-to-widgets function gained the capability to create trees of widgets by determining where the group breaks were in the array and then creating the right widget for each group or subgroup (and the right widget for each individual array element as children of those). Besides the data array and the parent DOM node, the parameters then included a list of properties the data was grouped on (contiguous elements with the same value for that property were in the same group or subgroup, so the algorithm only had to check for changes in each group's "running value"), in grouping order going from the property for the "highest" or most-inclusive group to the property for the "lowest" or least-inclusive group, and another list of mapping objects that contained one mapping object for each of the groups/subgroups and one last mapping object to always run on each individual data element. It's less complicated than it may sound, once you get it right. A hierarchical tree list of cities grouped by state/province and country, in which each element in the data array is a city with 'country' and 'state' properties and the cities are preordered/presorted so that cities in the same country are contiguous and cities in each state/province are contiguous, might be created by passing the two lists ['country','state'] and [countryTreeMapper,stateTreeMapper,cityLeafMapper].
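The group-break detection over a preordered array is the heart of the tree building. A condensed sketch (my own simplification, reusing the city/country example from the text):

```javascript
// walk a presorted array, calling groupStart whenever a grouping property's
// "running value" changes, and leaf for every individual element
function walkGroups(rows, groupProps, groupStart, leaf) {
  var running = {}; // last-seen value per grouping property
  rows.forEach(function(row) {
    var changed = false;
    groupProps.forEach(function(prop) {
      // a change at an outer level forces new groups at every inner level
      if (changed || running[prop] !== row[prop]) {
        changed = true;
        running[prop] = row[prop];
        groupStart(prop, row[prop]);
      }
    });
    leaf(row);
  });
}

var cities = [
  { country: "CA", state: "ON", name: "Ottawa" },
  { country: "CA", state: "ON", name: "Toronto" },
  { country: "CA", state: "QC", name: "Montreal" }
];
var log = [];
walkGroups(cities, ["country", "state"],
  function(prop, value) { log.push(prop + ":" + value); },
  function(row) { log.push(row.name); });
// log: ["country:CA", "state:ON", "Ottawa", "Toronto", "state:QC", "Montreal"]
```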
Naturally, I usually called the array-to-widgets function, rather than calling the widget-making function directly. (And just in passing, any element in the mapper list can be null, which the widget-making function handles by merely returning a 'placeholder' empty HTML element).
Ideally, the widget-maker function would be sufficiently distinct from the array-to-widgets function for reuse purposes, and it was - sometimes the postCreation function for a mapping object would call the widget-maker (so at some level the widget-maker was calling itself). But chances are, a group widget will need to summarize information about its group - showing its member count as part of some childNodes content, maybe. To do this, the functions in the mapping object must know 1. the array it is being generated from, 2. the index of itself (the starting index of the group) in that array. Although I wondered if there was a better way, I stuck a 'context object' parameter on the end of the widget-maker function but made sure the widget-maker function could work without it (remember, javascript functions can take a variable number of arguments). The array-to-widgets function updated the context object as it iterated, and passed the context to the widget-maker function. The widget-maker function would pass the context on to the functions in the mapping object (which also could be written to not take the additional context parameter at all).
I still wasn't done. Creating gobs of widgets out of a data array was a happy accomplishment, but consider what happens when the data array changes ever-so-slightly to have one more element. Is it reasonable to demolish and recreate all those widgets to accommodate one new red-headed stepchild? (The answer is no.) I started by trying to make the array-to-widgets function perform a data/widget "diff", but I gave up on that quite quickly. Instead, I modified the widget-maker function to first compute what the ID of the widget would be, check if that ID was taken, and only create the widget if the ID was available. Since IDs must be globally unique anyway, the procedure seemed reasonable. As for updates, if the widget-maker function found that the ID was taken, it reran the mapping object functions after setting a property on the context object to signal that the function was called for updating not creating. Otherwise, a new total might be appended on to the end of the old total instead of replacing it ("1011" instead of changing "10" to "11")!
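The create-versus-update decision reduces to an ID existence check plus a flag on the context object--a sketch, using a plain object as a stand-in for the live DOM's set of element IDs:

```javascript
// create the widget if its ID is free; otherwise rerun the mapping
// functions with context.updating set, so values replace instead of append
function makeOrUpdateWidget(registry, id, render, context) {
  if (registry[id]) {
    context.updating = true;
  } else {
    registry[id] = {}; // "create" the widget
    context.updating = false;
  }
  render(registry[id], context);
}

var registry = {}, ctx = {};
var renderTotal = function(w, c) { w.text = c.updating ? "11" : "10"; };
makeOrUpdateWidget(registry, "total-1", renderTotal, ctx); // creates: text "10"
makeOrUpdateWidget(registry, "total-1", renderTotal, ctx); // updates in place:
// text replaced with "11", not appended into "1011"
```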
For handling removals, I took a similar tack. The array-to-widgets function now kept track of the IDs of the widgets it created while iterating, and returned the list of created IDs as properties of an "old ID" object. It also took a new (optional) parameter, the object returned from the previous time the array-to-widgets function ran. If the old-ID-object was present, then after the array-to-widgets function was done iterating it checked whether each property of the old-ID-object was also a property of the ID-tracking object it had just created (the object it created for its return value anyway). Any IDs that were property names of the old ID object but not of the just-created ID object represented obsolete widgets, so the array-to-widgets function destroyed them (using the proper API so nothing was left dangling).
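The removal pass is a set difference between the previous run's IDs and the current run's--a sketch, where destroyWidget stands in for the real teardown API:

```javascript
// destroy every widget whose ID appeared last run but not this run
function removeObsolete(oldIds, newIds, destroyWidget) {
  for (var id in oldIds) {
    if (oldIds.hasOwnProperty(id) && !newIds.hasOwnProperty(id)) {
      destroyWidget(id);
    }
  }
}

var destroyed = [];
removeObsolete(
  { a: true, b: true, c: true }, // IDs from the previous run
  { a: true, c: true },          // IDs just created
  function(id) { destroyed.push(id); });
// destroyed: ["b"]
```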
The refinement of my functions happened gradually as I better understood the problems the functions had to solve. Writing one-off widget-creation code for any given chunk of data would have been simpler and more straightforward, but I would then be forced to rewrite variations of it in perpetuity. I prefer a solution that makes use of javascript's no-fuss lists (javascript's "arrays"), hashes (javascript's "objects"), and anonymous functions/closures to let me compose program elements at runtime.
Bonus observation: It would be criminal for me to not give credit to Firebug. Apart from its impressive CSS features (dynamically adjusting style rules and immediately seeing the result is a favorite of mine), its javascript debugging is better than sliced bread. Literally. I would forgo sliced bread, chopping loaves with a cleaver, rather than forgo Firebug javascript debugging. Setting breakpoints in javascript code, even in anonymous functions, is my hero. The network activity tracking is also exceedingly convenient for finding out exactly what the asynchronous server request and reply looked like.
The mapping object had a reserved set of "special" properties with specific meanings to the widget-maker function, like 'widgetType' to specify which widget to create or 'nodeposition' to override the default append so the widget could be inserted at any point in the parent node's existing children. Any properties of the mapping object other than these "special" properties became properties of the widget. If the mapping property's value was a string, the widget-maker looked up that string as a property name of the data object, and then set a widget property with the same name as the mapping property to the property value of the data object - the mapping object's property served as a connecting link or data lookup-index ("to assign this widget property, look at this data property"). I thought this would work fine, for a few minutes. Then I realized I would want to set a widget caption to, say, data property A concatenated with data property B. So I set up an alternate behavior for mapping object properties. If the value was a function rather than a string (checked for using typeof), then the widget-maker evaluated the function with the data object as the parameter, and the return value became the property value on the widget. I abstracted the string-vs-function behavior into a separate function that let me write the equivalent of "I don't care if the mapping object property is a string or function, just do what you need to do for this mapping object property and give me a result to assign".
Was that enough? Not nearly. Even the best of widgets may not be complete as is. What if I had to add more text to the widget, or stick an icon on it, etc.? Generally speaking, a widget may need to have children of its own. I don't mean child widgets, which I'll explain the solution for later, but child content. I expanded the widget-maker to interpret a new special property, childNodes, which was a mixed array of strings or functions. Each element would be evaluated and then concatenated into the widget's innerHTML. (I started out by adding a text node for each element, but discovered that prevented me from having markup in the strings - the markup would actually become literal text in the text nodes, not HTML tags.) In practice, I often ended up just writing one big function, which meant childNodes was a one-element array, and that one element was a function! Eh, hindsight.
In addition to childNodes, some other noteworthy "complex" properties that I added were a domAttrs property for specifying a series of properties to apply to the widget's primary DOM node (not used much except for specifying a data-dependent node ID), a styles property for specifying style rules to apply, and a postCreation function for miscellaneous widget-related code that had to be run immediately after widget creation. The postCreation function might connect up some callbacks, for instance (this couldn't happen until after the widget was created). The callbacks created in the postCreation function could use any variables within the postCreation function, although by the time the callback ran the postCreation function would of course be long finished - closures! At first the postCreation function received the data object and the primary DOM node of the widget. Later there were two more parameters: a context object and the widget itself. The postCreation function had to be passed the widget because I found cases in which treating a widget as a child of a DOM node was incorrect - for everything to work properly, the widget had to be passed to another widget. To defeat the widget-maker's default behavior for those cases, I added yet another special property to the mapping object, manualAppend.
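The closure behavior can be demonstrated in miniature, with plain objects standing in for the real DOM node and widget (the parameter order matches the description above; everything else is hypothetical):

```javascript
// Sketch: a postCreation hook (data, DOM node, context, widget)
// wiring up a callback that relies on a closure.
var log = [];
var widget = { onClick: null };

function postCreation(data, domNode, context, w) {
  var label = data.name; // captured by the closure below
  w.onClick = function () {
    // runs long after postCreation has returned, but still sees 'label'
    log.push("clicked " + label);
  };
}

postCreation({ name: "alpha" }, {}, {}, widget);
widget.onClick(); // simulate the event firing later
// log is ["clicked alpha"]
```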
If making one widget from one data object is a common operation, then so is making one widget for each member object of an array. I made another function, an array-to-widgets function, that called the widget-making function as it walked the array. And if making an array of widgets from a data array is a common operation, then so is making a tree of widgets! After assuming that the data array was preordered by the relevant groups (meaning if any two elements are in the same group or subgroup then those two elements are contiguous in the array), the array-to-widgets function gained the capability to create trees of widgets by determining where the group breaks were in the array and then creating the right widget for each group or subgroup (and the right widget for each individual array element as children of those). Besides the data array and the parent DOM node, the parameters then included a list of properties the data was grouped on (contiguous elements with the same value for that property were in the same group or subgroup, so the algorithm only had to check for changes in each group's "running value"), in grouping order going from the property for the "highest" or most-inclusive group to the property for the "lowest" or least-inclusive group, and another list of mapping objects that contained one mapping object for each of the groups/subgroups and one last mapping object to always run on each individual data element. It's less complicated than it may sound, once you get it right. A hierarchical tree list of cities grouped by state/province and country, in which each element in the data array is a city with 'country' and 'state' properties and the cities are preordered/presorted so that cities in the same country are contiguous and cities in each state/province are contiguous, might be created by passing the two lists ['country','state'] and [countryTreeMapper,stateTreeMapper,cityLeafMapper].
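A sketch of the group-break detection under that preordering assumption (the function name and the output shape are my own):

```javascript
// Sketch: find group breaks in a preordered array by watching each
// grouping property's "running value". A change at one level starts a
// new group at that level and at every less-inclusive level below it.
function findGroupBreaks(rows, groupProps) {
  var breaks = [];  // { index, depth } wherever a group starts
  var running = [];
  rows.forEach(function (row, i) {
    for (var depth = 0; depth < groupProps.length; depth++) {
      if (row[groupProps[depth]] !== running[depth]) {
        for (var d = depth; d < groupProps.length; d++) {
          breaks.push({ index: i, depth: d });
          running[d] = row[groupProps[d]];
        }
        break;
      }
    }
  });
  return breaks;
}

var cities = [
  { country: "CA", state: "ON", name: "Toronto" },
  { country: "CA", state: "ON", name: "Ottawa" },
  { country: "CA", state: "QC", name: "Montreal" },
  { country: "US", state: "NY", name: "Albany" }
];
var breaks = findGroupBreaks(cities, ["country", "state"]);
// breaks: [{index:0,depth:0},{index:0,depth:1},
//          {index:2,depth:1},{index:3,depth:0},{index:3,depth:1}]
```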
Naturally, I usually called the array-to-widgets function, rather than calling the widget-making function directly. (And just in passing, any element in the mapper list can be null, which the widget-making function handles by merely returning a 'placeholder' empty HTML element).
Ideally, the widget-maker function would be sufficiently distinct from the array-to-widgets function for reuse purposes, and it was - sometimes the postCreation function for a mapping object would call the widget-maker (so at some level the widget-maker was calling itself). But chances are, a group widget will need to summarize information about its group - showing its member count as part of some childNodes content, maybe. To do this, the functions in the mapping object must know 1) the array the widget is being generated from, and 2) the widget's own index (the starting index of the group) in that array. Although I wondered if there was a better way, I stuck a 'context object' parameter on the end of the widget-maker function but made sure the widget-maker function could work without it (remember, javascript functions can take a variable number of arguments). The array-to-widgets function updated the context object as it iterated, and passed the context to the widget-maker function. The widget-maker function would pass the context on to the functions in the mapping object (which could also be written to not take the additional context parameter at all).
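For instance, a hypothetical mapping function could use the context to compute a member count (all names here are mine):

```javascript
// Sketch: a mapping function that uses a context object containing the
// source array and the group's starting index to count group members.
function groupCaption(data, context) {
  var count = 0;
  for (var i = context.index; i < context.array.length; i++) {
    if (context.array[i].country !== data.country) break;
    count++;
  }
  return data.country + " (" + count + " cities)";
}

var cities = [
  { country: "CA", name: "Toronto" },
  { country: "CA", name: "Ottawa" },
  { country: "US", name: "Albany" }
];
// The iterating function would update the context as it walked the array:
var caption = groupCaption(cities[0], { array: cities, index: 0 });
// caption is "CA (2 cities)"
```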
I still wasn't done. Creating gobs of widgets out of a data array was a happy accomplishment, but consider what happens when the data array changes ever-so-slightly to have one more element. Is it reasonable to demolish and recreate all those widgets to accommodate one new red-headed stepchild? (The answer is no.) I started by trying to make the array-to-widgets function perform a data/widget "diff", but I gave up on that quite quickly. Instead, I modified the widget-maker function to first compute what the ID of the widget would be, check if that ID was taken, and only create the widget if the ID was available. Since IDs must be globally unique anyway, the procedure seemed reasonable. As for updates, if the widget-maker function found that the ID was taken, it reran the mapping object functions after setting a property on the context object to signal that the function was called for updating not creating. Otherwise, a new total might be appended on to the end of the old total instead of replacing it ("1011" instead of changing "10" to "11")!
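A sketch of the create-or-update logic, with a plain object standing in for the page's global ID registry (real code would look the ID up in the document):

```javascript
var registry = {}; // stand-in for the page's set of existing widget IDs

function makeOrUpdateWidget(id, applyMapping, context) {
  if (registry[id]) {
    context.updating = true;   // signal: rerun mappings in "update" mode
  } else {
    registry[id] = { id: id }; // ID free: "create" the widget
    context.updating = false;
  }
  applyMapping(registry[id], context);
  return registry[id];
}

function setTotal(widget, context) {
  // assign (replace), never append: "10" becomes "11", not "1011"
  widget.total = context.total;
}

makeOrUpdateWidget("row-7", setTotal, { total: "10" });
var w = makeOrUpdateWidget("row-7", setTotal, { total: "11" });
// w.total is "11", and only one widget object was ever created
```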
For handling removals, I took a similar tack. The array-to-widgets function now kept track of the IDs of the widgets it created while iterating, and returned the list of created IDs as properties of an "old ID" object. It also took a new (optional) parameter, the object returned from the previous time the array-to-widgets function ran. If the old-ID object was present, then after the array-to-widgets function was done iterating it checked whether each property of the old-ID object was also a property of the ID-tracking object it had just created (the object it created for its return value anyway). Any IDs that were property names of the old-ID object but not property names of the just-created ID object represented obsolete or outdated widgets, so the array-to-widgets function destroyed them (using the proper API so nothing was left dangling).
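The removal diff reduces to comparing property names of the two ID objects (destroyWidget here is a stand-in for the toolkit's destroy API):

```javascript
// Sketch: IDs created last run but not this run belong to obsolete widgets.
function diffAndDestroy(oldIds, newIds, destroyWidget) {
  for (var id in oldIds) {
    if (!(id in newIds)) {
      destroyWidget(id); // real code would use the widget toolkit's API
    }
  }
}

var destroyed = [];
diffAndDestroy(
  { "row-1": true, "row-2": true, "row-3": true }, // previous run
  { "row-1": true, "row-3": true },                // this run
  function (id) { destroyed.push(id); }
);
// destroyed is ["row-2"]
```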
The refinement of my functions happened gradually as I better understood the problems the functions had to solve. Writing custom widget-creation code for any given chunk of data would be simpler, or at least more straightforward, but I would then be forced to rewrite variations of it in perpetuity. I prefer a solution that makes use of javascript's no-fuss lists (javascript's "arrays"), hashes (javascript's "objects"), and anonymous functions/closures to let me compose program elements at runtime.
Bonus observation: It would be criminal for me to not give credit to Firebug. Apart from its impressive CSS features (dynamically adjusting style rules and immediately seeing the result is a favorite of mine), its javascript debugging is better than sliced bread. Literally. I would forgo sliced bread, chopping loaves with a cleaver, rather than forgo Firebug javascript debugging. Setting breakpoints in javascript code, even in anonymous functions, is my hero. The network activity tracking is also exceedingly convenient for finding out exactly what the asynchronous server request and reply looked like.
Monday, April 02, 2007
another Halting Problem description
I don't particularly like any of the descriptions of the Halting Problem that I found lying around the Web, so here goes. Assume there is a program, halt or H for short, that solves the halting problem, i.e., determines whether any program will halt on a hypothetical input. H's input therefore has two components: 1) any program that can run on a Turing Machine (treated or encoded as data), and 2) hypothetical input for the passed program. For Turing Machines, program instructions and data reside on the same "tape", so it's not that strange for program instructions to function as data or vice versa - for instance, a (Universal) Turing Machine implemented (finitely and imperfectly) in hardware can simulate an incredible number of Turing Machines, so any program in software that can run on one of the simulated Turing Machines will run on the hardware (Universal) Turing Machine.
Now consider another program, NOT-halt or N for short, that builds on H. This is reasonable both because we started by assuming H's existence and because N's additional code is simple as can be: it does what H does to its input, then it performs the opposite of H's result. If H computes "halt", N will not halt. If H computes "no halt", N will halt. For any input x, N(x) does the opposite of whatever H(x) is.
Notwithstanding how rebellious it acts, N is a program like any other, which implies H can determine if N halts or not for any input. So H(y), where y is an input whose program component is N, must return either "halt" or "no halt". Above we established that for any x N(x) does the opposite of whatever H(x) reports, so certainly in the specific case of x=y, N(y) does the opposite of whatever H(y) reports. If H(y) computes a halt, then N(y) does not halt. If H(y) computes a non-halt, then N(y) does halt. But the program component of y is N! If N(y) halts, H(y) should compute halt, not its opposite! And if N(y) does not halt, H(y) should compute non-halt, not its opposite! For program N (input y), H does not work, therefore H does not work for all programs. The original assumption that H works for all programs has met with a contradiction, so there can be no H. The conclusion is that there is no fully general program for computing the halting problem. I hope readers with minds similar to mine, for whom some other Halting Problem descriptions feel unsatisfactory, will find this helpful.
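As a toy illustration in javascript: whatever a claimed halt-checker answers about N run on itself, N does the opposite. (The argument above shows a real H cannot exist, so this H is a deliberately arbitrary stand-in, and N only reports what it would do instead of actually looping forever.)

```javascript
// Toy: build N from any claimed halt-checker H, then ask H about N(N).
function makeN(H) {
  return function N(program) {
    if (H(program, program) === "halt") {
      return "loops forever"; // N contradicts H by (conceptually) looping
    }
    return "halts";           // ...or by halting when H said it wouldn't
  };
}

// One arbitrary guess for H; any other H fails the same way.
function H(program, input) { return "halt"; }
var N = makeN(H);
var verdict = H(N, N); // "halt"
var actual = N(N);     // "loops forever" - the opposite of H's verdict
```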
Monday, December 18, 2006
relational database as inference engine
A short while ago, I was reworking an SQL query to ensure that it would yield the precise results the rest of the application expected. (Side observation: my conscience kept hollering at me to make the SQL simpler and just crunch the returned data as needed. Shut up, Jiminy! Go spell encyclopedia and leave me alone.) I noticed that I was adding, taking away, and rearranging WHERE conditions to match particular rows. But matching to specific data based on a generalized set of characteristics is also an activity performed by inference engines! It's not that far of a leap to equate database rows to facts, SQL queries to rules, and newly-inserted database rows to concluded facts or assertions. I wouldn't be surprised if the academics in these two camps have been cross-pollinating ideas for some time. If I knew substantially more than jack about real relational database theory, I could offer some insightful comments. Instead, here's an example.
Imagine a set of widgets available for sale. A white widget has no additional cost, a black widget costs an additional 400, and a green widget costs an additional 200. The base cost for round widgets is 100, for square widgets 200, and for triangular widgets 150. How much does a round, black widget cost? An inference engine might have the set of rules (color white) => add 0, (color black) => add 400, (color green) => add 200, (shape round) => set 100, (shape square) => set 200, (shape triangle) => set 150. Then one could add the facts (color black) and (shape round) and have the answer. Or at least the necessary addends.
A (SQL-compatible) database could do a similar operation. One particular combination (or should I say tuple?) of facts becomes a row in table "facts". The rules become: (select 0 from facts where color = "white"), (select 400 from facts where color = "black"), (select 200 from facts where color = "green"), (select 100 from facts where shape = "round"), (select 200 from facts where shape = "square"), (select 150 from facts where shape = "triangle"). Add a row to "facts" that has a "color" column of "black" and a "shape" column of "round", run the queries, and there's your answer. Or at least the necessary addends.
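The same rules could also be written as plain data in javascript and matched against a fact object; here is a sketch, with all names my own:

```javascript
// Sketch: rules as data, matched against a fact object.
var rules = [
  { when: { color: "white" },    then: { add: 0 } },
  { when: { color: "black" },    then: { add: 400 } },
  { when: { color: "green" },    then: { add: 200 } },
  { when: { shape: "round" },    then: { base: 100 } },
  { when: { shape: "square" },   then: { base: 200 } },
  { when: { shape: "triangle" }, then: { base: 150 } }
];

function price(fact) {
  var base = 0, add = 0;
  rules.forEach(function (rule) {
    for (var key in rule.when) {
      if (fact[key] !== rule.when[key]) return; // rule doesn't fire
    }
    if ("base" in rule.then) base = rule.then.base; // "set" rules
    if ("add" in rule.then) add += rule.then.add;   // "add" rules
  });
  return base + add;
}

var cost = price({ color: "black", shape: "round" });
// cost is 500 (100 base + 400 for black)
```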
The similarities become more striking when you use a database table in the usual way, i.e., using a row to represent one member out of a collection of similar entities. To refer back to the previous example, this just means renaming the "facts" table to "widgets" and breaking non-widget facts out into separate tables (the usual normalization). SQL queries that match multiple entities at once will employ JOINs. In the inference engine, an entity would look like (widget (shape square) (color green)), and I assume that rules that match multiple entities would work about the same as rules that match multiple individual facts.
As to whether it makes practical sense to map a relational database to an inference engine or vice versa, I'm inclined to think not. If your problem domain is rigid enough to work in a database, then there's no gain in using an inference engine instead. If your problem domain is a classic example of AI, a rather fuzzy attempt to capture a variety of heuristics working on loosely-structured data, then the database details would bog you down. To say nothing of the limitations (and proper place) of SQL.
Sunday, October 01, 2006
prisoner's dilemma for P2P
Surely I can't be the first person to notice this, but the Prisoner's Dilemma has some applications to P2P. If you want a more thorough treatment of the Dilemma, look elsewhere. Here's the shorthand. There are two independent entities called A and B (or Alice and Bob, if you like). Each has the same choice between two possibilities that are generally called cooperate and defect. A and B don't know what the other will choose. The interesting part comes in the payoffs or profit margin of the four possible outcomes:
- If A cooperates and B cooperates, A and B each receive medium payoffs.
- If A cooperates and B defects, A receives a horribly low payoff and B receives a huge payoff.
- Similarly, if B cooperates and A defects, B receives a horribly low payoff and A receives a huge payoff.
- If A and B defect, A and B each receive low (but not horribly low) payoffs.
The Prisoner's Dilemma applies to P2P transactions in a limited way, if you consider the payoff to be the difference between downloads (positive) and uploads (negative). Assume A and B are two peers who wish to download something of equal value, normalized to be 1, that the other has (and everything's legal, yahda yahda yahda).
- If A uploads what B wants and B uploads what A wants, A and B both upload (-1) and download (1), so the payoff for each is 0.
- If A uploads what B wants and B does not upload, then A ends up with a payoff of -1, and B ends up with a payoff of 1.
- If A does not upload and B uploads what A wants, then A ends up with a payoff of 1, and B ends up with a payoff of -1.
- If A does not upload and B does not upload, then no transaction occurs at all, so the payoff is 0.
If uploading cost nothing, the payoffs would look friendlier:
- A and B both have what they wanted, in addition to what they uploaded, so each has a payoff of 1.
- A did not get what it wanted, but B did, so A's payoff is 0 and B's payoff is 1.
- A got what it wanted, but B did not, so A's payoff is 1 and B's payoff is 0.
- A and B both got nothing, a payoff of 0.
Nevertheless, the Prisoner's Dilemma shows that, assuming uploading or sharing has a cost, "leeches" are merely acting in a rational manner to maximize their individual payoff. So P2P systems that make it more rational, or even mandatory, to share stand a better chance of thriving. Specifically, if a peer downloads more than one item, the iterated or repeating Prisoner's Dilemma can come into play. A peer that acted as a leech for the last item could have some sort of penalty applied for the next item the peer downloads. Through this simple mechanism, known as Tit-for-Tat, cooperation becomes more attractive. Call it filesharing karma, only with a succession of files instead of lives. Any peer downloading its very last item would rationally defect on that item, because there would be no chance to receive a future penalty.
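A minimal sketch of Tit-for-Tat over a succession of items, assuming each peer can see the other's history of choices:

```javascript
// Sketch: Tit-for-Tat cooperates on the first round, then simply echoes
// the opponent's previous move - a leech gets defected against once.
function titForTat(opponentHistory) {
  var last = opponentHistory[opponentHistory.length - 1];
  return (last === undefined || last === "cooperate") ? "cooperate"
                                                      : "defect";
}

var aHistory = [], bHistory = [];
// B leeches on round 2, then cooperates again.
var bScript = ["cooperate", "defect", "cooperate", "cooperate"];
for (var round = 0; round < bScript.length; round++) {
  aHistory.push(titForTat(bHistory)); // A plays Tit-for-Tat
  bHistory.push(bScript[round]);
}
// aHistory is ["cooperate", "cooperate", "defect", "cooperate"]:
// A punishes B's round-2 defection exactly once, then forgives.
```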
Monday, September 25, 2006
you are in a twisty maze of little bookmark folders
My problem is well-known. Like anyone else who spends a generous chunk of time following hyperlinks across the Web, I have a long yet ever-growing list of bookmarks in Firefox. To make sense out of the list, I keep the bookmarks in a menagerie of folders and folders-within-folders. It's easier to find a bookmark this way, but now I must potentially go through multiple folder levels before arriving at the actual bookmark. Moreover, I regularly visit only a handful or two of the bookmarks. I want to keep my organization scheme but also keep bookmark access convenient, especially for the most-visited bookmarks.
There are some great options available. The bookmark sidebar, opened and closed with Ctrl-B, has a search input that searches as you type. Any bookmark has an assignable Keyword property that will take you to that bookmark when you enter it into the address field. Either technique works well if you know exactly what bookmark you want to access. What I wanted was a way to group all of my most-visited sites in one folder, preferably automatically, then just use that folder for my common browsing. Desktops already provide something like this in startup menus for frequently-used programs. Interestingly, the general idea of "virtual bookmark folders" was in the planning for Firefox 2, but postponed. A way to accomplish this in the meantime, as well as affording ultimate (manual) control, would be to open all of your commonly visited sites in a set of tabs, then Bookmark All Tabs.
OK, next I turned to the legion of Firefox extensions. I tried Sort Bookmarks first. It lived up to its name. I'll keep this extension around. I liked that the bookmarks I cared about most were now at the top of each folder's list, but the problem of too many folders was still present. I started to go through the code for this extension to gauge how I could create my own extension for an automatically-updating "Frequently Visited Bookmarks" folder. Then I found the Autocomplete Manager extension. With it, bookmarks can appear in the address field's autocomplete list, matching against the bookmark URL or title, and the bookmarks can be sorted most often visited first. Sure, it's not the most efficient search in the world, but it beats having to pop open the bookmarks sidebar or enter and memorize bookmark keywords. My remaining concern is that Firefox sometimes seems to have trouble keeping track of when I visit a particular address, resulting in low visit counts.
EDIT: It seems that restarting Firefox somehow updates the visit counts. I'd say to go figure, but I'm not going to; why should you? I have kept Firefox open for more than a week at a time, but starting it anew each day isn't that burdensome, I suppose.
Thursday, July 27, 2006
the strain of a monad kick
I've been on a monad kick recently, i.e., reading some of the monad material lying around the Web in an attempt to wrap my gray matter around the concept. All About Monads is good for its sheer thoroughness, and Monads as Containers makes a commendable effort to apply some analogies to it. This page about Haskell I/O takes a close, step-by-step look at why and how the IO monad works the way it does.
My own light bulb switched on (if it did!) when I realized that monads were like the formal symbol systems I learned in my abstract algebra courses (I was a comp sci major, math minor). Just as we can take a set of numbers, define some operations on that set which have specific properties, and then use those properties to combine operations, we can do the same with a monad. Whereas the set of integers has numerical values and the (+) and (*) operations, a minimal monad has values of any specific type you choose and the "return" and "bind" operations or functions.
The return operation defines how to take a value that is not a monad and make a monad value out of it. The bind operation is trickier to understand, but basically it serves the purpose of constructing a new monad value from existing monad values. It takes for parameters a monad value and a function that creates a monad from an outside value, and returns a monad value by somehow applying the function argument to the monad argument. If the monad value was a simple list, for instance, the bind operation might just be applying the function to each element in the list and combining all the results into a list.
If I pass a monad value and a function to the bind operation, I will get back another monad value. (In abstract algebra we would say that monad values are closed under the bind operation.) This means that I can substitute a call to the bind operation anywhere that a monad value would work, such as, say, the monad value argument to a bind operation! I could say "monad bind function bind function bind function bind function". Each bind applies the function on the right to the monad on the left; because the monad on the left of each bind must be evaluated before the bind itself can be evaluated, the sequence of the monad operations is assured.
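Using javascript arrays as the "simple list" case described above, return and bind might be sketched like this (the function names 'ret' and 'bind' are mine):

```javascript
// Sketch: arrays as a monad. 'ret' wraps a plain value in a list;
// 'bind' applies a value-to-list function to each element and
// concatenates the results into one list.
function ret(x) { return [x]; }
function bind(m, f) {
  var out = [];
  for (var i = 0; i < m.length; i++) {
    out = out.concat(f(m[i]));
  }
  return out;
}

// bind's result is another monad value, so binds chain:
// "monad bind function bind function".
var result = bind(bind(ret(3), function (x) { return [x, x + 1]; }),
                  function (x) { return [x * 10]; });
// result is [30, 40]
```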
The really interesting bit here is that the set of operators or functions for a monad can be put to use without knowing how the monad's bind or return are implemented. The Haskell folks have come up with many varieties of monads that can connect statements in ways that might make typical Haskell code sneer: IO, Continuations, exceptions, state. I haven't covered real examples of monads or transformers, and I don't plan to. I have satisfied my monad curiosity for right now.
Thursday, July 13, 2006
check mysql logging config for knoppmyth
Yesterday I encountered some unusual problems in MythTV. None of my recorded programs were showing at all. I ssh'd over, found out that the / partition (not the /myth) was full, *grumble grumble*, and started picking off low-hanging fruit with apt-get remove until things started working again. I run KnoppMyth, which by default has a lot installed, for ease of use.
Today I ran a series of 'du --max-depth=1 -h ' commands, starting in /var and working my way down. In retrospect a 'find' of some variety probably would be a better choice. In any case, /var/log/mysql/mysql.log was the obvious culprit: it was dutifully recording every operation. After some quality time with Google I found out that a line in /etc/mysql/my.cnf had to be commented out: 'log = /var/log/mysql/mysql.log'. Self-evident, once you know about it. I can't even guess why that option was enabled, but thank sweet zombie jesus mythtv is the only client of this mysql instance and I am the only user. I thought I used KnoppMyth to avoid dealing with these kinds of issues? I guess I shouldn't complain, it's not as if I'm trying to run Myth on gentoo...