Friday, July 18, 2008

the Woven cipher

Like the previous other two ciphers, the Woven cipher isn't great at encrypting, but nevertheless it was an engrossing hobby project that had a few fascinating wrinkles (is it any wonder I sometimes have trouble explaining to other people how I spend my time?). The basic idea behind this cipher is to convert all the characters of the plaintext into binary digits (bits), combine the bits of adjacent characters by alternately taking n bits from one then the other, and then iteratively mixing the resulting adjacent bit sequences in the same manner until only one remains. The key is the set of n values to use, which the algorithm cycles. Forming an intact whole out of little parts through an alternating process is analogous to weaving, hence the name "Woven". A second excellent name, perhaps more accurate but less appealing, is "the Multiplexed cipher", where the multiplexing is done using fixed data lengths, as opposed to fixed time periods for instance.

To work, the algorithm can't be quite that simple...
  • Most of the time, when the numbers don't match up exactly right, a source sequence won't have enough bits. When that happens, the strategy is to instead use an extra 0. Since the "weaving" happens from right to left, least-significant to most-significant, these are leading 0s so the numerical value of the original sequence is unchanged. Of course, the intermediate sequences aren't numbers so the leading 0s matter very much. And if the very last sequence, or each chunk (7 bits long?) that rather long sequence is broken into, must be interpreted as a fully-representative binary numeral, there would need to be an additional convention like always prepending an extra 1. Extra 0s are needed if the sequences are different lengths, like a space's 6 compared to a letter's 7, and/or the sequence isn't evenly divisible by n. These filler 0s cause the encrypted (i.e. transposed?) message to be longer than the original. Unorganized testing shows that the amount of expansion for short sentences is commonly 100% or more, probably an exponential-like effect of extra 0s requiring still more extra 0s in successive iterations.
  • Another extremely common complication is an odd quantity of sequences to be woven. The coping technique I eventually settled on was to spontaneously create a "null" or "empty" sequence of length 0 to pair up with the loner. Loner or not, the quantity of data sequences to be woven is half the previous quantity of sequences, rounded up. For instance, 10, 5, 3, 2, 1. Or 8, 4, 2, 1. The rounding--throwing away information-- implies that when decryption happens, the encrypted message and the key aren't enough to reconstruct the original. This is one reason why the third decryption input, the original number of characters, is necessary upfront (the other reason is to let the decryption algorithm know for certain when to quit unraveling sequences).
I wrote the encryption and decryption using generators, which fit a lazy, functional paradigm. As Wikipedia says, a generator "looks like a function but behaves like an iterator". I think of it as an "on-demand" stream of values, potentially infinite, that you only receive when and if you ask for 'em--that's why a generator is "lazy". A generator function, to fulfill the role of an iterator, must have enough state to remember its "position" in the "stream" in-between individual requests for values. A closure meets that requirement--by definition a closure retains its own copy of the variables from enclosing lexical function scopes at the time of closure definition/creation. A generator implemented with a closure is then a function created by still other functions, so in addition to being lazy it's "functional". By the way, creating generators with closures, whether or not the generator is referred to merely as an "iterator", is covered extensively in Higher-Order Perl (finish making the book available on-line, already!). My choice to use generators came from a few silly motivations: 1) the old easy-peasy procedure for decimal-to-binary conversion--repeatedly divide by two and use the remainders--happens to produce an answer bit by bit; 2) I like functional programming and attacking problems through recursion, when I can get it to work; 3) once I start to solve a problem one way I'm reluctant to abandon it for a different way (that would feel like surrendering to failure and wastefulness), especially when I don't have a deadline.

In the encryption and decryption, the role of each bit sequence is filled by a generator. This includes the special-case generators at the start and finish, described in the code as "Initial" or "Final". These generators are somewhat similar to the rest but perform additional conversion tasks and/or obtain data directly from arrays. As a result the program's high-level design is also functional: first it constructs one or more function(s) that will produce the answer (bit by bit) and then it repeatedly calls the function(s). By contrast, the imperative design would be to manipulate actual bit sequences, not generators, by using one set of sequences to create the next set of sequences, etc. The loops are virtually the same in the imperative and functional designs, but the functional equivalent has generators in place of bit sequences--each set of generators is the input for the next set of generators. And each generator (a closure) only knows about the generators that were explicitly placed into its scope at creation.

Encrypt generators merge the output of two encrypt generators: they have two parents and one child. Decrypt generators do the reverse, requesting solely "half" of the output from one source decrypt generator and then dividing it up so that it can return the "left" half to one destination decrypt generator and the "right" half to another destination decrypt generator: they have one (-half?) parent and two children. Encrypt generators act like multiplexers. Decrypt generators act like demultiplexers. The individual generators know and do very little:
  • When an encrypt generator is created, it's passed a 'left' source generator, a 'right' source generator, and the number of bits to read from a generator before switching to the other--this number is one of the digits from the key, so only digits 2-9 are admissible in the key. The encrypt generator keeps a running count of how many bits it has processed, whether it is currently reading from the left or right, and whether either of the generators has run out of bits. If both generators are empty (i.e., return '') then the generator returns '', the empty string. If the generator set to return the next bit isn't empty, then return that bit. If the generator set to return the next bit is empty but the other generator is not empty, return a '0' for filler. Encrypt generators take one 'op' parameter. If the op is 'peek' then the generator skips updating any variables (and it also passes 'peek' if it needs to ask a source generator for the next bit) , otherwise it increments the bit counter and then switches which generator to read if the counter modulo the key value is 0.
  • When a decrypt generator is created, it's passed one source generator, which "side" of bits the decrypt generator will always ask for from its source generator (left or right), and the number of bits to process before switching the side that should receive the next bit coming from the source generator--once again, this number is a key digit. Decrypt generators take one parameter, the "side" that is requesting its next bit. The decrypt generator keeps a running count of how many bits it has processed and whether the next bit it reads from the source generator should be returned as a 'left' bit or 'right' bit. It also keeps short queues for left bits and right bits. If it's asked for a bit for a side whose queue isn't empty, then it returns from the queue. Else, it checks which side should receive the source generator's next bit. If the side matches, get a bit and return it. If the side doesn't match, fill up the queue for the opposite side with bits from the source generator, then return the next bit that does apply to the requested side. Naturally, at any time the source generator sends a bit, update the bit counter and toggle the "side" variable if counter modulo n is 0. When the requested side is "all", the decrypt generator does nothing more than return the next bit it receives from its source generator (which is always just from one "side"). The "all" case is purely for the last stage of decryption, when no more "unraveling" is necessary.
The "direction" of iteration over the key's digits (to determine the n for each generator) alternates between sets of generators. For example, if for one set the index into the key starts at 0 and increases, then in the next set the index starts at the maximum and decreases. This behavior is so all the key values are more evenly used among the sets.

My Javascript implementation of the Woven cipher is part of the blog layout, down at the bottom on the right. In the page source, search for 'wovencipher' (one word). If you use it or download it I am not responsible for any awful consequences. I don't advise you to use it for encrypting anything important or unimportant. Legal disclaimer, etc. It's purely for entertainment, educational, and/or edutainment purposes.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.