Monday, June 23, 2008

the Pointed cipher

Sometimes a leisure-time project can take on a life of its own. In this case, it started with an inspiration I had. I lay much of the blame on my recent reading. What if someone used a cellular automaton (like Game of Life) to encrypt a message? Specifically, Alice would encrypt by encoding a message as an initial configuration of a cellular automaton (like Game of Life) and using the key to determine how board positions map to characters. My motivation was that a message "becoming alive and growing" into unexpected patterns would be quite hard to crack. Imagine if the information content of a message repeatedly folded over onto itself, thereby exhibiting strange chaotic order sensitive to initial conditions. Let a codebreaker try to analyze that.

The problem, as some readers have already guessed, is that the more interesting examples of automata are just not guaranteed to be reversible. A message's encoded information wouldn't always survive the journey. Two different messages could end up with the same outcome, for instance, or a message could merely evaporate after a short number of iterations. The original signal would be so entangled with the encryption-introduced noise that it wouldn't be salvageable...by anyone. Grudgingly, I adopted the restriction that none of my automata rules would include creation or destruction.

This restriction, along with the need to assign axes to the automata cells for the character mapping, just left me with a set of Cartesian coordinates whose movements somehow affected one another. Then I made several observations: 1) per common practice, the number of dimensions can be set arbitrarily; 2) each character can be encoded as the distance along a particular axis for a particular point; 3) interactions among multiple points at once, especially given the number of points needed to encode a message, are often too impractical to calculate.

After experimentation, I've settled on this impractical encryption algorithm. The idea is to divide up a message into tuples or "points", and perform pairwise-defined substitution and transposition on each point for a number of cycles. Track the values of the three parameters that control the operation of the substitution and/or transposition, and increase them (with wrap-around back to the start value) after each cycle. The three parameters are 1) the axes of the two points to manipulate, 2) the index gap between the two points (later points will wrap around the current "block" of points to find a partner and only if the later point wasn't already the partner for that earlier point), 3) the amount to manipulate with. The key has two parts, 1) the positive integer number of cycles "n" and 2) an integer array of axis initializations "d". The dimension "D" is the number of axes, so it's the length of d. D must be one of 2, 3, 5, 6, 10, 15--kudos to anyone who recognizes the definition of this set of numbers before the end of this paragraph. The recommended n is at least 210. However, the size of n must be carefully counterweighted with message length because either one has a strong tendency to drop time-performance (i.e., complexity) like a brick. Analytically speaking, I believe the relationship is O(n), but I only have rough empirical confirmation of that.
  1. Divide the plaintext into blocks of length 30, padding out the last block with blanks if necessary. After each conversion from character to number happens, add the corresponding element from d (plaintext index % dimension works fine). Perform the following steps on each block and concatenate all results. Actually, my implementation modifies the original array in-place, block by block, by passing indexes around.
  2. Start the axis parameter value at 0 (the first axis), the gap at 2*D (pair the current point to the point after the next point), and the amount at 1. Perform the following steps n times.
  3. Loop over the block point by point (this just means increasing the index by D each time), performing the following two steps.
  4. Use the current values of the axis and gap parameters to determine what two indexes to work on. If the referenced values are equal, multiply both values by -1. Otherwise, adjust each referenced value such that the greater increases and the lesser decreases by the value of the amount parameter.
  5. Swap the value of the first axis of the current point with the value of whichever axis of the next point (again, wrapping around the block if necessary) is indicated by the axis parameter.
  6. After looping over the points, adjust the three parameters for the next cycle, starting over at 0 if exceeding maximums. The axis value increases by 1 up to D - 1, the gap value increases by D up to 29 (remember that the gap calculation wraps around the block of points), and the amount value increases by 1 up to 6.
Decryption proceeds by doing everything in reverse. I discovered that perfectly reversing the steps can be subtly tricky.

One characteristic of this cipher that it shares with many other ciphers is its cyclic nature. In earlier tries at an algorithm, when the block size was the length of the text and no transposition step was integrated, specific plaintext lengths and dimension choices would result in characters that only flipped sign, no matter the number of cycles! These were "degenerate" cases when two indexes always paired up with one another; therefore, each gap parameter value was always paired up with the same axis parameter value (although the converse was usually not true--a specific axis parameter might or might not always have been paired up with the same gap parameter).

In a simple example, a message length (padded if necessary) of 32 and a dimension of 4 results in 32/4 = 8 "points" and consequently 8 possible gap values between indexes (0,4...24,28). The gap parameter value cycles through its 8 possibilities as the axis parameter value simultaneously cycles through its 4 possibilities (0,1,2,3), so now a gap of 4 always applies to an axis of 1, a gap of 8 always applies to an axis of 2, etc. The pattern is clear: a "degenerate" case corresponds to when the number of axes is a divisor of the number of possible gaps, for it is those circumstances when the cycle of axis parameter values maps nicely onto the cycle of gap parameter values. What we have here is an application of the fundamental theorem of cyclic groups (the description "maps nicely onto" is an informal way of stating that one cycle or cyclic group is a subgroup of another cycle or cyclic group).

The complication of avoiding these degenerate cases motivated me both to switch to a set block length and to a set collection of dimension choices, where the dimension choices are 1) divisors of the block length and are not 2) divisors of the associated quotient. Put another way, the block length is the least common multiple of the dimension and the block length divided by the dimension. In addition, I preferred smaller block lengths and dimension choices but I also preferred larger amounts of dimension choices. Check the handy Wikipedia table yourself if you like. As previously mentioned, I opted for a block length of 30. The recommended number of cycles, 210, is the least common multiple of the gap parameter possibilities, axis parameter possibilities, and amount parameter possibilities (7).

Key strength is a question I'm not sure how to answer. The key's "n", the number of cycles to run on the set of message points, isn't much of an obstacle to brute-force cracking. If x number of cycles hasn't yielded even a partial message, trying x + 1 cycles is no problem at all! The key's "d", the array of axis initializations, is more promising. Assuming 2000 possible values for an axis (-999 to 1000 inclusive), for a choice of 2 dimensions that would be a mere 2000^2 = 4,000,000. But a choice of 15 dimensions would be 2000^15 = 3.2768 * 10^49 according to my calculator, which is greater than 2^128. In any case, my limited testing indicated that the initialization value for a chosen axis affects a proper subset of message characters, not all--partial keys are useful, but to verify the correctness of the partial key one must have the number of cycles right.

My Javascript implementation of the Pointed cipher is part of the blog layout, down at the bottom on the right. In the page source, search for 'pointedcipher' (one word). If you use it or download it I am not responsible for any awful consequences. I don't advise you to use it for encrypting anything important or unimportant. Legal disclaimer, etc. It's purely for entertainment, educational, and/or edutainment purposes.

No comments:

Post a Comment