Wednesday, March 09, 2011

continuing communication of data sequence graphs

This is an experimental release whose quality is not assured. A data sequence graph represents a set of data value sequences (IEnumerable<>), called "chunks", as nodes and directed edges in a single graph. More introductory details.

In the short comparison to git graphs, I pointed out that unlike git the nodes in a data sequence graph are identified by sequential numbers. This could be a problem if a communicated data sequence graph begins to undergo parallel development. When the same graph in different locations starts to store diverging sets of chunks, the nodes and edges will conflict. More to the point, the parallel graphs will of course continue to operate fine in separate locations, but future attempts to communicate deltas will be meaningless to the recipients.
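To make the numbering conflict concrete, here is a toy sketch in Python. It is not the library's format or API: it assumes a "graph" is just a list of chunks, with a chunk's list position standing in for its sequential node number, and the helper names `add_chunk` and `delta` are made up for illustration.

```python
# Toy model only: a "graph" is a list of chunks, and each chunk's position
# stands in for its sequential node number. This is NOT the real graph format.

def add_chunk(graph, chunk):
    """Return a new graph with the chunk stored under the next sequential number."""
    return graph + [chunk]

def delta(local, base):
    """Return the entries of local that lie beyond base, keyed by node number."""
    if local[:len(base)] != base:
        raise ValueError("local graph does not extend the base graph")
    return {n: local[n] for n in range(len(base), len(local))}

base = add_chunk([], "closure")        # common base, node 0
local_1 = add_chunk(base, "hugh")      # communicator 1 takes node 1
local_2 = add_chunk(base, "eliza")     # communicator 2 also takes node 1

delta_1 = delta(local_1, base)
delta_2 = delta(local_2, base)

# Node numbers that both deltas claim for different content.
conflicts = {n for n in delta_1 if n in delta_2 and delta_1[n] != delta_2[n]}
print(conflicts)
```

Both deltas claim node 1 for different chunks, so neither recipient can apply the other's delta to its own local graph as-is.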

One resolution is "publish or perish": the first delta published against the last-published graph takes global precedence. Recipients of this delta must then honor it. Essentially, they 1) apply the delta to the common base to obtain the new base, then 2) take all the unpublished chunks that were applied to the common base and apply them to the new base instead. The procedure is analogous to a git rebase of local commits onto the remote HEAD.
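The two steps can be sketched in the same spirit. This is a toy model under stated assumptions, not the actual implementation: a graph is a list of chunks whose positions play the role of sequential node numbers, a delta is a dict from node number to chunk, and `apply_delta` and `rebase` are hypothetical names.

```python
# Toy model only: list position = sequential node number, delta = {number: chunk}.
# The function names are hypothetical and do not come from the real tool.

def apply_delta(graph, delta):
    """Append delta entries in number order; they must continue the numbering."""
    result = list(graph)
    for n in sorted(delta):
        if n != len(result):
            raise ValueError(f"expected node {len(result)}, got node {n}")
        result.append(delta[n])
    return result

def rebase(local, old_base, published_delta):
    """Honor a published delta: build the new base, then replay local chunks."""
    if local[:len(old_base)] != old_base:
        raise ValueError("local graph does not extend the old base")
    # Step 1: the published delta applied to the common base is the new base.
    new_base = apply_delta(old_base, published_delta)
    # Step 2: unpublished chunks are re-added on top, receiving new numbers.
    unpublished = local[len(old_base):]
    new_local = new_base + unpublished
    return new_base, new_local

old_base = ["closure"]
local = ["closure", "eliza"]      # recipient's unpublished work
published = {1: "hugh"}           # delta that was published first

new_base, new_local = rebase(local, old_base, published)
print(new_base)
print(new_local)
```

In this sketch the recipient's chunk moves from node 1 to node 2, which is why the old base has to be kept around: without it there is no way to tell which local chunks are unpublished.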

In git, the old and new bases are stored automatically in each communicating repository. By contrast, a pristine copy of the "base" data sequence graph, with all published deltas applied to obtain the most recent global iteration, must be manually stored, updated, and kept indefinitely. The one convenience is that right after a communicator publishes a delta against this base, that communicator's own data sequence graph is, at least temporarily, identical to the new base graph.

To reuse the knock-knock joke example for familiarity and convenience, the first step is setting up the source files.

function prompt { ">>" }
>>echo "Knock knock. Who's there? Closure. Closure who? Closure mouth when you eat." > closure.txt
>>echo "Knock knock. Who's there? Hugh. Hugh who? You, that's who!" > hugh.txt
>>echo "Knock knock. Who's there? Eliza. Eliza who? Eliza lot. Don't trust em." > eliza.txt

Next, store the words of closure.txt in a data sequence graph, sent to files "clos.dat" and "clos.txt". Designate the creator as communicator 1, who then sends the graph to communicator 2. So this initial graph is the common base graph. Later, communicator 1 adds the words of hugh.txt to the initial graph to produce a local graph sent to files "closhugh.dat" and "closhugh.txt". Being talkative, communicator 1 figures out the delta between the local graph and the base graph and sends the delta to communicator 2. Since the base is the initial graph, the secondary graph (to run the delta against) is in clos.dat/clos.txt. The delta goes into files "hughdelta.dat" and "hughdelta.txt". These files are sent to communicator 2. Since the delta is published against the old base graph, the delta applied to the old base graph becomes the new base graph. Communicator 1's local graph in closhugh.dat/closhugh.txt is now the base graph.

>>.\DataSequenceGraphCLI.exe -s closure.txt -E clos.dat -T clos.txt
>>.\DataSequenceGraphCLI.exe -e clos.dat -t clos.txt -s hugh.txt -E closhugh.dat -T closhugh.txt
>>.\DataSequenceGraphCLI.exe -m -e closhugh.dat -t closhugh.txt -f clos.dat -u clos.txt  -E hughdelta.dat -T hughdelta.txt

Sometime after communicator 1 sent the initial graph, communicator 2 was unlucky enough to stumble on the eliza.txt knock-knock joke, which then got stored in communicator 2's local graph as "closeliza.dat" and "closeliza.txt". Communicator 2 made sure to keep around the last-communicated graph, clos.dat/clos.txt. Not being talkative, communicator 2 opted to delay producing and sending a delta against the last base graph.

>>.\DataSequenceGraphCLI.exe -e clos.dat -t clos.txt -s eliza.txt -E closeliza.dat -T closeliza.txt

When communicator 2 receives the delta from communicator 1, communicator 2 first applies the delta to the old base graph to create a new base graph for future communication, stored as "closhugh2.dat" and "closhugh2.txt". Second, communicator 2 creates a new local graph using the old local graph, the delta, and the old base. The new local graph goes into files "closhugheliza.dat" and "closhugheliza.txt". Neither of these communicators has much flair for file names. After the rather involved second command, communicator 2 ends up with a graph that contains the sentence chunks that are in 1) closure.txt, 2) hugh.txt, and 3) eliza.txt.

>>.\DataSequenceGraphCLI.exe -e clos.dat -t clos.txt -f hughdelta.dat -u hughdelta.txt -E closhugh2.dat -T closhugh2.txt
>>.\DataSequenceGraphCLI.exe -e closeliza.dat -t closeliza.txt -f hughdelta.dat -u hughdelta.txt -g clos.dat -w clos.txt -E closhugheliza.dat -T closhugheliza.txt

Eventually, communicator 2 gets around to computing a delta against the last base graph stored in files closhugh2.dat/closhugh2.txt, and then sends it to communicator 1 in files "elizadelta.dat" and "elizadelta.txt". Communicator 1 has no local chunks, so nothing more need be done than applying the delta to communicator 1's copy of the old base, closhugh.dat/closhugh.txt. Communicator 1 stores the resulting new base in "closhugheliza1.dat" and "closhugheliza1.txt".

>>.\DataSequenceGraphCLI.exe -m -e closhugheliza.dat -t closhugheliza.txt -f closhugh2.dat -u closhugh2.txt -E elizadelta.dat -T elizadelta.txt
>>.\DataSequenceGraphCLI.exe -e closhugh.dat -t closhugh.txt -f elizadelta.dat -u elizadelta.txt -E closhugheliza1.dat -T closhugheliza1.txt

Communicator 1 and communicator 2 have resynchronized their data sequence graphs. The graph in communicator 1's most recent base, closhugheliza1.dat/closhugheliza1.txt, is the same as the graph in communicator 2's most recent base, closhugheliza.dat/closhugheliza.txt.
