Tuesday, March 08, 2011

telling knock-knock jokes to a data sequence graph

https://github.com/ArtV/DataSequenceGraph. This is an experimental release whose quality is not assured. A data sequence graph represents a set of data value sequences (IEnumerable<>), called "chunks", as nodes and directed edges in a single graph. More introductory details.

Here are some sample executions of the data sequence graph CLI program at a PowerShell prompt.

function prompt { ">>" }
>>echo "Knock knock. Who's there? Closure. Closure who? Closure mouth when you eat." > closure.txt
>>.\DataSequenceGraphCLI.exe --verbose --splittext closure.txt --outedges edges.dat --outvalues values.txt

These are the long-form parameters; one-character parameters are also available. The command splits the contents of closure.txt into sentence chunks with word values, writes the resulting graph in binary+text format to two output files, and, because of "verbose", also echoes back the entire graph contents, first node by node and then chunk (route) by chunk...

Added sentence starting with Knock at start node 0
Added sentence starting with Who's at start node 3
Added sentence starting with Closure at start node 6
Added sentence starting with Closure at start node 8
Added sentence starting with Closure at start node 10

0 Gate
   ..1 if already -1..-1
1 Value: Knock
   ..2 if already 0..1
2 Value: knock
   ..0 if already 0..1
3 Gate
   ..4 if already -1..-1
4 Value: Who's
   ..5 if already 3..4
5 Value: there
   ..3 if already 3..4
6 Gate
   ..7 if already -1..-1
7 Value: Closure
   ..6 if already 6..7
   ..9 if already 8..7
   ..11 if already 10..7
8 Gate
   ..7 if already -1..-1
9 Value: who
   ..8 if already 8..7
10 Gate
   ..7 if already -1..-1
11 Value: mouth
   ..12 if already 7..11
12 Value: when
   ..13 if already 11..12
13 Value: you
   ..14 if already 12..13
14 Value: eat
   ..10 if already 10..7

------ Chunks in graph:
#1:
0..1:Knock..2:knock
#2:
3..4:Who's..5:there
#3:
6..7:Closure
#4:
8..7:Closure..9:who
#5:
10..7:Closure..11:mouth..12:when..13:you..14:eat
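The node listing above can be read as a set of conditional edges: a line such as "..11 if already 10..7" under node 7 appears to mean "go on to node 11 if this route has already traversed the edge from 10 to 7". The following is a minimal sketch of that reading, in Python rather than the project's .NET code. The names EDGES, VALUES and follow_route are made up for illustration, and the rule of taking the first edge whose requisite is satisfied is an assumption inferred from this output, not the library's actual algorithm.

```python
# Conditional edges copied from the node listing above, keyed by source node.
# Each entry is (target, requisite_edge); (-1, -1) means "no requisite".
NO_REQUISITE = (-1, -1)
EDGES = {
    0: [(1, NO_REQUISITE)],
    1: [(2, (0, 1))],
    2: [(0, (0, 1))],
    3: [(4, NO_REQUISITE)],
    4: [(5, (3, 4))],
    5: [(3, (3, 4))],
    6: [(7, NO_REQUISITE)],
    7: [(6, (6, 7)), (9, (8, 7)), (11, (10, 7))],
    8: [(7, NO_REQUISITE)],
    9: [(8, (8, 7))],
    10: [(7, NO_REQUISITE)],
    11: [(12, (7, 11))],
    12: [(13, (11, 12))],
    13: [(14, (12, 13))],
    14: [(10, (10, 7))],
}
VALUES = {1: "Knock", 2: "knock", 4: "Who's", 5: "there", 7: "Closure",
          9: "who", 11: "mouth", 12: "when", 13: "you", 14: "eat"}


def follow_route(gate):
    """Walk from a gate node until the route leads back to that gate."""
    traversed = set()  # edges (source, target) this route has used so far
    values = []
    node = gate
    while True:
        # Assumption: take the first edge whose requisite edge is already
        # traversed. Exactly one edge qualifies at each step in this graph.
        target = next(t for t, req in EDGES[node]
                      if req == NO_REQUISITE or req in traversed)
        traversed.add((node, target))
        if target == gate:
            return values
        values.append(VALUES[target])
        node = target


for gate in (0, 3, 6, 8, 10):
    print(gate, " ".join(follow_route(gate)))
```

Under this reading, starting from each of the five gates yields the same five chunks that the CLI prints under "Chunks in graph", even though three of them share node 7.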

Now a second knock-knock joke. The previously generated graph for closure.txt is loaded by specifying the input files, and the "missing" option requests output of the difference rather than the end result after adding...

>>echo "Knock knock. Who's there? Hugh. Hugh who? You, that's who!" > hugh.txt
>>.\DataSequenceGraphCLI.exe --loadedges edges.dat --loadvalues values.txt --splittext hugh.txt --missing --outedges hughdelta.dat --outvalues hughdelta.txt
Added sentence starting with Knock at start node 15
Added sentence starting with Who's at start node 16
Added sentence starting with Hugh at start node 17
Added sentence starting with Hugh at start node 19
Added sentence starting with You at start node 20

15 Gate (new) no requisite to next required
1 Value "Knock" requisite is route edge #0
2 Value "knock" no implied edge to next node
16 Gate (new) no requisite to next required
4 Value "Who's" requisite is route edge #0
5 Value "there" no implied edge to next node
17 Gate (new) no requisite to next required
18 Value "Hugh" (new) no implied edge to next node
19 Gate (new) no requisite to next required
18 Value "Hugh" requisite is route edge #0
9 Value "who" no implied edge to next node
20 Gate (new) no requisite to next required
21 Value "You" (new) requisite is route edge #0
22 Value "that's" (new) requisite is route edge #1
9 Value "who" no implied edge to next node
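As a quick cross-check of the file sizes discussed next, the record list above can be tallied mechanically. This is only a sketch in Python: it classifies records by matching the printed text, and it takes the per-record sizes (2 bytes per record, plus 2 more for a new value node's index) from the post's own description rather than from the binary format itself. The names RECORDS and record_size are hypothetical.

```python
import re

# The node/requisite records exactly as printed by the CLI above.
RECORDS = """\
15 Gate (new) no requisite to next required
1 Value "Knock" requisite is route edge #0
2 Value "knock" no implied edge to next node
16 Gate (new) no requisite to next required
4 Value "Who's" requisite is route edge #0
5 Value "there" no implied edge to next node
17 Gate (new) no requisite to next required
18 Value "Hugh" (new) no implied edge to next node
19 Gate (new) no requisite to next required
18 Value "Hugh" requisite is route edge #0
9 Value "who" no implied edge to next node
20 Gate (new) no requisite to next required
21 Value "You" (new) requisite is route edge #0
22 Value "that's" (new) requisite is route edge #1
9 Value "who" no implied edge to next node
"""

NEW_VALUE = re.compile(r'^\d+ Value "(.*)" \(new\)')


def record_size(line):
    # Sizes as stated in the post: 2 bytes for every record, plus 2 bytes
    # for the value index when the record introduces a new value node.
    return 4 if NEW_VALUE.match(line) else 2


lines = RECORDS.splitlines()
total_bytes = sum(record_size(line) for line in lines)

# New values in order of appearance, joined the way hughdelta.txt shows them.
new_values = [m.group(1) for m in map(NEW_VALUE.match, lines) if m]
values_text = "|".join(new_values)

print(len(lines), "records,", total_bytes, "bytes")        # 15 records, 36 bytes
print(values_text, len(values_text.encode("utf-8")))       # Hugh|You|that's 15
```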

hughdelta.txt contains the string "Hugh|You|that's" (15 characters, or 15 UTF-8 bytes). The list of node/requisite information at the end of the output is what goes into hughdelta.dat, in the binary form that I described here. hughdelta.dat is 36 bytes. Each line in the list is a node/requisite record. There are 5 new gate nodes, one for each sentence/chunk/route, which consume 5 * 2 = 10 bytes. There are 7 lines that use preexisting value nodes, which consume 7 * 2 = 14 bytes, for a running total of 24 bytes. Finally, there are 3 lines of new value nodes, which consume 3 * 4 = 12 bytes (2 bytes for the usual sequence number and guide bits and 2 bytes for the index of the new value), to reach the final total of 36.

Notice that every requisite falls into one of three cases: "no requisite or no implied edge", "starting requisite" (lines with route edge #0), or "the previous edge is the requisite" (lines with route edge #1). The reader of the file can therefore reconstruct the requisites without the requisites consuming actual file space as stated data values.

One last knock-knock joke, this time just adding the new joke to the first graph and showing the result...

>>echo "Knock knock. Who's there? Eliza. Eliza who? Eliza lot. Don't trust em." > eliza.txt
>>.\DataSequenceGraphCLI.exe --loadedges edges.dat --loadvalues values.txt --splittext eliza.txt
Added sentence starting with Knock at start node 15
Added sentence starting with Who's at start node 16
Added sentence starting with Eliza at start node 17
Added sentence starting with Eliza at start node 19
Added sentence starting with Eliza at start node 20
Added sentence starting with Don't at start node 22

0 Gate
   ..1 if already -1..-1
1 Value: Knock
   ..2 if already 0..1
   ..2 if already 15..1
2 Value: knock
   ..0 if already 0..1
   ..15 if already 15..1
3 Gate
   ..4 if already -1..-1
4 Value: Who's
   ..5 if already 3..4
   ..5 if already 16..4
5 Value: there
   ..3 if already 3..4
   ..16 if already 16..4
6 Gate
   ..7 if already -1..-1
7 Value: Closure
   ..6 if already 6..7
   ..9 if already 8..7
   ..11 if already 10..7
8 Gate
   ..7 if already -1..-1
9 Value: who
   ..8 if already 8..7
   ..19 if already 19..18
10 Gate
   ..7 if already -1..-1
11 Value: mouth
   ..12 if already 7..11
12 Value: when
   ..13 if already 11..12
13 Value: you
   ..14 if already 12..13
14 Value: eat
   ..10 if already 10..7
15 Gate
   ..1 if already -1..-1
16 Gate
   ..4 if already -1..-1
17 Gate
   ..18 if already -1..-1
18 Value: Eliza
   ..17 if already 17..18
   ..9 if already 19..18
   ..21 if already 20..18
19 Gate
   ..18 if already -1..-1
20 Gate
   ..18 if already -1..-1
21 Value: lot
   ..20 if already 20..18
22 Gate
   ..23 if already -1..-1
23 Value: Don't
   ..24 if already 22..23
24 Value: trust
   ..25 if already 23..24
25 Value: em
   ..22 if already 22..23

------ Chunks in graph:
#1:
0..1:Knock..2:knock
#2:
3..4:Who's..5:there
#3:
6..7:Closure
#4:
8..7:Closure..9:who
#5:
10..7:Closure..11:mouth..12:when..13:you..14:eat
#6:
15..1:Knock..2:knock
#7:
16..4:Who's..5:there
#8:
17..18:Eliza
#9:
19..18:Eliza..9:who
#10:
20..18:Eliza..21:lot
#11:
22..23:Don't..24:trust..25:em
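The node numbers in these runs follow a simple pattern: each sentence gets a fresh gate node, and each word either reuses the node that already holds that exact value or gets the next free number. The sketch below, in Python rather than the project's .NET code, reproduces the "start node" numbers and chunk routes shown above. The splitting rules (sentences end at ".", "?" or "!"; words are runs of letters and apostrophes), the case-sensitive value matching, and the names split_chunks and SketchGraph are all assumptions inferred from the sample output, not the library's actual implementation. It also ignores edges and requisites entirely.

```python
import re


def split_chunks(text):
    """Split text into sentences, each a list of word values (assumed rules)."""
    sentences = re.split(r"[.?!]+", text)
    chunks = [re.findall(r"[A-Za-z']+", sentence) for sentence in sentences]
    return [chunk for chunk in chunks if chunk]


class SketchGraph:
    """Tracks node numbering only; no edges or requisites."""

    def __init__(self):
        self.node_count = 0
        self.value_nodes = {}  # value -> node number (case-sensitive)

    def add_chunk(self, words):
        gate = self.node_count  # every chunk starts at a new gate node
        self.node_count += 1
        route = []
        for word in words:
            if word not in self.value_nodes:
                self.value_nodes[word] = self.node_count
                self.node_count += 1
            route.append(self.value_nodes[word])
        return gate, route


closure = ("Knock knock. Who's there? Closure. Closure who? "
           "Closure mouth when you eat.")
eliza = ("Knock knock. Who's there? Eliza. Eliza who? "
         "Eliza lot. Don't trust em.")

graph = SketchGraph()
closure_routes = [graph.add_chunk(c) for c in split_chunks(closure)]
eliza_routes = [graph.add_chunk(c) for c in split_chunks(eliza)]

print([gate for gate, _ in closure_routes])  # [0, 3, 6, 8, 10]
print([gate for gate, _ in eliza_routes])    # [15, 16, 17, 19, 20, 22]
print(eliza_routes[3])                       # (19, [18, 9])
```

The last line corresponds to chunk #9 above, "19..18:Eliza..9:who", where "who" reuses node 9 from the first joke.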
