Friday, March 04, 2011

git graphs compared to data sequence graphs

https://github.com/ArtV/DataSequenceGraph. This is an experimental release whose quality is not assured. Data sequence graph represents a set of data value sequences (IEnumerable<>), called "chunks", as nodes and directed edges in a single graph. More details.

Git represents revision history as a graph of nodes and directed edges, where each node is a commit and each edge points back to an ancestor commit. However, the needs of git are different than the needs of a data sequence graph, which underline the differences in design.
  • A git graph has no cycles/circles (a commit cannot be its own grandpa, for instance). A data sequence graph may, but never within a particular data chunk's route. The routes use the cycle like a traffic roundabout, jumping into the circle and then leaving before completing a circuit. This is possible because of the rules that an edge's requisite edge must be met and the edge to be passively followed is the one edge whose requisite occurs first/earliest in the route's history.
  • Nodes in a git graph aren't centrally numbered like in a data sequence graph. It couldn't use that strategy and still be thoroughly decentralized. Instead, the renowned "content addressable filesystem" of git names the commit with a hash of its information including its immediate ancestors. Git avoids global name collisions, but at the unavoidable expense of the required space for the hash. Although in practice a short prefix of the hash/name suffices to be unambiguous. Since a global or decentralized data sequence graph would probably be useless (but it's an intriguing idea...), the same constraint isn't applicable.
  • Git also requires immutable commits or nodes. Not in the sense that rewriting or amending is impossible, but that a commit's identifier must also change whenever the commit's information changes (well, if you have the know-how you can bend this a bit, but normally...). The state of the revision-tracked content must reflect its ancestors definitively at the point of any commit. A selected commit represents one and only one content state. Whereas a node in a data sequence graph non-uniquely represents an individual value, and many data chunks/routes could possibly include that value/node. This difference also implies that while the identifier of a git commit would need to be different if it were a destination of greater or fewer edges, the simplistic numeric identifier of a node in a data sequence graph must remain the same no matter how other edges and nodes change.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.