Monday, 13 April 2009

Argot Versioning - Part 2 - Meta Data Naming

In this post I'll introduce the solution implemented for meta data versioning in Argot. It builds on the last post, which introduced some of the versioning issues. Some light reading for getting the brain in gear after Easter.

During the development of the versioning feature, a very important aspect of the system was modified. I found that every type definition needs more than a simple ASCII string to define its name. Instead of a name, a location in the type library is defined. To explain further, it helps to cover some background and what this means for Argot.

To recap the last post: performing type negotiation between peers (client and server), or between an application and a file, has in the past required each data type definition to have a unique name. This has caused problems, because various parts of the meta data needed a name even where one was not essential. The cause is that the basis of Argot is a single table which contains an identifier, name and definition.

Adding versioning into the meta data causes the single table to be broken up into multiple levels. Each name in the table may have multiple definitions. The small table example given in the last post now expands to a much larger table as shown in the table below.



Another example in Argot without versioning is that of abstract data types. These required multiple named definitions. A short example is:


meta.definition: meta.abstract();
meta.definition#basic: meta.map( #meta.definition, #meta.basic );
meta.definition#map: meta.map( #meta.definition, #meta.map );


The three definitions are actually trying to represent the following:



This diagram represents three levels to the data type structure. The first entry defines the name (meta.definition). The second entry defines version 1.0 as being an abstract type. The third and fourth entries are a relation to the version 1.0 definition and map the abstract type to other types.

Using the same naming mechanism to flatten this into a single table useful for Argot creates a group of ugly name strings:


id:10, name: "meta.definition" - meta.name;
id:11, name: "meta.definition#v1.0" - meta.abstract;
id:12, name: "meta.definition#meta.basic#v1.0" - meta.abstract.map #meta.basic;
id:13, name: "meta.definition#meta.map#v1.0" - meta.abstract.map #meta.map;


The solution to this is to replace each name with a location. The location is an abstract type that initially has three concrete location types. The first is a name location, which includes the name. The second is a version definition, which includes the id of the name location and the version information. The third is a relation, which includes the id of a versioned definition (e.g. 11 in the above list) and a tag. The tag is a string that uniquely identifies the location. Just as every name must be unique in the flat table version of Argot, every location must also be unique. It must be possible to find any definition using just its location data.

The separation of location from the definition is the key concept in Argot with versioning. The location being an abstract type also means that it can be extended to include any type of location specifier. The location specifier replaces the name and provides a flexible method to specify where to place a definition in the meta data library.
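As a rough illustration of the three location types and the uniqueness rule, here is a minimal Python sketch. It is not Argot's implementation: the class names (NameLocation, DefinitionLocation, RelationLocation, TypeLibrary) and the string placeholders for definitions are hypothetical, and the ids simply reuse 10 to 13 from the flat list earlier in this post.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class NameLocation:
    name: str                  # the type name itself


@dataclass(frozen=True)
class DefinitionLocation:
    name_id: int               # id of the entry holding the NameLocation
    version: Tuple[int, int]   # (major, minor)


@dataclass(frozen=True)
class RelationLocation:
    definition_id: int         # id of a versioned definition
    tag: str                   # distinguishes relations on the same definition


Location = Union[NameLocation, DefinitionLocation, RelationLocation]


class TypeLibrary:
    """Flat table of entries that can be looked up by id or by location."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Tuple[Location, str]] = {}
        self._by_location: Dict[Location, int] = {}

    def add(self, type_id: int, location: Location, definition: str) -> None:
        if type_id in self._by_id:
            raise ValueError(f"id {type_id} already used")
        if location in self._by_location:
            raise ValueError(f"location {location} already defined")
        self._by_id[type_id] = (location, definition)
        self._by_location[location] = type_id

    def find(self, location: Location) -> int:
        return self._by_location[location]


library = TypeLibrary()
library.add(10, NameLocation("meta.definition"), "meta.identity")
library.add(11, DefinitionLocation(10, (1, 0)), "meta.abstract")
library.add(12, RelationLocation(11, "meta.basic"), "meta.abstract.map #meta.basic")
library.add(13, RelationLocation(11, "meta.map"), "meta.abstract.map #meta.map")
```

The table stays flat (one row per id), but the ids stored inside the locations link the rows into a graph: relation 12 points at definition 11, which points at name 10.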

An interesting aspect of the above is that there is often more information being used to specify where the data belongs than the actual data. The abstract type "meta.definition" and mapping data definitions now look like:


// 1. define the name.
(library.entry
(dictionary.name name:"meta.definition")
(meta.identity))

// 2. define version 1.0 as abstract.
(library.entry
(dictionary.definition name:"meta.definition" version:"1.0")
(meta.abstract []))

// 3. map meta.basic to the abstract type.
(library.entry
(dictionary.definition name:"meta.definition" version:"1.0")
(meta.abstract.map #meta.basic))

// 4. map meta.abstract.map to the abstract type.
(library.entry
(dictionary.definition name:"meta.definition" version:"1.0")
(meta.abstract.map #meta.abstract.map))


Each entry in the above is in two parts, the location and the definition. This separation has also had other beneficial flow-on effects. In previous versions of Argot, information in the name string had to be replicated in the definition. In effect, the definition was being used to specify both location and definition information. By using a data structure in the location, this is no longer required. An example is the “meta.abstract.map” definition, which previously included both the abstract target and the mapping type. It now only includes the mapping.

The location information provides a mechanism that allows very flexible data structures to be defined in the data type library. This can be extended to define method signatures or other ways of defining protocol semantics. In effect it allows the type library to define a complex directed graph while still providing a flat one-dimensional table structure so that each individual definition can be found.

Dictionary Text Format

An obvious change in the example above is that the syntax used to define a data type has also changed. The syntax is loosely based on LISP and provides a more flexible way of encoding the meta data in a text format.

Each parenthesised list starts with the name of the data type. All subsequent elements are the data for that type, e.g.


(library.definition name:"empty" version:"1.3")


This is an instance of the “library.definition”(v1.3) data type. The library.definition is defined as follows:


(library.entry
(library.definition name:"library.definition" version:"1.3")
(meta.sequence [
(meta.tag name:"name" (meta.reference #meta.name))
(meta.tag name:"version" (meta.reference #meta.version))
]))


This shows that each list shown in parenthesis is actually a strict data structure.

The example also shows how to include simple type data. “name” and “version” are the names of the fields in the library.definition structure. Field names can be specified for both simple types and data structures. The value for each follows the colon, i.e.

"field name":"value" // not currently implemented
or
"field type":"value"
or
"field name":("data structure" … ) // not currently implemented

For all value types the “field type” must provide a parser capable of parsing the value into an object used internally. In some cases a parser may parse a string into a complex internal structure. This is currently used for the meta.version type, which uses a MAJOR.MINOR string format.
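To make the idea of a per-type value parser concrete, here is a small Python sketch of one for meta.version. The names MetaVersion and parse_meta_version are hypothetical and not part of Argot. The 0 to 255 range check is an assumption based on the meta dictionary in this post, where “major” and “minor” are both uint8.

```python
from typing import NamedTuple


class MetaVersion(NamedTuple):
    major: int
    minor: int


def parse_meta_version(text: str) -> MetaVersion:
    """Parse a MAJOR.MINOR string such as "1.3" into a MetaVersion."""
    major_text, separator, minor_text = text.partition(".")
    parts = (major_text, minor_text)
    if not separator or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR, got {text!r}")
    major, minor = int(major_text), int(minor_text)
    if major > 255 or minor > 255:
        # Assumption: both fields are stored as uint8.
        raise ValueError(f"version component out of uint8 range: {text!r}")
    return MetaVersion(major, minor)


version = parse_meta_version("1.3")
```

Because the result is a tuple of integers, versions compare numerically, so 1.10 sorts after 1.3, which a plain string comparison would get wrong.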

The only other form is the array. Arrays are specified using square brackets, e.g.

[ element1 element2 element3 ]
or
"field name":[ element1 element2 element3 ] // not currently implemented
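The text format described in this section is small enough to read with a short recursive parser. The Python sketch below is only an illustration inferred from the examples in this post, not Argot's actual compiler. It assumes straight double quotes, `//` line comments, `#name` references, and only the implemented `field type:value` form. The function names and the dictionary shapes it returns are hypothetical.

```python
import re

# Assumed token grammar, inferred from the examples in this post.
TOKEN = re.compile(r'''
    \s+                         # whitespace (skipped)
  | //[^\n]*                    # line comment (skipped)
  | (?P<punct>[()\[\]])         # list and array delimiters
  | (?P<field>[\w.]+):(?P<value>"[^"]*"|[\w.]+)
  | (?P<ref>\#[\w.]+)           # reference to a named type
  | (?P<word>[\w.]+)            # bare type name
''', re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        if match.group("punct"):
            tokens.append(("punct", match.group("punct")))
        elif match.group("field"):
            value = match.group("value").strip('"')
            tokens.append(("field", (match.group("field"), value)))
        elif match.group("ref"):
            tokens.append(("ref", match.group("ref")[1:]))
        elif match.group("word"):
            tokens.append(("word", match.group("word")))
        # anything else was whitespace or a comment
    return tokens


def parse(tokens, i=0):
    """Parse one element starting at tokens[i]; return (node, next_index)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[i]
    if kind == "punct" and value in "([":
        closer = ")" if value == "(" else "]"
        items, i = [], i + 1
        while i < len(tokens) and tokens[i] != ("punct", closer):
            node, i = parse(tokens, i)
            items.append(node)
        if i >= len(tokens):
            raise SyntaxError(f"missing {closer!r}")
        if closer == "]":
            return items, i + 1                     # array: plain list
        if not items or not isinstance(items[0], str):
            raise SyntaxError("a list must start with a type name")
        return {"type": items[0], "data": items[1:]}, i + 1
    if kind == "field":
        return {"field": value[0], "value": value[1]}, i + 1
    if kind == "ref":
        return {"ref": value}, i + 1
    if kind == "word":
        return value, i + 1
    raise SyntaxError(f"unexpected {value!r}")


example = '(library.definition u8ascii:"empty" meta.version:"1.3")'
node, _ = parse(tokenize(example))
```

Here `node` is a dictionary whose "type" is the leading name and whose "data" holds the remaining elements in order. Turning each field value into an internal object would then be delegated to the parser registered for that field type.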


Meta Dictionary

The following is the full meta dictionary in its pre-compiled form. Every data type and structure used is defined in the meta dictionary. This provides the self-referencing base from which all elements are defined. It does not attempt to define all basic data types, only those required as part of the meta dictionary. The data structures in the meta dictionary are used later to define all other common data types in the common dictionary.

You might want to skip the meta dictionary definition unless you really want to give the brain a workout.


// 0. empty
(library.entry
(library.definition u8ascii:"empty" meta.version:"1.3")
(meta.fixed_width uint16:0
[ (meta.fixed_width.attribute.size uint16:0) ]))

// 1. uint8
(library.entry
(library.definition u8ascii:"uint8" meta.version:"1.3")
(meta.fixed_width uint16:8
[ (meta.fixed_width.attribute.size uint16:8)
(meta.fixed_width.attribute.integer)
(meta.fixed_width.attribute.unsigned)
(meta.fixed_width.attribute.bigendian) ] ))

// 2. uint16
(library.entry
(library.definition u8ascii:"uint16" meta.version:"1.3")
(meta.fixed_width uint16:16
[ (meta.fixed_width.attribute.size uint16:16)
(meta.fixed_width.attribute.integer)
(meta.fixed_width.attribute.unsigned)
(meta.fixed_width.attribute.bigendian) ] ))

// 3. meta.id
(library.entry
(library.definition u8ascii:"meta.id" meta.version:"1.3")
(meta.reference #uint16))

// 4. meta.abstract.map
(library.entry
(library.definition u8ascii:"meta.abstract.map" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"id" (meta.reference #meta.id))
]))

// 5. meta.abstract
(library.entry
(library.definition u8ascii:"meta.abstract" meta.version:"1.3")
(meta.sequence [
(meta.array
(meta.reference #uint8)
(meta.reference #meta.abstract.map))]))

// 6. u8ascii
(library.entry
(library.definition u8ascii:"u8ascii" meta.version:"1.3")
(meta.encoding
(meta.array
(meta.reference #uint8)
(meta.reference #uint8))
u8ascii:"ISO646-US"))

// 7. meta.name
(library.entry
(library.definition u8ascii:"meta.name" meta.version:"1.3")
(meta.reference #u8ascii))

// 8. meta.version
(library.entry
(library.definition u8ascii:"meta.version" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"major" (meta.reference #uint8))
(meta.tag u8ascii:"minor" (meta.reference #uint8))
]))


// 9. meta.definition
(library.entry
(library.definition u8ascii:"meta.definition" meta.version:"1.3")
(meta.abstract [
(meta.abstract.map #meta.fixed_width)
(meta.abstract.map #meta.abstract)
(meta.abstract.map #meta.abstract.map)
(meta.abstract.map #meta.expression)
(meta.abstract.map #meta.identity)
]))

// 10. meta.identity
(library.entry
(library.definition u8ascii:"meta.identity" meta.version:"1.3")
(meta.sequence [
]))

// 11. meta.expression
(library.entry
(library.definition u8ascii:"meta.expression" meta.version:"1.3")
(meta.abstract [
(meta.abstract.map #meta.reference)
(meta.abstract.map #meta.tag)
(meta.abstract.map #meta.sequence)
(meta.abstract.map #meta.array)
(meta.abstract.map #meta.envelop)
(meta.abstract.map #meta.encoding)
]))

// 12. meta.reference
(library.entry
(library.definition u8ascii:"meta.reference" meta.version:"1.3")
(meta.sequence [(meta.reference #meta.id)]))

// 13. meta.tag
(library.entry
(library.definition u8ascii:"meta.tag" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"name"
(meta.reference #u8ascii))
(meta.tag u8ascii:"data"
(meta.reference #meta.expression))]))

// 14. meta.sequence
(library.entry
(library.definition u8ascii:"meta.sequence" meta.version:"1.3")
(meta.array
(meta.reference #uint8)
(meta.reference #meta.expression)))

// 15. meta.array
(library.entry
(library.definition u8ascii:"meta.array" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"size" (meta.reference #meta.expression))
(meta.tag u8ascii:"data" (meta.reference #meta.expression))]))

// 16. meta.envelop
(library.entry
(library.definition u8ascii:"meta.envelop" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"size"
(meta.reference #meta.expression))
(meta.tag u8ascii:"type"
(meta.reference #meta.expression)) ]))


// 17. meta.encoding
(library.entry
(library.definition u8ascii:"meta.encoding" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"data" (meta.reference #meta.expression))
(meta.tag u8ascii:"encoding" (meta.reference #u8ascii))]))

// 18. meta.fixed_width
(library.entry
(library.definition u8ascii:"meta.fixed_width" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"size" (meta.reference #uint16))
(meta.tag u8ascii:"flags"
(meta.array
(meta.reference #uint8)
(meta.reference #meta.fixed_width.attribute)))]))

// 19. meta.fixed_width.attribute
(library.entry
(library.definition
u8ascii:"meta.fixed_width.attribute" meta.version:"1.3")
(meta.abstract [
(meta.abstract.map #meta.fixed_width.attribute.size)
(meta.abstract.map #meta.fixed_width.attribute.integer)
(meta.abstract.map #meta.fixed_width.attribute.unsigned)
(meta.abstract.map #meta.fixed_width.attribute.bigendian)
]))

// 20. meta.fixed_width.attribute.size
(library.entry
(library.definition
u8ascii:"meta.fixed_width.attribute.size" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"size" (meta.reference #uint16))
]))

// 21. meta.fixed_width.attribute.integer
(library.entry
(library.definition
u8ascii:"meta.fixed_width.attribute.integer"
meta.version:"1.3")
(meta.sequence []))

// 22. meta.fixed_width.attribute.unsigned
(library.entry
(library.definition
u8ascii:"meta.fixed_width.attribute.unsigned"
meta.version:"1.3")
(meta.sequence []))

// 23. meta.fixed_width.attribute.bigendian
(library.entry
(library.definition
u8ascii:"meta.fixed_width.attribute.bigendian"
meta.version:"1.3")
(meta.sequence []))


// 24. dictionary.name
(library.entry
(library.definition u8ascii:"dictionary.name" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"name" (meta.reference #meta.name))
]))

// 25. dictionary.definition
(library.entry
(library.definition u8ascii:"dictionary.definition" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"id" (meta.reference #meta.id))
(meta.tag u8ascii:"version" (meta.reference #meta.version))
]))

// 26. dictionary.relation
(library.entry
(library.definition u8ascii:"dictionary.relation"
meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"id" (meta.reference #meta.id))
]))


// 27. dictionary.location
(library.entry
(library.definition u8ascii:"dictionary.location"
meta.version:"1.3")
(meta.abstract [
(meta.abstract.map #dictionary.name)
(meta.abstract.map #dictionary.definition)
(meta.abstract.map #dictionary.relation)
]))

// 28. meta.definition.envelop
(library.entry
(library.definition
u8ascii:"meta.definition.envelop"
meta.version:"1.3")
(meta.envelop
(meta.reference #uint16)
(meta.reference #meta.definition)))

// 29. dictionary.entry
(library.entry
(library.definition u8ascii:"dictionary.entry" meta.version:"1.3")
(meta.sequence [
(meta.tag u8ascii:"id"
(meta.reference #meta.id))
(meta.tag u8ascii:"location"
(meta.reference #dictionary.location))
(meta.tag u8ascii:"definition"
(meta.reference #meta.definition.envelop))]))

// 30. dictionary.entry.list
(library.entry
(library.definition u8ascii:"dictionary.entry.list"
meta.version:"1.3")
(meta.array
(meta.reference #uint16)
(meta.reference #dictionary.entry )))


Library types

These types appear only in the pre-compiled definitions and are consumed by the compiler. They are kept separate from the meta dictionary. They exist so that a user does not need to define identifiers for each type and keep track of which entry is defined by which identifier.


// library.entry
(library.entry
(library.definition name:"library.entry" meta.version:"1.3")
(meta.sequence [
(meta.tag "location" (meta.reference #library.location))
(meta.tag "definition" (meta.reference #meta.definition))
]))

// library.location
(library.entry
(library.definition name:"library.location" meta.version:"1.3")
(meta.abstract [
(meta.abstract.map #library.definition)
]))

// library.definition
(library.entry
(library.definition name:"library.definition" meta.version:"1.3")
(meta.sequence [
(meta.tag "name" (meta.reference #meta.name))
(meta.tag "version" (meta.reference #meta.version))
]))


Multiple Versions Per Stream

Each of the entries in the meta dictionary is compiled into two dictionary entries. For example the “empty” data type is defined by the following two dictionary entries:


(dictionary.entry
meta.id:1
(dictionary.name name:"empty")
(meta.identity))

(dictionary.entry
meta.id:2
(dictionary.definition #empty meta.version:"1.3")
(meta.fixed_width size:0
[ (meta.fixed_width.attribute.size size:0) ]))


This fits the versioning model of Argot. The first entry simply defines the name (“empty”), while the second entry defines version 1.3 of the “empty” type. This mimics the internal representation of the type library. A question I am yet to resolve: should this be the external representation? Another solution for the external representation combines the two entries:


(dictionary.entry
meta.id:2
(dictionary.definition u8ascii:"empty" meta.version:"1.3")
(meta.fixed_width size:0
[ (meta.fixed_width.attribute.size size:0) ]))


The advantage of this is that it reduces the data size of the dictionary. A consequence is that the meta identifier (meta.id) is the same for both the name and the specific meta data version (in this case 1.3). Therefore only one version of a data type can be used in any individual communication or stream. This is possibly an advantage, as the constraint creates a communications environment that is easier to program and debug. It also allows a simpler API to be developed which only needs to map each named type to a single version. The disadvantage is that it reduces the flexibility of the communications environment; there may be situations where multiple versions of the same type need to be communicated in the one stream.
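The two-entry form described in this section can be sketched as a small compile step. This Python example is illustrative only: compile_entries and the tuple layout are hypothetical, definitions are kept as opaque strings, and ids are handed out from 1 upwards to match the “empty” example rather than Argot's real numbering.

```python
from itertools import count


def compile_entries(library_entries):
    """Turn (name, version, definition) triples into dictionary entries.

    Each name gets one dictionary.name entry; each version of that name
    gets its own dictionary.definition entry that refers to the name's id.
    """
    next_id = count(1)
    name_ids = {}
    entries = []
    for name, version, definition in library_entries:
        if name not in name_ids:
            name_ids[name] = next(next_id)
            entries.append(
                (name_ids[name], ("dictionary.name", name), "meta.identity"))
        entries.append(
            (next(next_id),
             ("dictionary.definition", name_ids[name], version),
             definition))
    return entries


compiled = compile_entries([
    ("empty", "1.3", "meta.fixed_width size:0"),
    ("empty", "1.4", "meta.fixed_width size:0"),  # hypothetical second version
])
```

With separate entries, versions 1.3 and 1.4 of “empty” receive different ids (2 and 3 here) and could both appear in one stream. Folding name and version into a single entry, as in the combined form, gives up exactly that.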

Nearly Full Circle

Another adaptation of the above is to remove the version from the definition, e.g.


(dictionary.entry
meta.id:2
(dictionary.name u8ascii:"empty")
(meta.fixed_width size:0
[ (meta.fixed_width.attribute.size size:0) ]))


Removing the version information requires that each definition be used to create a unique signature. The signature becomes the version data used to match particular versions. This method is very close to the original method of defining data; however, a location rather than a name is still required. The location type allows the relation location type to be used for abstract types and other types that are defined using multiple entries. The disadvantage of removing the version data is that it requires a more complex library and provides no way to order versions. For this reason it won't be used.
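To see why a signature can match versions but cannot order them, consider the following Python sketch. The post does not say how a signature would be computed, so the use of SHA-256 over whitespace-normalised definition text is purely an assumption, and definition_signature is a hypothetical name.

```python
import hashlib


def definition_signature(definition_text: str) -> str:
    """Derive a signature from a definition (assumed: SHA-256 of its text)."""
    # Collapse whitespace so formatting differences do not change the result.
    canonical = " ".join(definition_text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


old = definition_signature("(meta.fixed_width size:0 [])")
same = definition_signature("(meta.fixed_width   size:0\n[])")
new = definition_signature("(meta.fixed_width size:8 [])")
```

Two peers can tell that `old` and `same` describe the same definition and that `new` differs, but nothing in the digests says which definition came first. An explicit MAJOR.MINOR version carries that ordering directly.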


Conclusion

The solution implemented for versioning meta data in Argot provides a new and innovative approach to this difficult problem. The concept of using a location in a directed graph allows any graph to be built and partially compared. In the next post I'll explore the area of remote data type negotiation and show how versioning adds new complexities.
