Monday, 3 February 2014

Argot Message Format - A self describing binary message format

Today I've committed to the Subversion repository the first implementation of the Argot Message format.  This is a feature I've been wanting to add for some time, and a project I'm currently working on has given me the excuse to get it done.  The idea of the Argot Message format is that a binary message contains both the data and the data dictionary with little overhead.

It's easiest to understand with a demonstration.   As part of the Argot test code I've defined a data type called 'demo':

definition demo 1.0:
{
    @short #uint16;
    @byte #uint8;
    @text #u8utf8;
};

This rudimentary type contains three fields named 'short', 'byte' and 'text'.  If an instance of this data type were written to a stream it would look like:

00 0a 33  05 h  e  l  l  o  

This can be described as:

short - 2 bytes with value 10.
byte - 1 byte with value 51.
text - 6 bytes with one byte value 5 and the text 'hello'.
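
The layout described above can be reproduced by hand.  The following is a minimal sketch in plain Java and is not the Argot API: the DemoCodec class is hypothetical, and it assumes big-endian byte order for the 2-byte field and a one-byte length prefix in front of UTF-8 text, which is what the dump suggests.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Argot: hand-encodes the 'demo' layout.
final class DemoCodec {
    final int shortVal;   // 2 bytes on the wire, big-endian (assumed)
    final int byteVal;    // 1 byte on the wire
    final String text;    // 1 length byte followed by the text bytes

    DemoCodec(int shortVal, int byteVal, String text) {
        this.shortVal = shortVal;
        this.byteVal = byteVal;
        this.text = text;
    }

    // Writes the three fields in definition order.
    byte[] encode() {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 255) {
            throw new IllegalArgumentException("text too long for a one-byte length");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write((shortVal >>> 8) & 0xFF);
        out.write(shortVal & 0xFF);
        out.write(byteVal & 0xFF);
        out.write(utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Reads the fields back; the reader must already know the layout.
    static DemoCodec decode(byte[] buf) {
        int shortVal = ((buf[0] & 0xFF) << 8) | (buf[1] & 0xFF);
        int byteVal = buf[2] & 0xFF;
        int length = buf[3] & 0xFF;
        String text = new String(buf, 4, length, StandardCharsets.UTF_8);
        return new DemoCodec(shortVal, byteVal, text);
    }
}
```

Calling new DemoCodec(10, 51, "hello").encode() yields the same 9 bytes as the dump.  Note that decode only works because the field order and sizes are hard-coded, which is exactly the prior knowledge the message format is meant to remove.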

If an application received just these 9 bytes alone it would need to have previously known that the sender was sending this 'demo' type.  However, with the Argot message format the following is sent:

A  13 01 32  20 00 04 d  e  m  o  01 00 1b 0f 03 0e 05 s  h  o  r  t  0d
28 0e 04 b  y  t  e  0d 01 0e 04 t  e  x  t  0d 08 32  00 0a 33  05 h  e  l  l  o 

The message is now 51 bytes; however, it contains a full description of the data format along with the actual data.  Breaking the format down:

A  - Magic value indicating this is an Argot message format.
0x13 - Version of the Argot meta dictionary and Argot message format being used.
0x01 - The number of data types defined.  One here but could be thousands.

Each data type contains a unique identifier, a type name identifier and a type definition.  In this message it contains:

0x32 - The unique identifier for the data type. Integer 50.
20 00 04 d  e  m  o 01 00  - The type location and version.  demo version 1.0.
1b 0f 03 0e 05 s  h  o  r  t  0d 28 0e 04 b  y  t  e  0d 01 0e 04 t  e  x  t  0d 08 - The structure of the data type as defined above.

The actual data follows the data dictionary.

0x32 - The identifier for the data type that follows.  In this case the 'demo' type.
00 0a 33  05 h  e  l  l  o - The actual data.
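
The breakdown above can be checked against the raw bytes.  This is a rough sketch rather than the Argot reader: the MessageWalk class is hypothetical, the magic value 'A' is assumed to be ASCII 0x41, and the offsets are copied from the breakdown instead of being parsed from the dictionary, whose internal encoding is treated as opaque here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical walker over the example message; not the Argot reader.
final class MessageWalk {
    // The 51 bytes listed in the post ('A' assumed to be ASCII 0x41).
    static final byte[] MESSAGE = bytes(
        0x41, 0x13, 0x01,                                   // magic, version, type count
        0x32,                                               // identifier of the defined type
        0x20, 0x00, 0x04, 'd', 'e', 'm', 'o', 0x01, 0x00,   // type location and version
        0x1b, 0x0f, 0x03,                                   // start of the structure
        0x0e, 0x05, 's', 'h', 'o', 'r', 't', 0x0d, 0x28,
        0x0e, 0x04, 'b', 'y', 't', 'e', 0x0d, 0x01,
        0x0e, 0x04, 't', 'e', 'x', 't', 0x0d, 0x08,
        0x32,                                               // identifier of the data that follows
        0x00, 0x0a, 0x33, 0x05, 'h', 'e', 'l', 'l', 'o');   // the actual data

    // Offset where the dictionary ends, taken from the breakdown (not parsed).
    static final int DICTIONARY_END = 41;

    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    static char magic()        { return (char) MESSAGE[0]; }
    static int version()       { return MESSAGE[1] & 0xFF; }
    static int typeCount()     { return MESSAGE[2] & 0xFF; }
    static int definedTypeId() { return MESSAGE[3] & 0xFF; }
    static int dataTypeId()    { return MESSAGE[DICTIONARY_END] & 0xFF; }

    // The type name is preceded by a one-byte length at offset 6.
    static String typeName() {
        return new String(MESSAGE, 7, MESSAGE[6] & 0xFF, StandardCharsets.US_ASCII);
    }

    // Everything after the data type identifier is the 9-byte payload.
    static byte[] data() {
        return Arrays.copyOfRange(MESSAGE, DICTIONARY_END + 1, MESSAGE.length);
    }
}
```

The identifier that introduces the data (offset 41) matches the identifier assigned in the dictionary (offset 3), which is how a reader ties the trailing 9 bytes back to the 'demo' definition.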

The first data type is defined as type 50 in this case because format version 1.3 specifies that the reader assumes the first 49 data types are known to the recipient.  Those 49 data types include all the Argot meta data types and the following base types:

40 - uint16
41 - uint32
42 - uint64
43 - int8
44 - int16
45 - int32
46 - int64
47 - float32
48 - double64
49 - u8boolean

The base types allow any other data types to be defined.  The data dictionary in the message could contain a large and complex set of data with every part of the data defined.  Writing the message in Argot was simply:

msg.writeMessage(baos, MixedData.TYPENAME,
           new MixedData( 10, (short) 51, "hello"));

As I work with this new format, I may add additional elements.  One such idea is to include a reference to a known data dictionary provided by a URL.  In this way no data dictionary needs to be sent, yet both sender and recipient will have a reference to the data types used.  This may be beneficial in Internet of Things applications where an additional 50 bytes may be considered too large.



Thursday, 8 August 2013

Argot, Big-Data and Hazelcast

Hazelcast is an in-memory data grid that provides a scalable data store with a rich set of features.  Over the past year I've been using it in other applications and I've been impressed with its capabilities.  A new feature in the recently released Hazelcast 3.0 is the ability to plug in your own serialization code.  By allowing Argot to be used both in the in-memory data grid and as the transport for Internet of Things applications, it allows the same code developed for Argot to be used in Big-Data applications.  The combination of Hazelcast and Argot means less code, less development time and more scale.

Recently the Hazelcast team made available a small benchmark code that provided both examples of how various serialization software could be embedded into Hazelcast and showed a rudimentary benchmark for the different methods.  The code is made available here.  I've forked the code and included Argot as an additional example.

The rest of this post covers the details of how I integrated with Hazelcast.  It also includes a discussion of how Argot's performance compares to the other serialization techniques used in the benchmark.

Argot Hazelcast Integration

In this example the Argot definition for the 'sample' object is defined as follows. The default Argot common dictionary doesn't include the generic long_array or double_array, so I've also created them here:

definition long_array 1.0:
    (meta.sequence [
        (meta.array
            (meta.reference #uvint28)
            (meta.reference #int64))
    ]);
 
definition double_array 1.0:
    (meta.sequence [
        (meta.array
            (meta.reference #uvint28)
            (meta.reference #double))
    ]);


definition sample 1.0:
{
  @intVal #int32;
  @floatVal #float;
  @shortVal #int16;
  @byteArr #u16binary;
  @longArr #long_array;
 /* @dblArr #double_array; */
  @str #u32utf8;
};


The SampleObject.java class is modified to include Java annotations.  These are used to bind the names of the variables in the definition to the Java fields.

@ArgotMarshaller(TypeAnnotationMarshaller.class)
public class SampleObject implements java.io.Serializable, KryoSerializable {

    public static final String ARGOT_NAME = "sample";
    
    @ArgotTag("intVal")
    public int intVal;
    
    @ArgotTag("floatVal")
    public float floatVal;
    
    @ArgotTag("shortVal")
    public short shortVal;
    
    @ArgotTag("byteArr")
    public byte[] byteArr;
    
    @ArgotTag("longArr")
    public long[] longArr;
    /*
    @ArgotTag("dblArr")
    public double[] dblArr;
    */
    @ArgotTag("str")
    public String str;
 
    ... 
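
To illustrate how an annotation can tie a name in the definition to a Java field, here is a minimal sketch using plain reflection.  It is not how Argot's TypeAnnotationMarshaller is implemented: the FieldTag annotation, TaggedSample and TagBinder are all hypothetical names made up for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an annotation such as @ArgotTag.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface FieldTag {
    String value();
}

// Hypothetical class with two tagged fields and one untagged field.
final class TaggedSample {
    @FieldTag("intVal") public int intVal;
    @FieldTag("str") public String str;
    public int ignored;
}

final class TagBinder {
    // Collects every annotated field, keyed by the name in its tag.
    static Map<String, Field> bind(Class<?> type) {
        Map<String, Field> byTag = new HashMap<>();
        for (Field field : type.getDeclaredFields()) {
            FieldTag tag = field.getAnnotation(FieldTag.class);
            if (tag != null) {
                byTag.put(tag.value(), field);
            }
        }
        return byTag;
    }

    // Assigns a value to the field whose tag matches the given name.
    static void set(Object target, String tag, Object value) {
        Field field = bind(target.getClass()).get(tag);
        if (field == null) {
            throw new IllegalArgumentException("no field tagged " + tag);
        }
        try {
            field.set(target, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A marshaller built this way can walk the fields of a definition in order and look each one up by name, so the Java field names are free to differ from the names used in the definition.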


With the data definition in place, it is compiled and bound to Java classes using the following Argot loader class.

public class SampleArgotLoader 
extends ArgotCompilerLoader 
{
    private static final String NAME = "sample.argot";
 
    public SampleArgotLoader() 
    {
        super(NAME);
    }

    @Override
    public void bind( TypeLibrary library )
    throws TypeException
    {
        library.bind( library.getTypeId(SampleObject.ARGOT_NAME, "1.0"), SampleObject.class );
        library.bind( library.getTypeId("long_array", "1.0"), new TypeArrayMarshaller(), new TypeArrayMarshaller(), long[].class);
        library.bind( library.getTypeId("double_array", "1.0"), new TypeArrayMarshaller(), new TypeArrayMarshaller(), double[].class);
    }
}

The above elements would be required for any Argot based implementation and are not a special requirement of the Hazelcast implementation.  I've included them here for completeness of the example.

The bridge between Hazelcast and Argot is defined by a StreamSerializer implementation.  The ArgotSerializer provides the generic implementation that can be used by any object.  The read and write methods read and write the object to stream.

    public Object read(ObjectDataInput input) 
    throws IOException 
    {
        try 
        {
            TypeInputStream typeIn = new TypeInputStream( (InputStream) input, _map);
            return (SampleObject) typeIn.readObject(_typeId);
        } 
        catch (TypeException e) 
        {
            throw new IOException(e.getMessage(), e);
        }
    }
 
    public void write(ObjectDataOutput output, Object object) 
    throws IOException
    {
        try 
        {
            TypeOutputStream typeOut = new TypeOutputStream( (OutputStream) output, _map );
            typeOut.writeObject( _typeId, object );
        } 
        catch (TypeException e) 
        {
            throw new IOException(e.getMessage(), e);
        }
    }
 
The Hazelcast API makes registering and using custom serialization reasonably simple to configure.  A class is bound to a serializer.  In this example I'm binding the SampleObject class to the ArgotSerializer:

SerializationConfig config = new SerializationConfig();
config.addSerializerConfig(new SerializerConfig().
    setTypeClass(SampleObject.class).
    setImplementation(new ArgotSerializer(typeMap,SampleObject.ARGOT_NAME)));

That's it, the Sample object can now be serialized to the Internet of Things and can also be stored in a Big-Data environment.

Argot Performance

Having not looked at Argot performance for some time, I was very interested to see how Argot would perform in the benchmark.  While performance is not necessarily the number one factor in choosing a serialization protocol, it is reasonably important.

The first test I conducted didn't include either the array of long values or the array of double values.  The results were:

Serializer                    Size          Time
Argot Serialization           4172 bytes    348 ms
Java Serialization            4336 bytes    411 ms
DataSerializable              4237 bytes    263 ms
IdentifiedDataSerializable    4186 bytes    170 ms
Portable                      4205 bytes    228 ms
Kryo                          4336 bytes    277 ms
Kryo-unsafe                   4336 bytes    256 ms

While Argot didn't come first in the first performance test, it wasn't last, which isn't bad, especially considering Argot hasn't had a lot of performance tuning done.  Argot did win at using the least amount of data.  This is achieved because it uses meta data to describe the structure of the binary format, which results in less data stored at runtime.

Being reasonably happy with Argot's performance, the next step I did was include the array of long values in the sample object.  The benchmark stores 3000 long values to the array.  The initial results were very bad for Argot and made me realise that the implementation of int64 in Argot was based on some old code that wrote individual bytes to the stream.  After fixing this up, the following results were achieved:

Serializer                    Size           Time
Argot Serialization           28174 bytes    5587 ms
Java Serialization            28374 bytes    862 ms
DataSerializable              28241 bytes    918 ms
IdentifiedDataSerializable    28190 bytes    851 ms
Portable                      28213 bytes    904 ms
Kryo                          28374 bytes    786 ms
Kryo-unsafe                   28374 bytes    727 ms

Ouch! The Argot performance of writing an array of 3000 int64 values was six times worse than the closest performer.  This test showed me two things: the first is that this benchmark is heavily biased towards arrays, and the second is that Argot's default array marshaller is terrible at dealing with large arrays of simple types.

After some further investigation I discovered that performance for arrays is all gained or lost in the number of calls to read or write to the stream.  If you can minimize stream writes you can improve the performance greatly.  Not one to give up, I used Argot's ability to implement custom marshalling to implement a fast int64 array marshaller.  The writer for the marshaller is as follows:

        public void write(TypeOutputStream out, Object o) 
        throws TypeException, IOException 
        {
            long[] longArray = (long[]) o;            
            
            _writer.write(out, longArray.length);
            OutputStream output = out.getStream();
            byte[] bytes = new byte[longArray.length*8];
            int pos = 0;
            
            for (int x=0; x<longArray.length;x++)
            {
                long s = longArray[x];
                bytes[pos] = (byte)(s >>> 56);
                bytes[pos+1] = (byte)(s >>> 48);
                bytes[pos+2] = (byte)(s >>> 40);
                bytes[pos+3] = (byte)(s >>> 32);
                bytes[pos+4] = (byte)(s >>> 24);
                bytes[pos+5] = (byte)(s >>> 16);
                bytes[pos+6] = (byte)(s >>> 8);
                bytes[pos+7] = (byte)(s);
                
                pos+=8;
            }

            output.write(bytes,0,pos);
        }


This implementation minimises writes to the stream by allocating the full byte array and preparing the data before writing it out to the stream in a single call.  A similar reader was written which reads the full array in before parsing it.  Connecting the new marshaller to the long array was a simple matter of changing the binding:

  library.bind( library.getTypeId("long_array", "1.0"), 
                new LongArrayReader(), new LongArrayWriter(), long[].class);
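
The reader itself isn't listed in this post.  As a rough sketch of the same bulk-read idea using only java.io, and not the Argot marshaller API, the following reads the whole block in one call and then parses it in memory.  The LongArrayCodec class is hypothetical, and it assumes a 4-byte big-endian length prefix rather than the uvint28 length used by the Argot definition.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical codec, not part of Argot: bulk write and bulk read of long[].
final class LongArrayCodec {
    // Prepares length and values in one buffer, then makes a single stream write.
    static void write(OutputStream out, long[] values) throws IOException {
        byte[] bytes = new byte[4 + values.length * 8];
        int pos = 0;
        for (int shift = 24; shift >= 0; shift -= 8) {
            bytes[pos++] = (byte) (values.length >>> shift);
        }
        for (long value : values) {
            for (int shift = 56; shift >= 0; shift -= 8) {
                bytes[pos++] = (byte) (value >>> shift);
            }
        }
        out.write(bytes, 0, pos);
    }

    // Reads the length, pulls the whole block in with one call, then parses it.
    static long[] read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        if (length < 0) {
            throw new IOException("negative array length");
        }
        byte[] bytes = new byte[length * 8];
        data.readFully(bytes);
        long[] values = new long[length];
        int pos = 0;
        for (int i = 0; i < length; i++) {
            long value = 0;
            for (int b = 0; b < 8; b++) {
                value = (value << 8) | (bytes[pos++] & 0xFF);
            }
            values[i] = value;
        }
        return values;
    }

    // Convenience wrappers for in-memory use.
    static byte[] toBytes(long[] values) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            write(out, values);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long[] fromBytes(byte[] bytes) {
        try {
            return read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the 3000 values used in the benchmark this turns 24000 single-byte reads into one readFully call, which is where the gain comes from.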

After running the benchmark again, I got the following results:

Serializer                    Size           Time
Argot Serialization           28174 bytes    768 ms
Java Serialization            28374 bytes    832 ms
DataSerializable              28241 bytes    899 ms
IdentifiedDataSerializable    28190 bytes    816 ms
Portable                      28213 bytes    887 ms
Kryo                          28374 bytes    713 ms
Kryo-unsafe                   28374 bytes    703 ms

With the optimized array marshaller Argot comes third in the benchmark!  Not a bad result.

In conclusion, it's been very interesting to see how Argot compares to other serialization techniques in both implementation and performance.  I've concluded that for objects with many simple fields Argot performs reasonably; however, additional buffering would likely bring Argot in line with some of the faster serialization implementations.  I will put that on the task list for Argot 1.4.  In addition, the serialization of large arrays is best done by specialised implementations.  In implementations where performance is important it's good to know that Argot is flexible enough to allow this to be configured.

Sunday, 9 June 2013

argot 1.3.b4 released

Today I've released Argot v1.3.b4 (beta 4).  This has a couple of changes found while developing the Arduino MQTT Argot tutorial which went online today.  The changes include:

  • Modify the meta marshaller - During the development of the tutorial I found a TODO in the code related to building TypeMaps which use the MetaAbstract type.  The code originally required that the full meta dictionary be included in all user defined TypeMaps.  This has now been fixed and the new tutorial shows it nicely by having only 7 data types.
  • Update the compiler - Added the ability to use expressions in the simple definition syntax.  This was mainly so I could use the meta.abstract data type simply in the tutorial.  It was also on the TODO list, so nice to get this out of the way too.
  • Update the Argot marshaller annotation - The original annotation only allowed a select few marshallers to be used which were defined in an enumerated type.  The enumerated type is now removed and the marshaller class is now added directly.

The new tutorial demonstrates using Argot with MQTT.  The use of the Arduino was not required; however, it does nicely show an end-to-end use case of the software.  The Arduino code is currently hand crafted to process the message received from Argot.  Future releases will include an Arduino Argot library to automate the processing code.

Friday, 7 June 2013

Dummies guide to installing Mosquitto MQTT on OSX


I've been setting up to write some demos of using Argot over MQTT.  To run tests I decided to use the Mosquitto MQTT broker.  It's really quite simple to install on OSX if you follow this simple procedure.

Before you do anything, ensure that you have Xcode installed on your computer.  This is a pre-requisite for building and installing Mosquitto.  Go to the App Store and search for and install Xcode.

After Xcode is installed open a Terminal window and install Brew.  The output should look something like:

$ ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
==> This script will install:
/usr/local/bin/brew
/usr/local/Library/...
/usr/local/share/man/man1/brew.1

Press ENTER to continue or any other key to abort
==> Downloading and Installing Homebrew...
remote: Counting objects: 114731, done.
remote: Compressing objects: 100% (49072/49072), done.
remote: Total 114731 (delta 81867), reused 94660 (delta 64711)
Receiving objects: 100% (114731/114731), 16.62 MiB | 1.85 MiB/s, done.
Resolving deltas: 100% (81867/81867), done.
From https://github.com/mxcl/homebrew
 * [new branch]      master     -> origin/master
HEAD is now at 28e3657 ttytter: use Formula
Warning: Install the "Command Line Tools for Xcode": http://connect.apple.com
==> Installation successful!
You should run `brew doctor' *before* you install anything.
Now type: brew help

As directed I ran the 'brew doctor' command before installing anything.
$ brew doctor
Warning: Experimental support for using Xcode without the "Command Line Tools".
You have only installed Xcode. If stuff is not building, try installing the
"Command Line Tools for Xcode" package provided by Apple.

Warning: Your file-system on / appears to be CaSe SeNsItIvE.
Homebrew is less tested with that - don't worry but please report issues.


Now that brew is installed it's time to install Mosquitto. Next at the command prompt type 'brew install mosquitto'. The output looks something like:
$ brew install mosquitto
==> Installing mosquitto dependency: pkg-config
==> Downloading https://downloads.sf.net/project/machomebrew/Bottles/pkg-config-
######################################################################## 100.0%
==> Pouring pkg-config-0.28.lion.bottle.tar.gz
 /usr/local/Cellar/pkg-config/0.28: 10 files, 636K
==> Installing mosquitto dependency: cmake
==> Downloading https://downloads.sf.net/project/machomebrew/Bottles/cmake-2.8.1
######################################################################## 100.0%
==> Pouring cmake-2.8.11.lion.bottle.tar.gz
  /usr/local/Cellar/cmake/2.8.11: 693 files, 34M
==> Installing mosquitto dependency: openssl
==> Downloading http://openssl.org/source/openssl-1.0.1e.tar.gz
######################################################################## 100.0%
==> perl ./Configure --prefix=/usr/local/Cellar/openssl/1.0.1e --openssldir=/usr
==> make
==> make test
==> make install MANDIR=/usr/local/Cellar/openssl/1.0.1e/share/man MANSUFFIX=ssl
==> Caveats
To install updated CA certs from Mozilla.org:

    brew install curl-ca-bundle

This formula is keg-only: so it was not symlinked into /usr/local.

Mac OS X already provides this software and installing another version in
parallel can cause all kinds of trouble.

The OpenSSL provided by OS X is too old for some software.

Generally there are no consequences of this for you. If you build your
own software and it requires this formula, you'll need to add to your
build variables:

    LDFLAGS:  -L/usr/local/opt/openssl/lib
    CPPFLAGS: -I/usr/local/opt/openssl/include
==> Summary
  /usr/local/Cellar/openssl/1.0.1e: 435 files, 15M, built in 3.4 minutes
==> Installing mosquitto
==> Downloading http://mosquitto.org/files/source/mosquitto-1.1.3.tar.gz
######################################################################## 100.0%
==> cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/Cellar/mosquitto/1.1.3 -DCMAKE_BUILD_TYPE=None -DCMAKE_FIND_FRAMEWORK
==> make install
==> Caveats
mosquitto has been installed with a default configuration file.
    You can make changes to the configuration by editing
    /usr/local/etc/mosquitto/mosquitto.conf

Python client bindings can be installed from the Python Package Index
    pip install mosquitto

Javascript client is available at
    http://mosquitto.org/js/

To have launchd start mosquitto at login:
    mkdir -p ~/Library/LaunchAgents
    ln -sfv /usr/local/opt/mosquitto/*.plist ~/Library/LaunchAgents

Then to load mosquitto now:
    launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mosquitto.plist

Or, if you don't want/need launchctl, you can just run:
    mosquitto -c /usr/local/etc/mosquitto/mosquitto.conf

Warning: /usr/local/sbin is not in your PATH
You can amend this by altering your ~/.bashrc file
==> Summary
   /usr/local/Cellar/mosquitto/1.1.3: 26 files, 564K, built in 8 seconds

An easy way to run Mosquitto is from the command line.
$ /usr/local/sbin/mosquitto -c /usr/local/etc/mosquitto/mosquitto.conf
1370606778: mosquitto version 1.1.3 (build date 2013-06-07 21:46:21+1000) starting
1370606778: Config loaded from /usr/local/etc/mosquitto/mosquitto.conf.
1370606778: Opening ipv4 listen socket on port 1883.
1370606778: Opening ipv6 listen socket on port 1883.

It's now set up and ready to go!

Friday, 31 May 2013

argot 1.3.b3 released

As a step towards a full release of Argot, I've decided to release Argot 1.3.b3 (beta 3).  This release includes a number of significant changes and is nearly ready for a full release.  It includes the following changes:

  • Removed packages. The older 1.2.x versions of Argot included packages which were designed to operate in a more traditional request/response environment.  As the Argot 1.3 release is targeted towards MQTT and the Internet of Things, these packages will be released in a separate argot-remote package in the future.
  • Removed example. The older 1.2.x versions of Argot included a bookstore example which was modeled on older RPC mechanisms.  Once again, as this release is targeted towards the Internet of Things I've decided to remove it.  More relevant examples will be provided in the future.
  • Updated the meta dictionary.  The Argot meta dictionary defines the core types from which all other data types are defined.  The updates include combining a number of extensions which are not strictly required by the meta dictionary, but make maintenance and implementation easier.  This allowed removing a few classes from the library.
  • Documentation removed.  The documentation provided with the older version is out of date and would be more confusing than anything else.  Removed until this is updated in the future.
  • Additional cleanup.   Changes such as updating the name of the bool type to boolean and various other small changes were made.  The full list is maintained in the readme file.

Given this release contains no documentation and only one example is provided, it is only for the brave.  The next steps are to create examples which show off the capability of Argot better and ensure the core library and compiler are thoroughly tested. To that end I've acquired a Freetronics EtherTen and a Freetronics Cube.  These are both Arduino compatible and will provide great examples of Argot in action.


Saturday, 25 May 2013

Argot website is up!

A little less than a month ago, I committed to getting Argot online again.  I have been spurred on by the rise of activity surrounding the Internet of Things, and in particular the transport layer MQTT.  I feel that Argot provides a unique solution not currently available which will fit nicely with MQTT and the Internet of Things. The Argot website is still sparse; however, over the next few weeks/months I will be releasing more software and documentation. To start, I have provided a simple example which demonstrates the development of an Argot application in Java.  You can view this on the 'Quick Start' page and easily install it to Eclipse to try it out.

There are still a few things to be done before I can publish the next release of Argot (v1.3.0).  These include:
  • Update the Argot compiler to make it a little easier to use.  The current Argot language is based around S-Expressions, however, for many this looks quite foreign.  I plan a few small tweaks to make the most common data structures easy to define.
  • Strip out extensions from the core Argot library.  In the past, a few extensions have crept into the main Argot library.  These are being separated to simplify the core Argot library.

Following this, I will be continuing development of an Argot MQTT extension.  The planning for this has already started and has been documented on the MQTT mailing list.  I'll write more about this later.

With regard to the website, I'm particularly pleased with the new logo and design, which was put together with the assistance of my wonderful wife and sister.


Monday, 29 April 2013

Argot Lives

Just a short note that Argot is returning.  There's still plenty of work to do before I can get back to developing the software, but it's nice to get back into it.

All the source code is being hosted at Google Code at https://code.google.com/p/argot/
The Subversion repository has been moved across, so no historical changes are lost.

I've also moved all the old blog posts from blog.livemedia.com.au to here.  Nothing lost there either.

Next job is to build a website.  Nothing fancy, just enough to get the ideas across.