RION vs. Other Formats
This text discusses how RION differs from other data formats that address the same problem of compact, binary data communication. We have compared RION to CBOR, MessagePack, Protobuf and JSON.
Here is a table summing up the differences. A "Yes(*)" means "Yes - but with limitations". A "No(*)" means "No, but can be bent to support it". If something isn't right in this table, let us know!
|Feature||RION||CBOR||MessagePack||Protobuf||JSON|
|Good at raw bytes||Yes||Yes||Yes||Yes||No|
|Copy (of earlier data)||Yes||No||No||No||No|
|Object / Map||Yes||Yes||Yes||Yes||Yes|
|Schema / Class Id||Yes||No||No||No||No|
|Unspecified length arrays / maps||No||Yes||No||?||Yes|
|Arbitrary hierarchical navigation||Yes||Yes(*)||Yes(*)||Yes||Yes(*)|
|Stream mode reading||Yes||Yes||Yes||Yes||Yes|
|Stream mode writing||Yes(*)||Yes||Yes(*)||Yes||Yes|
|Extendable with new / custom types||Yes||Yes||Yes||Yes||No|
Of these formats RION is most similar to CBOR. RION is also very similar to MessagePack (though less than to CBOR). The basic encoding is so similar that these formats should have comparable read and write speeds for fields. The RION Performance Benchmarks confirm that.
RION's major difference from CBOR and MessagePack is the table data structure. The table can model tabular data similar to a CSV file or database table. A RION table contains the column names only once, followed by the column values for all rows in the table. Tables can also contain nested tables, so tables can also be used to make object graphs more compact (where a parent object can have multiple children of the same kind).
Tables are a very compact way to send arrays of objects of the same kind. As the benchmarks also show, RION tables can shrink to 25-33% of the size of the same data encoded as JSON (and 40-50% of the size of the same data encoded as CBOR or MessagePack).
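The saving from listing column names only once can be illustrated without a RION library. The sketch below is only an analogy: it uses JSON text for both layouts, and the `columns` / `values` keys are made up for the illustration, not part of RION.

```python
import json

# Three records of the same kind, e.g. rows returned by a database query.
rows = [
    {"id": 1, "name": "Alice", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Rome"},
    {"id": 3, "name": "Carol", "city": "Lima"},
]

# Object layout: every object repeats all property names.
as_objects = json.dumps(rows, separators=(",", ":"))

# Table layout (hypothetical keys): column names once, then the values of all rows.
columns = list(rows[0].keys())
as_table = json.dumps(
    {"columns": columns, "values": [row[c] for row in rows for c in columns]},
    separators=(",", ":"),
)

print(len(as_objects), len(as_table))
```

The more rows the table holds, the smaller the share taken up by the column names.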
Not all formats support raw bytes well. By raw bytes we mean a sequence of bytes like a file, a video frame etc. To include raw bytes in JSON they must be text encoded using either Base64 or Hex encoding. Base64 encoding makes the encoded data take up 4/3 of the size of the original data (one third more), and Hex encoding makes the encoded data take up double the size of the original data. RION, MessagePack, CBOR and Protobuf have no problem including raw bytes.
Both RION and CBOR have UTC Date Time support, but RION's Date Time encoding uses 50% or fewer of the bytes of the ISO standard (textual) format used by CBOR. Also, RION only supports UTC time - not local time - so you are required to convert to and from UTC yourself.
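The overhead of text encoding raw bytes is easy to check with the Python standard library. In this small sketch the 3000 random bytes are just a stand-in for a file or video frame.

```python
import base64
import os

raw = os.urandom(3000)  # stand-in for a file or a video frame

b64 = base64.b64encode(raw)  # 4 output characters for every 3 input bytes
hexed = raw.hex()            # 2 output characters for every input byte

print(len(raw), len(b64), len(hexed))  # 3000 4000 6000
```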
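To get a feel for the size difference between a textual and a binary date time, here is a rough sketch. The binary layout used (a 2-byte year followed by one byte each for month, day, hour, minute and second) is an assumption made for the illustration and should not be read as RION's exact encoding.

```python
import struct
from datetime import datetime, timezone

moment = datetime(2030, 12, 31, 23, 59, 59, tzinfo=timezone.utc)

# Textual ISO 8601 form, with a trailing "Z" for UTC.
iso_text = moment.strftime("%Y-%m-%dT%H:%M:%SZ").encode("ascii")

# Assumed binary layout: 2-byte year, then 1 byte per remaining component.
packed = struct.pack(
    ">HBBBBB",
    moment.year, moment.month, moment.day,
    moment.hour, moment.minute, moment.second,
)

print(len(iso_text), len(packed))  # 20 7
```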
RION contains a special "Copy" field which lets you reference a RION field earlier in the same RION data, which should then be copied in at this place in the RION data. For instance, it could be a class name (see later), a long property name, a zip + city object, a large object graph, a table or something else. As far as we remember, CBOR and MessagePack can use a special "string back reference" element to refer to often used strings (e.g. property names of objects of the same kind), but as far as we can see this is not part of their core encodings.
All of the formats support arbitrary hierarchical navigation of the encoded data, without first converting it to objects. However, since RION always knows the exact size in bytes of a complex field (a field containing other fields, like an object, array or table), RION can skip over a whole field without having to parse into its nested fields. CBOR, MessagePack and JSON cannot do that. They all require some level of parsing of the contents of a field in order to find the next element at the same hierarchical level after it (the next sibling).
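Skipping a complex field via its byte length can be sketched like this. The layout used here (a 1-byte type tag plus a 4-byte big-endian length) and the type tag values are simplifications invented for the example, not RION's actual lead byte encoding.

```python
import struct

INT, TEXT, OBJECT = 1, 2, 3  # made-up type tags

def encode_field(type_tag: int, body: bytes) -> bytes:
    """Hypothetical layout: 1-byte type tag, 4-byte big-endian body length, body."""
    return struct.pack(">BI", type_tag, len(body)) + body

def next_sibling_offset(buf: bytes, offset: int) -> int:
    """Jump past the field at `offset` without looking at its nested fields."""
    _type_tag, length = struct.unpack_from(">BI", buf, offset)
    return offset + 5 + length  # 5 = size of the type tag plus the length prefix

# An object with two nested fields, followed by a sibling int field.
nested = encode_field(TEXT, b"name") + encode_field(TEXT, b"Alice")
obj = encode_field(OBJECT, nested)
buf = obj + encode_field(INT, b"\x2a")

second = next_sibling_offset(buf, 0)
print(second == len(obj), buf[second] == INT)
```

A format that only stores an element count, or that relies on delimiters, would have to visit every nested field to find the same offset.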
RION, CBOR, MessagePack and JSON are all self describing, meaning you don't need an external schema to read them. This is essential for a network protocol where intermediate nodes may have to route messages along to other nodes. According to Protobuf's own docs you cannot see where one Protobuf message ends and the next begins, meaning Protobuf is not fully self describing. You can see where the individual Protobuf fields start and end, but not the full message.
The fact that Protobuf is not fully self describing makes it unsuitable as a network protocol message format (although you could route Protobuf messages inside other types of messages). That a data format is self describing also means that it is possible to convert a file in that format to a textual format (JSON is already textual) to see what is actually stored in the file.
RION supports several levels of self describing messages. At the most descriptive level RION can embed schema or class names (Complex Type Ids) inside objects, arrays or tables. This takes up more space of course. You can also just embed a short schema / class id in the form of a shorter number or textual code, and then translate that when reading the RION message.
Schema / class names are optional. You can also just serialize objects with key,value pairs like JSON. RION supports that too. In fact, CBOR, MessagePack and JSON support this level of self describing messages too. A tiny difference is that RION keeps the data type when sending NULL values (e.g. a null int-64 or a null UTF-8 text). CBOR, MessagePack and JSON all lose the type when transferring NULL values. In those formats a null has no type. (But as said - this is not a big thing.)
RION also supports a compact level of objects where property names are left out. This is very similar to how RION tables work, where the property names of objects are only listed once. The compact level of objects and tables makes encoded RION look very similar to encoded Protobuf. Consequently, this encoding mode also performs comparably to Protobuf (with faster writes but slower reads than Protobuf). Even if these compact objects do not contain any property names, they are still self describing enough that you can see where fields start and end, plus their data type, without an external schema. You cannot do that with Protobuf (as far as we know).
RION has support for expressing cyclic references between objects. At this point this support is not 100% finalized.
RION is designed to function as a network message format (among other things) for the IAP network protocol. One feature we plan to build into IAP is caching of data related to the IAP connection. For instance, a web server could ask a client to cache a file. Or, an API server could ask a client to cache some data (e.g. a service status) which it returns often. Later in the session the server can then refer to this cached file (any RION field, actually) as part of a new message it sends.
RION's biggest drawback compared to CBOR is that CBOR allows for stream-mode writing of arrays and objects. In stream-mode writing the element contains no information in the beginning about how large the object or array is. Instead the object or array has an end-marker. This stream-mode writing allows CBOR data to be generated and streamed directly out on the network.
Since RION fields all contain the size in bytes of a field right at the beginning of the field, you can only use stream-mode writing with RION when you know the size of a field ahead of time. This is normally true with primitive fields (e.g. a string or int-64), or even with files read from disk where you know the size ahead of time. But with larger objects generated based on e.g. database queries, stream-mode writing is not possible. You will have to buffer up the message before sending it. This can be done reasonably efficiently, so this is mostly noticeable with larger messages (4K and up).
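The difference between the two writing styles can be sketched as follows. Both layouts are simplified for the illustration: the 4-byte length prefix and the `0xff` end marker are assumptions, not the actual RION or CBOR encodings, and the end-marker variant assumes the chunks never contain the marker byte.

```python
import io
import struct

END = b"\xff"  # made-up end marker

def write_length_prefixed(out, chunks):
    """The length comes first, so the whole body must be buffered before writing."""
    body = io.BytesIO()
    for chunk in chunks:
        body.write(chunk)
    data = body.getvalue()
    out.write(struct.pack(">I", len(data)))
    out.write(data)

def write_end_marked(out, chunks):
    """Each chunk is written as soon as it is produced; a marker closes the element."""
    for chunk in chunks:
        out.write(chunk)
    out.write(END)

def generated_rows():
    """Stand-in for data whose total size is unknown up front, e.g. a query result."""
    for i in range(3):
        yield f"row {i};".encode("ascii")

a, b = io.BytesIO(), io.BytesIO()
write_length_prefixed(a, generated_rows())
write_end_marked(b, generated_rows())
print(a.getvalue(), b.getvalue())
```

With the length prefix, nothing reaches `out` until the last row has been generated. With the end marker, the first row can be sent while later rows are still being produced.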
Remember, files from disk, which are often larger than 4K, can still be written using stream-mode writing with RION, so this is only an issue with large amounts of generated data (e.g. HTML files put together from templates etc.). However, we have plans to address these issues elsewhere in IAP.
Stream-mode reading is fully possible in RION.