RION Design Goals
RION is a binary data format which is flexible enough to encode a wide variety of data. When designing RION we wanted RION to be:
- Independent of the network protocol
- Self describing
- Easy to handle for servers
- Easy to handle for small devices
Independent of the Network Protocol
While RION was designed as part of IAP, RION is a data format that is independent of network protocols. Thus, you can use RION outside of IAP as an alternative data format to JSON, XML, YAML etc. As RION is reasonably compact and fast, using RION over HTTP might be a first step for organizations looking to switch to IAP from e.g. HTTP/JSON, SOAP/XML etc.
You could even use RION as a file format. As you will see later, RION is a pretty good alternative to CSV files. You could also use RION as a log file format. It would be pretty fast to scan through RION records in a file.
RION was designed from the beginning to be very expressive. The less often users need to resort to other data formats, the better. If we can all just exchange data in RION, we can use the same data parsers and generators. RION is capable of expressing everything that you can express in CSV, JSON and XML. That means that you could actually convert a CSV, JSON or XML file to an RION file without losing information.
Yes, that is also what they said about XML, but for XML it turned out not to work out. Textual data formats are naturally bad fits for sending binary data like numbers and files. We believe RION changes that, so now it is our turn to commit hubris with RION by claiming it as a general purpose data format.
RION is designed to be able to model these commonly used data structures:
- Binary file / stream
- Stream of fields (unbounded)
- Array of fields (bounded)
- Map (key, value pairs)
- Objects with properties (key, value pairs)
- Object graphs with objects inside objects etc. (like JSON)
- Tables (like CSV files)
- Objects (elements), text and binary fields mixed (like XML)
You can combine the different data structures. You can have object graphs inside tables, or tables inside object graphs, or mix tables, objects, text and binary data with each other.
Of course the built-in RION data types aren't right for every kind of data out there. In case you need to send something that RION does not support, you can just send it as raw bytes (which RION supports). This has been a priority from the beginning - that users can default to opaque byte sequences when transmitting data that RION is not explicitly designed to encode. It is also possible to define your own object types. More on that later.
Another important design goal of RION is performance. Since we were reinventing a data format anyway, why not try to make it as fast as possible? We have tried that with RION, and our initial measurements look promising.
We have implemented a toolkit for working with IAP called IAP Tools. This will be open sourced when it is more feature complete. Our current performance measurements are based on the performance of IAP Tools. To compare RION performance to JSON we have used the Jackson JSON parser, which is one of the fastest JSON parsers out there.
Being a binary format, RION is naturally faster to read and write than textual formats. Booleans, integers, floating point numbers and binary data are faster to read and write in binary form than in textual form. We have seen speedups of up to a factor of 10 compared to reading and writing the same Java objects from / to JSON with Jackson. On average though, expect a speed increase somewhere between 50% and 200%.
The speed improvement depends on the type and size of the data being serialized. The speed difference so far seems to be largest with small objects and with types that don't serialize so well to text, like boolean and floating point variables. Jackson is pretty fast at serializing integers, so there the speed improvement is somewhere between 0% and 50% on average.
The exception is when reading and writing text, in which case RION should perform about the same as textual formats like JSON and XML. But even with text, IAP Tools has some built-in classes that can make it faster to read and write text. These techniques could also be used in a JSON parser, but they don't seem to be so far.
Read Speed vs Write Speed
In a few cases we have had to make trade-offs between read speed / flexibility and write speed / flexibility. In these situations we have typically looked at what the gain / loss is for both read and write speed and flexibility.
In cases where the speed gain for one action was significantly bigger than the speed loss for the other action, we have decided in favour of the speed gain.
In cases where the speed gained by one side is about equal to the speed lost by the other side, we have decided in favour of increased read speed. We have done that for the two following reasons:
First, we assume that on average RION messages will be read at least as many times as they are written. For example, you could write tabular data (similar to a CSV file) into RION files, and then read those files many times later. This could be the case with data files as well as with log files (sometimes at least).
This is also true of systems that route RION messages between a sender and a final receiver. An RION message can be read as a single, opaque block of bytes and thus forwarded really fast by intermediate nodes that don't need to process the data in the message.
Second, RION write speed is already higher than RION read speed, so by deciding in favour of read speed in 50-50 cases, the speed difference between the two operations is evened out a bit.
In addition to being fast to read and write, RION was designed to be compact in serialized form. A compact data format can be transmitted faster over the network.
You might claim that compactness is not that important because you can just ZIP compress the data sent over the network. Textual data formats like JSON and XML compress quite well, so the actual difference in size of the data transmitted would be a lot smaller if JSON, XML and RION were all ZIP compressed.
However, if you send compressed data over a TLS connection (encrypted connection), your data communication might very well be vulnerable to the BREACH and CRIME attacks. Therefore it is currently (Nov. 2015) recommended to turn off compression when sending data over a TLS connection. Then, all of a sudden, data compactness matters again.
The compression-over-TLS problem will probably be solved in the future. But even when it is, a compact data format is still an advantage, although a smaller one, as long as this compactness does not impact performance too much.
On average RION objects are 10-20% smaller than the corresponding JSON messages. How much exactly depends on what data is being sent. For instance, larger integers take more characters to encode as text than smaller integers. The same is true in RION.
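The point about integer size can be illustrated with a small sketch. It compares the number of characters needed to write a non-negative integer as decimal text with the minimum number of whole bytes needed to hold the same value in binary. The helper names `text_size` and `binary_size` are made up for this illustration, and the binary figure counts only the value bytes. It deliberately ignores whatever type and length overhead a real RION field carries, so it is not a statement about RION's actual encoding.

```python
def text_size(n: int) -> int:
    """Characters needed to write n as decimal text, as a textual format would."""
    return len(str(n))


def binary_size(n: int) -> int:
    """Minimum number of whole bytes needed to hold a non-negative n."""
    return max(1, (n.bit_length() + 7) // 8)


if __name__ == "__main__":
    # Both columns grow with the magnitude of the value.
    for n in (7, 255, 65_535, 1_000_000, 2**32 - 1, 2**63 - 1):
        print(f"{n:>20}  text: {text_size(n):>2} chars  binary: {binary_size(n)} bytes")
```

In both representations the size grows with the magnitude of the number, which is why the exact saving depends on the data being sent.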
Sending lists of objects over the network is a common use case. When serializing an array of objects to JSON, each object is serialized as property name + property value pairs. That means that the property names are repeated for every object.
To avoid the repetition of property names when serializing arrays of the same type of objects, RION has a special table data type. The RION table data type only contains the property names once. After the property names the property values of all objects in the array are included in the same sequence as the property names. RION tables are thus similar in structure to CSV files with a single header row.
Including the property names only once makes RION tables much more compact than JSON object arrays. We have seen data sizes of less than 1/3 of their JSON counterparts. Exactly how much you save depends on the length of the property names.
RION tables are faster to write because you don't have to write the property names more than once. RION tables are also faster to read because the property values can be mapped to properties in the Java objects using an index rather than a property name. Using the index of the property value saves the reading of a property name + a hash table lookup per property. And being more compact, RION tables are also faster to transmit over a network.
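The structural idea behind tables can be sketched without any RION tooling. The example below uses JSON text purely as a stand-in: it converts a list of same-shaped objects into a "column names once, then rows of values" layout and back again, mapping values to properties by index. The sample data and the `to_table` / `from_table` helpers are hypothetical, and the layout is not RION's binary table encoding. It only shows where the saving comes from.

```python
import json

# Hypothetical sample data: three objects with identical property names.
people = [
    {"firstName": "Ada", "lastName": "Lovelace", "born": 1815},
    {"firstName": "Alan", "lastName": "Turing", "born": 1912},
    {"firstName": "Grace", "lastName": "Hopper", "born": 1906},
]


def to_table(objects):
    """Store the property names once, then each object's values in that order."""
    columns = list(objects[0].keys())
    rows = [[obj[name] for name in columns] for obj in objects]
    return {"columns": columns, "rows": rows}


def from_table(table):
    """Rebuild the objects by pairing each value with its column by index."""
    columns = table["columns"]
    return [
        {columns[i]: value for i, value in enumerate(row)}
        for row in table["rows"]
    ]


as_objects = json.dumps(people, separators=(",", ":"))
as_table = json.dumps(to_table(people), separators=(",", ":"))

if __name__ == "__main__":
    print("object array:", len(as_objects), "characters")
    print("table layout:", len(as_table), "characters")
```

The longer the property names and the more rows there are, the larger the gap between the two sizes becomes.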
We wanted RION files / objects to be self describing. It should be possible to parse an RION file / object without having a schema for it, in the same way you can with a JSON file. This is possible with RION. You may not be able to see the exact semantic meaning of the data being transmitted, but you can see what fields and data types an RION object contains without the use of a schema.
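What "self describing" means in practice can be sketched with a toy binary format in which every field carries its own type code and length. The type codes, the 3-byte field header and the `encode` / `decode_all` functions below are assumptions invented for this illustration. They are not RION's actual field encoding. The point is only that a reader can recover the fields and their data types without a schema.

```python
import struct

# Hypothetical type codes, not RION's real ones.
BOOLEAN, INT64, FLOAT64, UTF8, BYTES = range(5)

# Hypothetical field header: 1 type byte followed by a 2-byte value length.
HEADER = struct.Struct(">BH")


def encode(value) -> bytes:
    """Encode one value as: type code, value length, value bytes."""
    if isinstance(value, bool):  # bool must be checked before int
        code, payload = BOOLEAN, bytes([value])
    elif isinstance(value, int):
        code, payload = INT64, struct.pack(">q", value)
    elif isinstance(value, float):
        code, payload = FLOAT64, struct.pack(">d", value)
    elif isinstance(value, str):
        code, payload = UTF8, value.encode("utf-8")
    else:  # anything else is sent as raw bytes
        code, payload = BYTES, bytes(value)
    return HEADER.pack(code, len(payload)) + payload


DECODERS = {
    BOOLEAN: lambda p: p != b"\x00",
    INT64: lambda p: struct.unpack(">q", p)[0],
    FLOAT64: lambda p: struct.unpack(">d", p)[0],
    UTF8: lambda p: p.decode("utf-8"),
    BYTES: bytes,
}


def decode_all(data: bytes):
    """Walk the fields using only the information stored in the data itself."""
    values, offset = [], 0
    while offset < len(data):
        code, length = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        values.append(DECODERS[code](data[start:start + length]))
        offset = start + length
    return values


original = [True, 42, 1.5, "héllo", b"\x00\xff"]
data = b"".join(encode(v) for v in original)
```

Calling `decode_all(data)` returns the values with their original types, even though the reader was never told what the message contains.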
Since RION is to be used in IAP, a message oriented network protocol, it was naturally important that RION messages are easy to route for intermediary nodes. Since RION messages are self describing it is easy to see when an RION message starts and ends. It is also easy to read an RION message partially, or wrap it in another RION message for tunneling.
Easy to Handle For Servers
RION should be easy to handle for servers that receive massive amounts of messages. By "handle" we refer to a few different aspects of server design.
First of all it should be easy to know when a message starts and when a full message has been received without having to look at the whole message. This is easily possible with RION messages.
Second it should be easy to allocate the correct amount of memory for an RION message. Furthermore, an RION message should be fully containable in a single contiguous memory area. This makes it faster / easier to allocate and deallocate memory for the message, and faster to process the message too (the whole message might fit into the L1, L2 or L3 caches of the server).
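The first two points can be sketched with a simple length-prefixed framing scheme. The 5-byte header used here (1 type byte plus a 4-byte body length) and the `frame` / `split_complete` functions are assumptions made for this illustration, not RION's actual message layout. The sketch shows how a server can tell from the header alone whether a full message has arrived and how many bytes it needs to hold it.

```python
import struct

# Hypothetical message header: 1 type byte + 4-byte big-endian body length.
HEADER = struct.Struct(">BI")


def frame(msg_type: int, body: bytes) -> bytes:
    """Prefix the body with its type and length."""
    return HEADER.pack(msg_type, len(body)) + body


def split_complete(buffer: bytes):
    """Return (complete_messages, leftover_bytes) by reading only the headers."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length  # exact size is known up front
        if end > len(buffer):
            break  # the body has not been fully received yet
        messages.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]


stream = frame(1, b"hello") + frame(2, b"world!")

# Simulate a network read that ends in the middle of the second header.
first_batch, leftover = split_complete(stream[:12])
second_batch, remaining = split_complete(leftover + stream[12:])
```

Because the length is available before the body is read, the receiver could allocate one contiguous buffer of exactly the right size for each message.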
Third, it should be possible to read only part of a message without having to read the full message. Reading a message partially makes it easier to implement a multi-step message processing pipeline where each step parses more of the message and passes it on to the correct subsystem in the server. This is also reasonably easy to do with RION messages.
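Partial reading can be sketched in the same spirit. In the example below each field is prefixed with a type byte and a 2-byte length, so a reader can jump over fields it does not care about without decoding them. The field header, the type codes and the `field` / `read_field` helpers are hypothetical and are not taken from the RION specification.

```python
import struct

# Hypothetical field header: 1 type byte + 2-byte big-endian value length.
FIELD_HEADER = struct.Struct(">BH")
TYPE_BYTES, TYPE_UTF8 = 0, 1  # hypothetical type codes


def field(type_code: int, value: bytes) -> bytes:
    """Encode one field as header + value bytes."""
    return FIELD_HEADER.pack(type_code, len(value)) + value


def read_field(message: bytes, index: int):
    """Return (type_code, value) of field `index`, skipping earlier fields unparsed."""
    offset = 0
    for _ in range(index):
        _, length = FIELD_HEADER.unpack_from(message, offset)
        offset += FIELD_HEADER.size + length  # jump over the value bytes
    type_code, length = FIELD_HEADER.unpack_from(message, offset)
    start = offset + FIELD_HEADER.size
    return type_code, message[start:start + length]


# A routing key, a large opaque payload, and a trailing text field.
message = (
    field(TYPE_UTF8, b"orders")
    + field(TYPE_BYTES, bytes(1000))
    + field(TYPE_UTF8, b"trailer")
)

routing_key = read_field(message, 0)
trailer = read_field(message, 2)
```

A first pipeline step could read only the routing key and hand the untouched message to the right subsystem, which then parses the remaining fields.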
Easy to Handle For Small Devices
A network protocol targeting small devices, such as Internet of Things (IoT) devices, should have a data and message format that is easy to handle for small devices too, not just for big servers. Small data and message sizes, fast read and write times, and easy memory management are key for small devices.