



IAP - ION Design Goals

Jakob Jenkov
Last update: 2015-11-27

The IAP Object Notation is a binary data format which is flexible enough to encode a wide variety of data. The abbreviation of IAP Object Notation would be IAP-ON - which we have further shortened to ION. The name ION is simply easier to write and pronounce.

When designing ION we wanted ION to be:

  • Independent of the network protocol
  • Expressive
  • Fast
  • Compact
  • Self describing
  • Routable
  • Easy to handle for servers
  • Easy to handle for small devices

Independent of the Network Protocol

While ION was designed as part of IAP, ION is a data format that is independent of network protocols. Thus, you can use ION outside of IAP as an alternative data format to JSON, XML, YAML etc. As ION is reasonably compact and fast, using ION over HTTP might be a first step for organizations looking to switch to IAP from e.g. HTTP/JSON, SOAP/XML etc.

You could even use ION as a file format. As you will see later, ION is a pretty good alternative to CSV files. You could also use ION as a log file format. It would be pretty fast to scan through ION records in a file.

Expressive

ION was designed from the beginning to be very expressive. The less users need to resort to other data formats, the better. If we can all just exchange data in ION, we can use the same data parsers and generators. ION is capable of expressing everything that you can express in CSV, JSON and XML. That means that you could actually convert a CSV, JSON or XML file to an ION file without losing information.

Yes, that is also what they said about XML, but for XML it turned out not to work out. Textual data formats are naturally bad fits for sending binary data like numbers and files. We believe ION changes that, so now it is our turn to commit hubris with ION by claiming it as a general purpose data format.

ION is designed to be able to model these commonly used data structures:

  • Binary file / stream
  • Stream of fields (unbounded)
  • Array of fields (bounded)
  • Map (key, value pairs)
  • Objects with properties (key, value pairs)
  • Object graphs with objects inside objects etc. (like JSON)
  • Tables (like CSV files)
  • Objects (elements), text and binary fields mixed (like XML)

You can combine the different data structures. You can have object graphs inside tables, or tables inside object graphs, or mix tables, objects, text and binary data with each other.

Of course the built-in ION data types aren't right for every kind of data out there. In case you need to send something that ION does not support, you can just send it as raw bytes (which ION supports). This has been a priority from the beginning - that users can default to opaque byte sequences when transmitting data that ION is not explicitly designed to encode. It is also possible to define your own object types. More on that later.

Fast

Another important design goal of ION is performance. Since we were reinventing a data format anyway, why not try to make it as fast as possible? We have tried that with ION, and our initial measurements look promising.

We have implemented a toolkit for working with IAP called IAP Tools. This will be open sourced when it is more feature complete. Our current performance measurements are based on the performance of IAP Tools. To compare ION performance to JSON we have used the Jackson JSON parser, which is one of the fastest JSON parsers out there.

Being a binary format, ION is naturally faster to read and write than textual formats. Booleans, integers, floating point numbers and binary data are faster to read and write in binary form than in textual form. We have seen speedups of up to 10 times compared to reading and writing the same Java objects from / to JSON with Jackson. On average, though, expect a speed increase somewhere between 50% and 200%.

The speed improvement depends on the type and size of the data being serialized. The speed difference so far seems to be largest with small objects and types that don't serialize so well to text, like boolean and floating point variables. Jackson is pretty fast at serializing integers, so there the speed improvement is somewhere between 0 and 50% on average.

The exception is reading and writing text, in which case ION should perform about the same as textual formats like JSON and XML. But even with text, IAP Tools has some built-in classes that can make reading and writing faster. These techniques could also be used in a JSON parser, but so far they do not seem to be.

Read Speed vs Write Speed

In a few cases we have had to make trade-offs between read speed / flexibility and write speed / flexibility. In these situations we have typically looked at what the gain / loss is for both read and write speed and flexibility.

In cases where the speed gain for one action was significantly bigger than the speed loss for the other action, we have decided in favour of the speed gain.

In cases where the speed gained by one side is about equal to the speed lost by the other side, we have decided in favour of increased read speed. We have done that for the two following reasons:

First, we assume that on average ION messages will be read at least as many times as they are written. For example, you could write tabular data (similar to a CSV file) into ION files, and then have to read those files many times later. This could be the case with data files as well as with log files (sometimes at least).

This is also true of systems that route ION messages between a sender and a final receiver. An ION message can be read as a single, opaque block of bytes and thus forwarded really fast by intermediate nodes that don't need to process the data in the message.

Second, ION write speed is already higher than ION read speed, so by deciding in favour of read speed in 50-50 cases, the speed difference between the two operations is evened out a bit.

Compact

In addition to being fast to read and write, ION was designed to be compact in serialized form. A compact data format can be transmitted faster over the network.

You might claim that compactness is not that important because you can just ZIP compress the data sent over the network. Textual data formats like JSON and XML compress quite well, so the actual difference in size of the data transmitted would be a lot smaller if JSON, XML and ION were all ZIP compressed.

However, if you send compressed data over a TLS connection (an encrypted connection), your communication may well be vulnerable to the BREACH and CRIME attacks. Therefore it is currently (Nov. 2015) recommended to turn off compression when sending data over a TLS connection. Then, all of a sudden, data compactness matters again.

The compression-over-TLS problem will probably be solved in the future. But even when it is, a compact data format is still an advantage, although a smaller one, as long as this compactness does not impact performance too much.

On average ION objects are 10-20% smaller than the corresponding JSON messages. How much exactly depends on what data is being sent. For instance, larger integers take more characters to encode as text than smaller integers. The same is true in ION.
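To make the point about integer sizes concrete, here is a minimal Java sketch that compares the number of bytes a non-negative integer takes as decimal text with the number of bytes its value takes in binary without leading zero bytes. It only counts the value bytes and does not reproduce ION's actual encoding (which this text does not describe); the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: counts value bytes, not any real ION field layout.
final class IntegerSizeSketch {

    // Bytes needed to hold a non-negative value with leading zero bytes dropped.
    static int binaryLength(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        int length = 1;
        while ((value >>>= 8) != 0) {
            length++;
        }
        return length;
    }

    // Bytes needed to write the same value as decimal ASCII text.
    static int textLength(long value) {
        return Long.toString(value).getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        long[] samples = {7L, 255L, 65_535L, 1_000_000L};
        for (long sample : samples) {
            System.out.println(sample + ": text=" + textLength(sample)
                    + " bytes, binary=" + binaryLength(sample) + " bytes");
        }
    }
}
```

In both forms larger values need more bytes, but the binary form grows by one byte per factor of 256 while the text form grows by one byte per factor of 10.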

Sending lists of objects over the network is a common use case. When serializing an array of objects to JSON, each object is serialized as property name + property value pairs. That means that the property names are repeated for every object.

To avoid the repetition of property names when serializing arrays of the same type of objects, ION has a special table data type. The ION table data type only contains the property names once. After the property names the property values of all objects in the array are included in the same sequence as the property names. ION tables are thus similar in structure to CSV files with a single header row.

Including the property names only once makes ION tables much more compact than JSON object arrays. We have seen data sizes of less than 1/3 of their JSON counterparts. Exactly how much you save depends on the length of the property names.

ION tables are faster to write because you don't have to write the property names more than once. ION tables are also faster to read because the property values can be mapped to properties in the Java objects using an index rather than a property name. Using the index of the property value saves the reading of a property name + a hash table lookup per property. And being more compact, ION tables are also faster to transmit over a network.
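The size and lookup arguments above can be sketched in a few lines of Java. The byte layout used here (one length byte per name, four bytes per integer, a column count and a row count) is a hypothetical stand-in chosen for the example, not ION's actual table encoding, and the column names are invented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: a made-up layout, not the real ION table format.
final class TableSizeSketch {

    static final String[] COLUMNS = {"id", "quantity", "priceInCents"};

    // Hypothetical name encoding: one length byte followed by UTF-8 bytes.
    static void writeName(ByteArrayOutputStream out, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        out.write(utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Writes an int as 4 big-endian bytes.
    static void writeInt(ByteArrayOutputStream out, int value) {
        out.write(value >>> 24);
        out.write(value >>> 16);
        out.write(value >>> 8);
        out.write(value);
    }

    // Object-array style: every row repeats every property name.
    static byte[] encodeAsObjects(int[][] rows) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int[] row : rows) {
            for (int col = 0; col < COLUMNS.length; col++) {
                writeName(out, COLUMNS[col]);
                writeInt(out, row[col]);
            }
        }
        return out.toByteArray();
    }

    // Table style: property names once, then only values in column order.
    static byte[] encodeAsTable(int[][] rows) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(COLUMNS.length);
        for (String column : COLUMNS) {
            writeName(out, column);
        }
        writeInt(out, rows.length);
        for (int[] row : rows) {
            for (int value : row) {
                writeInt(out, value);
            }
        }
        return out.toByteArray();
    }

    // Reading the table: values are placed by column index, with no
    // per-value name parsing or hash table lookup.
    static int[][] decodeTable(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        int columnCount = in.get() & 0xFF;
        for (int col = 0; col < columnCount; col++) {
            int nameLength = in.get() & 0xFF;
            in.position(in.position() + nameLength); // a real reader would map each name to a field once here
        }
        int rowCount = in.getInt();
        int[][] rows = new int[rowCount][columnCount];
        for (int row = 0; row < rowCount; row++) {
            for (int col = 0; col < columnCount; col++) {
                rows[row][col] = in.getInt();
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int[][] rows = new int[100][];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = new int[] {i, i * 2, i * 100};
        }
        System.out.println("object style: " + encodeAsObjects(rows).length + " bytes");
        System.out.println("table style:  " + encodeAsTable(rows).length + " bytes");
    }
}
```

With this made-up layout the saving comes entirely from the names, so longer property names or more rows widen the gap, which matches the observation that the saving depends on the length of the property names.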

Self Describing

We wanted ION files / objects to be self describing. It should be possible to parse an ION file / object without having a schema for it, in the same way you can with a JSON file. This is possible with ION. You may not be able to see the exact semantic meaning of the data being transmitted, but you can see what fields and data types an ION object contains without the use of a schema.

Routable

Since ION is to be used in IAP, a message oriented network protocol, it was naturally important that ION messages are easy to route for intermediary nodes. Since ION messages are self describing it is easy to see when an ION message starts and ends. It is also easy to read an ION message partially, or wrap it in another ION message for tunneling.

Easy to Handle For Servers

ION should be easy to handle for servers that receive massive amounts of messages. By "handle" we refer to a few different aspects of server design.

First of all it should be easy to know when a message starts and when a full message has been received without having to look at the whole message. This is easily possible with ION messages.

Second it should be easy to allocate the correct amount of memory for an ION message. Furthermore, an ION message should be fully containable in a single contiguous memory area. This makes it faster / easier to allocate and deallocate memory for the message, and faster to process the message too (the whole message might fit into the L1, L2 or L3 caches of the server).
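The first two points can be illustrated with a length-prefixed message. The sketch below assumes a hypothetical header of one type byte followed by a 4-byte length; this is not ION's actual header, only a simple stand-in showing how a server can tell that a message is complete, and how much memory it needs, after looking at the first few bytes only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: a made-up 5-byte header, not the real ION header.
final class FramingSketch {

    static final int HEADER_LENGTH = 5; // hypothetical: 1 type byte + 4 length bytes

    // Builds a framed message: type, body length, body.
    static byte[] frame(byte type, byte[] body) {
        return ByteBuffer.allocate(HEADER_LENGTH + body.length)
                .put(type)
                .putInt(body.length)
                .put(body)
                .array();
    }

    // Total size (header + body) of the first message, or -1 if the
    // header itself has not fully arrived yet.
    static int totalMessageLength(byte[] received, int receivedCount) {
        if (receivedCount < HEADER_LENGTH) {
            return -1;
        }
        int bodyLength = ByteBuffer.wrap(received, 1, 4).getInt();
        return HEADER_LENGTH + bodyLength;
    }

    // True once every byte of the first message has been received.
    static boolean isComplete(byte[] received, int receivedCount) {
        int total = totalMessageLength(received, receivedCount);
        return total != -1 && receivedCount >= total;
    }

    // Copies the message into one exactly sized, contiguous array.
    static byte[] extractMessage(byte[] received, int receivedCount) {
        if (!isComplete(received, receivedCount)) {
            throw new IllegalStateException("message not fully received yet");
        }
        return Arrays.copyOfRange(received, 0, totalMessageLength(received, receivedCount));
    }

    public static void main(String[] args) {
        byte[] message = frame((byte) 1, "hello".getBytes(StandardCharsets.US_ASCII));
        for (int receivedCount : new int[] {3, 7, message.length}) {
            System.out.println(receivedCount + " bytes received, complete: "
                    + isComplete(message, receivedCount));
        }
    }
}
```

Because the length is known up front, the server never has to scan the body for a terminator, and it can allocate a single buffer of the right size before the rest of the message arrives.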

Third, it should be possible to read only part of a message without having to read the full message. Reading a message partially makes it easier to implement a multi-step message processing pipeline where each step parses more and more of the message and passes it on to the correct subsystem in the server. This is also reasonably easy to do with ION messages.
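As a rough illustration of partial reading, the Java sketch below stores each field as a 2-byte length followed by its bytes, so a processing step can hop over fields it does not care about without decoding them. The field layout, the class name and the example field values are assumptions made for this sketch and are not taken from the ION specification.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: a made-up field layout, not the real ION encoding.
final class PartialReadSketch {

    // Encodes each field as a 2-byte length followed by its UTF-8 bytes.
    static byte[] message(String... fields) {
        byte[][] encoded = new byte[fields.length][];
        int size = 0;
        for (int i = 0; i < fields.length; i++) {
            encoded[i] = fields[i].getBytes(StandardCharsets.UTF_8);
            size += 2 + encoded[i].length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] value : encoded) {
            out.putShort((short) value.length).put(value);
        }
        return out.array();
    }

    // Finds where a field starts by hopping over the earlier fields,
    // using only their length prefixes and never decoding their values.
    static int offsetOfField(byte[] message, int fieldIndex) {
        ByteBuffer in = ByteBuffer.wrap(message);
        for (int i = 0; i < fieldIndex; i++) {
            int length = in.getShort() & 0xFFFF;
            in.position(in.position() + length);
        }
        return in.position();
    }

    // Decodes just the one requested field.
    static String readField(byte[] message, int fieldIndex) {
        int offset = offsetOfField(message, fieldIndex);
        int length = ByteBuffer.wrap(message, offset, 2).getShort() & 0xFFFF;
        return new String(message, offset + 2, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] msg = message("orders", "create", "payload-bytes");
        // A first pipeline step might read only field 0 to pick a subsystem
        // and hand the untouched message on to it.
        System.out.println("route to subsystem: " + readField(msg, 0));
    }
}
```

A later step in the pipeline could call `readField(msg, 1)` or `readField(msg, 2)` on the same byte array, so each step pays only for the fields it actually looks at.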

Easy to Handle For Small Devices

A network protocol targeting small devices, like Internet of Things (IoT) devices, should have a data and message format that is easy to handle for small devices too, not just for big servers. Small data and message sizes, fast read and write times and easy memory management are key for small devices.





Copyright  Jenkov Aps