Nanosai Studio Data Sources

Jakob Jenkov
Last update: 2018-04-30

For Nanosai Studio to analyze and visualize data, it must obtain that data from a data source. This text explains which types of data sources Nanosai Studio can use, and which data formats Nanosai Studio understands.

Data Source Locations

Nanosai Studio can currently connect to data sources in these locations:

  • A local file.
  • A remote web service identified by a URL.

We intend to add more data source locations in the future. If you need a data source location that Nanosai Studio does not support, contact Nanosai. Perhaps we can help you.

Data Streams

Nanosai Studio considers the data provided by a data source to be a stream of records, objects or messages. Each record in a data stream must have an index. Nanosai Studio uses this index to track how much of the stream it has already analyzed. Thus, the next time Nanosai Studio connects to the data source, it will only consume the records that have an index higher than the index of the latest record consumed. Here is an example:

If Nanosai Studio is connected to a local file, and new records are added to this file from time to time, then Nanosai Studio can detect which part of that file it has not yet analyzed. For instance, if the file is read and the last index in the file is 99, then Nanosai Studio will attempt to start from index 100 the next time it reads the file. If the file then contains records with indexes of 100 and above, these records will be analyzed, and the latest index in the file is noted for the next read of the file.
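The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Nanosai Studio's actual implementation. The function name `consume_new_records` and the representation of a record as an `(index, payload)` pair are assumptions made for the example.

```python
def consume_new_records(records, last_consumed_index):
    """Return the records not yet consumed, plus the updated last index.

    `records` is a list of (global_index, payload) pairs sorted by index
    (a hypothetical in-memory stand-in for the file contents).
    `last_consumed_index` is -1 when nothing has been consumed yet.
    """
    new_records = [(i, p) for i, p in records if i > last_consumed_index]
    if new_records:
        last_consumed_index = new_records[-1][0]
    return new_records, last_consumed_index


# First read: the file holds records with indexes 0..99.
stream = [(i, f"record-{i}") for i in range(100)]
batch, last_index = consume_new_records(stream, -1)

# Later, records 100..104 have been appended to the file.
stream += [(i, f"record-{i}") for i in range(100, 105)]
next_batch, last_index = consume_new_records(stream, last_index)
```

After the second call, `next_batch` holds only the records with indexes 100 to 104, and `last_index` is 104.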

Data Formats

When a data source returns data to Nanosai Studio, it does not need to provide all the data in the total stream every time. The data source may return only part of the total data stream, typically the data that is new since the last connection to the data source. A part of a total data stream is referred to as a stream interval. Sometimes a stream interval is also referred to as a "block", "batch" or "window" of records. The data returned by a data source must describe the boundaries of the interval it returns.

Currently, Nanosai Studio supports only a single data format, which we call the "CSV Stream" format. The format of a CSV Stream looks like this:
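As an illustration only, the following Python sketch renders a small interval with first index 0 and last index 4. The data values and series names are made up. The comma-separated layout of the streaming info line and the use of the implicit indexing scheme (value 0) are assumptions made for the example. The function name `format_interval` is hypothetical.

```python
IMPLICIT = 0  # indexing scheme value for records without an index column


def format_interval(first_index, records):
    """Render one stream interval: a streaming info line, then one line per record.

    Assumes the streaming info line is "first index,last index,indexing scheme".
    """
    last_index = first_index + len(records) - 1
    info_line = f"{first_index},{last_index},{IMPLICIT}"
    record_lines = [",".join(str(value) for value in record) for record in records]
    return "\n".join([info_line] + record_lines)


# Hypothetical records of the form: data series ID, x value, y value
records = [
    ("temperature", 0, 20.5),
    ("temperature", 1, 21.0),
    ("humidity", 0, 40.0),
    ("temperature", 2, 21.5),
    ("humidity", 1, 42.0),
]
text = format_interval(0, records)
print(text)
# 0,4,0
# temperature,0,20.5
# temperature,1,21.0
# humidity,0,40.0
# temperature,2,21.5
# humidity,1,42.0
```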


An interval of a CSV stream consists of one streaming info line (the first line) followed by N record lines. Each will be described in the following sections.

Streaming Info

The first line is the streaming info line. This is the line that describes the boundaries of the record interval returned by the data source. The streaming info line consists of three values:

  1. First index: The global index of the first record in the data in the CSV stream.
  2. Last index: The global index of the last record in the data in the CSV stream.
  3. Indexing scheme: A value (0, 1 or 2) that tells how the index of each record is determined.

The first index must be the global index of the first record in the interval returned by the data source. In the above example the index of the first record is 0. If Nanosai Studio had already consumed 100 records with global indexes from 0 to 99, then the global index of the first record would have been 100.

The last index must be the global index of the last record in the interval returned by the data source. In the example above the last index is 4.

The indexing scheme can take one of three values: 0, 1 or 2. These values will be explained below.
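Here is a minimal sketch of how a streaming info line could be read and validated, assuming its three values are separated by commas like the record lines. The names `StreamingInfo` and `parse_streaming_info` are hypothetical and not part of Nanosai Studio.

```python
from typing import NamedTuple


class StreamingInfo(NamedTuple):
    first_index: int  # global index of the first record in the interval
    last_index: int   # global index of the last record in the interval
    scheme: int       # indexing scheme: 0, 1 or 2


def parse_streaming_info(line):
    """Parse a line such as "0,4,0" (assumed comma-separated) into a StreamingInfo."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 values, got {len(parts)}: {line!r}")
    first_index, last_index, scheme = (int(part) for part in parts)
    if scheme not in (0, 1, 2):
        raise ValueError(f"unknown indexing scheme: {scheme}")
    if last_index < first_index:
        raise ValueError("last index is lower than first index")
    return StreamingInfo(first_index, last_index, scheme)


info = parse_streaming_info("0,4,0")
record_count = info.last_index - info.first_index + 1
```

For the info line `0,4,0`, the interval covers five records with global indexes 0 to 4.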

The indexing scheme value 0 means "implicit indexing scheme". This means that the records contain no explicit index. Instead, the indexes start from the first index and increase by one for each record (e.g. 0, 1, 2, 3 etc.).

The indexing scheme value 1 means "relative indexing scheme". This means that the very first value of each record (on each line of the CSV data) is an index that is relative to the first index of the data interval. Look at this example:


Notice that each of the data records (after the first line, which is the streaming info line) contains an index. Notice also that the indexing scheme value in the streaming info line is set to 1. That means that the index of each record is to be interpreted as being relative to the first index value specified in the streaming info line. Thus, the first record's index of 0 means 20 + 0 = 20. The second record's index of 1 means 20 + 1 = 21 etc.

The indexing scheme value 2 means "absolute indexing scheme". This means that each record has an explicit index, and that the value is the actual global index of that record. Here is an example:
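To compare the three schemes side by side, the sketch below resolves the global index of each record line under each scheme. It assumes comma-separated record lines with the index, when present, as the first value. The function name `resolve_global_indexes` and the record contents are hypothetical.

```python
IMPLICIT, RELATIVE, ABSOLUTE = 0, 1, 2


def resolve_global_indexes(first_index, scheme, record_lines):
    """Return the global index of every record line in an interval."""
    indexes = []
    for position, line in enumerate(record_lines):
        if scheme == IMPLICIT:
            # No index column: count upwards from the first index.
            indexes.append(first_index + position)
            continue
        explicit = int(line.split(",", 1)[0])  # first value on the line
        if scheme == RELATIVE:
            indexes.append(first_index + explicit)
        elif scheme == ABSOLUTE:
            indexes.append(explicit)
        else:
            raise ValueError(f"unknown indexing scheme: {scheme}")
    return indexes


# The same two records, first index 20, expressed in each scheme.
implicit = resolve_global_indexes(20, IMPLICIT, ["a,0,1.0", "a,1,2.0"])
relative = resolve_global_indexes(20, RELATIVE, ["0,a,0,1.0", "1,a,1,2.0"])
absolute = resolve_global_indexes(20, ABSOLUTE, ["20,a,0,1.0", "21,a,1,2.0"])
```

All three calls yield the global indexes 20 and 21. Only the way the index is written in the record lines differs.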


Data Records

The exact meaning of the data records depends on the semantics of the CSV stream. Nanosai Studio understands these record semantics:

[index,] data series ID, x value, y value
[index,] data series ID, value

The first record structure is used for X,Y type charts (e.g. a line diagram). The data series ID specifies what line the given record belongs to. Thus, you can show multiple lines in the same line chart by specifying different data series ID values for the records.

The second record structure is used for pie charts. The values for each data series ID are summed up, and visualized as one part of the pie chart.

The second record structure can also be used for bar charts, but we will probably have to redesign the record structure for bar charts, so keep that in mind.
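How the two record structures map to chart data can be sketched as follows. This is an illustration under assumptions, not Nanosai Studio's own code: the records are assumed to be already parsed into tuples with any index column removed, and the function names `group_xy_series` and `sum_pie_slices` are hypothetical.

```python
from collections import defaultdict


def group_xy_series(records):
    """Group (series ID, x, y) records into one list of points per line."""
    series = defaultdict(list)
    for series_id, x, y in records:
        series[series_id].append((x, y))
    return dict(series)


def sum_pie_slices(records):
    """Sum (series ID, value) records into one total per pie slice."""
    totals = defaultdict(float)
    for series_id, value in records:
        totals[series_id] += value
    return dict(totals)


xy_records = [("a", 0, 1.0), ("b", 0, 2.0), ("a", 1, 1.5)]
lines = group_xy_series(xy_records)

pie_records = [("rent", 800.0), ("food", 250.0), ("food", 150.0)]
slices = sum_pie_slices(pie_records)
```

Here `lines` contains two lines, `a` and `b`, and `slices` contains one total per data series ID, with the two `food` values summed to 400.0.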
