Stream Ops for Java is an embeddable data streaming toolkit implemented in Java. Stream Ops contains a set of components you can use to implement data streaming pipelines. These components are listed in the Stream Ops Components section later in this tutorial.
Stream Ops GitHub Repository
You can find the Stream Ops for Java GitHub repository here:
Stream Ops Components
Stream Ops contains (or will contain) the following components:
- Data Streaming Engine
- Stream Processing API
The data streaming engine handles writing records into a data stream on disk and reading those records back later. Records can be read sequentially in the same order they were stored. You can iterate over all of the records in the stream, or only part of them. The data streaming engine contains several components that help you iterate over the records in the stream, filter out the records you are not interested in, and even extract only some of the fields of each record.
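To make the idea concrete, here is a minimal sketch of an append-only record stream on disk, written with plain JDK classes. It is not the Stream Ops API. The class name RecordLogSketch, the methods append and readField, and the length-prefixed, comma-separated record layout are all assumptions made for illustration only.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, NOT the Stream Ops API: a length-prefixed, append-only record log.
public class RecordLogSketch {

    // Appends one record to the end of the file as [4-byte length][UTF-8 payload].
    public static void append(Path file, String record) throws IOException {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)))) {
            out.writeInt(payload.length);
            out.write(payload);
        }
    }

    // Reads records in the order they were stored, keeps those accepted by the filter,
    // and extracts a single comma-separated field from each accepted record.
    public static List<String> readField(Path file, Predicate<String> filter, int fieldIndex)
            throws IOException {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfStream) {
                    break; // no more records
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                String record = new String(payload, StandardCharsets.UTF_8);
                if (filter.test(record)) {
                    result.add(record.split(",")[fieldIndex]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".log");
        append(file, "1,deposit,100");
        append(file, "2,withdrawal,40");
        append(file, "3,deposit,25");
        // Keep only deposits and extract the amount field.
        System.out.println(readField(file, r -> r.contains("deposit"), 2)); // prints [100, 25]
        Files.delete(file);
    }
}
```

The sketch reopens the file for every append to stay short. A real engine would presumably keep the stream open and handle concerns such as partial writes, which are left out here.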
The stream processing API processes the records of a stream at a higher level. For instance, it can read all the records, convert them to objects, perform calculations and transformations, and finally output some other result. The stream processing API can be used independently of the data streaming engine. Thus, you could use the stream processing API with Kafka or other data streaming technologies.
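The read, convert, calculate and output steps can be sketched with the JDK's own java.util.stream package. This is not the Stream Ops stream processing API. The class ProcessingSketch, the Transaction type, and the "id,type,amount" record layout are hypothetical names chosen for illustration. The input is a plain Iterable, to reflect that the processing step does not depend on where the records come from.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical sketch built on java.util.stream, NOT the Stream Ops stream processing API.
public class ProcessingSketch {

    // Hypothetical object type that raw records are converted into.
    public static final class Transaction {
        final String type;
        final long amount;

        Transaction(String type, long amount) {
            this.type = type;
            this.amount = amount;
        }
    }

    // Converts a raw "id,type,amount" record into a Transaction object.
    public static Transaction parse(String raw) {
        String[] fields = raw.split(",");
        return new Transaction(fields[1], Long.parseLong(fields[2]));
    }

    // Reads all records, converts them to objects, and sums the amounts per type.
    // The source can be any Iterable: a file reader, a message consumer, or a list.
    public static Map<String, Long> totalsByType(Iterable<String> rawRecords) {
        return StreamSupport.stream(rawRecords.spliterator(), false)
                .map(ProcessingSketch::parse)
                .collect(Collectors.groupingBy(
                        t -> t.type,
                        TreeMap::new, // sorted keys give a stable output order
                        Collectors.summingLong(t -> t.amount)));
    }

    public static void main(String[] args) {
        List<String> raw = List.of("1,deposit,100", "2,withdrawal,40", "3,deposit,25");
        System.out.println(totalsByType(raw)); // prints {deposit=125, withdrawal=40}
    }
}
```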
Stream Ops is designed to be fully embeddable in your applications. By fully embeddable we mean that you can use Stream Ops internally in a desktop or mobile application, inside a web application, microservice, or other service requiring sequential data storage. Typically these are systems using a variation of a CQRS design.
Stream Ops consists of a small set of small JAR files. Stream Ops was kept small on purpose, to minimize the footprint it adds to the mobile apps, desktop apps and microservices that use it.
Stream Ops vs. Kafka
Stream Ops is similar in functionality to Kafka, but the focus on making Stream Ops embeddable led to a different design philosophy. Where Kafka is more of a black box internally, Stream Ops opens up the data streaming engine so you can use all of it, or only the parts of it that match your specific needs. This gives you a higher degree of flexibility to customize your data streaming pipeline.
The increased flexibility comes with a bit more work, though. You have to assemble your data streaming pipeline yourself. It is not just a download-and-unzip operation, as Confluent's Kafka distribution is. We are continuously trying to make the job of assembling a data streaming engine easier for you, but you will probably have to spend some more time understanding how the components work in order to assemble them appropriately.
The Stream Ops data streaming engine covers the same functionality as the core Kafka storage engine, meaning the Kafka service that stores Kafka topics and provides access to them for Kafka consumers.
The Stream Ops stream processing API is closer to the Kafka Streams API, but with some significant changes intended to make it easier to work with and more flexible in terms of the use cases it supports.