Apache Kafka is one of the most popular event streaming platforms of recent times. It has gone through many upgrades and enhancements to provide better performance, and this article discusses one such feature. Confluent has announced a feature called Interactive Queries for stream processing with Apache Kafka. It allows you to keep the stream processing layer lightweight and query the stream processing engine directly; Apache Kafka manages the state and offers fault tolerance. This feature brings processing and storage together in one single, easy-to-use application.
The vision is to move stream processing out of the big data niche and make it available as a mainstream application development model. This blog focuses on the motivation behind Interactive Queries through various examples.
Example: Real-time risk management
In this example, consider a financial institution, say a wealth management firm, that maintains positions in assets held by the firm and its investors. The firm continuously collects business events and data that could influence the risk associated with these positions. Whenever the data changes, the risk positions are recalculated in order to keep a real-time view.
Real-time risk management is an example of a stateful application. State is required to keep track of the latest positions, and it is also required inside the stream processing layer to keep track of statistics. All of this state needs to be updated and queried continuously.
How is this done?
For the risk management dashboard, business events are captured as real-time data in Apache Kafka and processed with Kafka Streams. Traditionally there are a lot of moving parts and inefficiencies in how this is done:
- An extra Hadoop cluster to reprocess data.
- Storage maintained at the stream processing layer.
- Storage and databases maintained by both the streaming and Hadoop jobs.
- State written internally to maintain the computation, then duplicated into external storage.
- Locality destroyed, as data that needs to be local is unnecessarily shipped to a storage cluster.
The case for Interactive Queries
Let us now simplify the above setup by removing the Hadoop layer and doing all of the processing in the streaming layer. For this, we move from the Lambda architecture to the Kappa architecture. Interactive Queries let us do even better: with them we directly expose the embedded state to applications. The embedded databases act as materialized views of the logs stored in Apache Kafka.
Materialized views provide better application isolation and better performance.
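As a minimal sketch of such a materialized view, assuming a hypothetical `position-events` topic keyed by asset ID with long position deltas as values (the topic, store, and application names here are illustrative, not from the original example), a Kafka Streams application in Java can aggregate the events and materialize the result as a named, queryable state store:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

import java.util.Properties;

public class RiskPositionsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "risk-positions-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Sum position deltas per asset and materialize the result as a
        // named state store, i.e. a queryable materialized view of the log.
        builder.stream("position-events", Consumed.with(Serdes.String(), Serdes.Long()))
               .groupByKey()
               .reduce(Long::sum,
                       Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("positions-store")
                                   .withKeySerde(Serdes.String())
                                   .withValueSerde(Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```

The `positions-store` created here is the embedded materialized view that the later sections query directly.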
Selecting the right database
Points to consider when selecting the right database and storage:
Pros of Interactive Queries with embedded databases:
- Very few moving parts, and you don’t have to deploy, maintain, and operate an external database.
- It allows faster and more efficient use of application state.
- It provides better isolation.
- It allows more flexibility.
Cons:
- You may have to move away from a database that you trust.
- You lose the ability to scale storage independently of processing.
- You may miss rich, database-specific query capabilities and need customized query code instead.
Whatever you choose, just remember that you get more flexibility with Apache Kafka.
Information queried interactively
Interactive Queries enable developers to query the embedded state stores of a streaming application. The stores are exposed read-only, with no modifications allowed, to avoid state inconsistencies. Read-only access is sufficient for most applications that consume data from a queryable streaming application, as shown in the sketch after the list below.
- Interactive Queries enable faster and more efficient use of application state.
- There is no duplication of data.
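A minimal sketch of such read-only access, reusing the hypothetical `positions-store` from the earlier sketch and the Interactive Queries API available in recent Kafka Streams versions:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class PositionLookup {

    // Returns the current position for one asset, read directly from the
    // embedded state store of the running application.
    static Long currentPosition(KafkaStreams streams, String assetId) {
        ReadOnlyKeyValueStore<String, Long> store =
                streams.store(StoreQueryParameters.fromNameAndType(
                        "positions-store", QueryableStoreTypes.keyValueStore()));
        return store.get(assetId);
    }
}
```

The returned `ReadOnlyKeyValueStore` exposes only read methods such as `get`, `range`, and `all`, so application code cannot mutate the state.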
How to make Apache Kafka Streams applications queryable?
Apache Kafka Streams handles the low-level querying machinery and offers fault tolerance, so you can make your state queryable with almost no extra work.
Querying local stores
- Start with a single instance of the application.
- Kafka Streams partitions the data among the instances, so each instance holds only part of the state locally (see the sketch after this list).
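Because the data is partitioned, querying a local store only returns the keys assigned to this instance. The following sketch (again using the hypothetical `positions-store`) iterates over whatever one instance holds locally:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class LocalPositionsDump {

    // Prints every position held by *this* instance. Other keys live on
    // other instances, because Kafka Streams partitions the data among them.
    static void printLocalPositions(KafkaStreams streams) {
        ReadOnlyKeyValueStore<String, Long> store =
                streams.store(StoreQueryParameters.fromNameAndType(
                        "positions-store", QueryableStoreTypes.keyValueStore()));
        try (KeyValueIterator<String, Long> it = store.all()) {
            while (it.hasNext()) {
                KeyValue<String, Long> entry = it.next();
                System.out.println(entry.key + " -> " + entry.value);
            }
        }
    }
}
```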
Discovering any instance’s stores
- Each instance needs to be made aware of the others through periodic metadata exchange.
- With Apache Kafka Streams, each instance may expose its endpoint information as metadata to the other instances.
- The Interactive Query API allows a developer to obtain this metadata.
- Now you can discover which instance hosts the store for a given key (see the sketch after this list).
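A minimal sketch of that discovery step, assuming each instance sets the `application.server` property to its own host and port; forwarding a query to a remote instance (for example over HTTP) is left to the application and only hinted at in the comments:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.HostInfo;

import java.util.Properties;

public class StoreDiscovery {

    // Each instance advertises its own endpoint so the others can find it.
    static Properties configWithEndpoint(String host, int port) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "risk-positions-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.APPLICATION_SERVER_CONFIG, host + ":" + port);
        return props;
    }

    // Find out which instance hosts the store partition for a given asset.
    static HostInfo whereIs(KafkaStreams streams, String assetId) {
        KeyQueryMetadata metadata = streams.queryMetadataForKey(
                "positions-store", assetId, Serdes.String().serializer());
        // If metadata.activeHost() is this instance, query the local store;
        // otherwise forward the request (e.g. over HTTP) to that host.
        return metadata.activeHost();
    }
}
```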
Conclusion
It doesn’t matter whether you are creating a core banking application or an advertising data pipeline; at some point you will need to scale the processing and make it real-time. To do this, you need a system that can do both. Apache Kafka provides you with the power of a declarative API, and Interactive Queries allow you to query data as it is being processed.
If you are looking for Apache Kafka services, Ksolves is the best choice for building your own real-time applications. We are one of the best Apache Kafka service providers across the globe, offering customized Apache Kafka services with minimal latency. Write in the comment section for more details.
Contact Us for any Query
Email : sales@ksolves.com
Call : +91 8130704295
AUTHOR
Anil Kushwaha, Technology Head at Ksolves, is an expert in Big Data and AI/ML. With over 11 years at Ksolves, he has been pivotal in driving innovative, high-volume data solutions with technologies like Nifi, Cassandra, Spark, Hadoop, etc. Passionate about advancing tech, he ensures smooth data warehousing for client success through tailored, cutting-edge strategies.