apache-kafka-streams

How to change a Kafka committed consumer offset to a required offset

烂漫一生 submitted on 2019-12-24 06:33:31
Question: I have a Kafka Streams application. My application is processing events successfully. How can I change a committed consumer offset to a required offset in order to reprocess or skip events? I tried "How to change start offset for topic?", but I got a 'Node does not exist:' error. Please help me. Answer 1: The question/answer you are referring to is based on an older Kafka version. Since Kafka 0.9, offsets are not committed to ZooKeeper but stored in a special Kafka topic called the offset topic (topic name
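Since offsets now live in Kafka rather than ZooKeeper, one way to move a committed offset is to stop the Streams application and commit the desired offset with a plain consumer using the same group id (for Kafka Streams, the group id equals the application.id). A minimal sketch, with made-up broker address, topic, partition, and offset values:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class OffsetReset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-streams-app");          // must match the application.id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("input-topic", 0); // hypothetical topic/partition
            consumer.assign(Collections.singletonList(tp));
            // Commit offset 42 for this group; the Streams app will resume from there
            // (use a smaller offset to reprocess records, a larger one to skip them).
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(42L)));
        }
    }
}
```

Newer broker versions also ship a kafka-consumer-groups.sh --reset-offsets command that can do the same from the command line.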

How to process a Kafka KStream and write to a database directly instead of sending it to another topic

三世轮回 submitted on 2019-12-24 01:55:09
Question: I don't want to write the processed KStream to another topic; I want to write the enriched KStream directly to a database. How should I proceed? Answer 1: You can implement a custom Processor that opens a DB connection and apply it via KStream#process(). Cf. https://docs.confluent.io/current/streams/developer-guide.html#applying-processors-and-transformers-processor-api-integration Note that you will need to do synchronous writes into your DB to guard against data loss. Thus, not writing back to a topic has multiple
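A minimal sketch of the suggested approach, using the older org.apache.kafka.streams.processor API that the linked docs describe, String keys/values, and a hypothetical JDBC URL and table; the Processor opens the connection in init() and writes each record synchronously:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;

// Writes every enriched record synchronously to a relational table instead of an output topic.
public class DbWriterProcessor extends AbstractProcessor<String, String> {

    private Connection connection;

    @Override
    public void init(ProcessorContext context) {
        super.init(context);
        try {
            // hypothetical JDBC URL and credentials
            connection = DriverManager.getConnection("jdbc:postgresql://localhost/enriched", "user", "pw");
        } catch (Exception e) {
            throw new RuntimeException("Could not open DB connection", e);
        }
    }

    @Override
    public void process(String key, String value) {
        try (PreparedStatement stmt =
                 connection.prepareStatement("INSERT INTO enriched_events (k, v) VALUES (?, ?)")) {
            stmt.setString(1, key);
            stmt.setString(2, value);
            stmt.executeUpdate(); // synchronous write, as the answer recommends
        } catch (Exception e) {
            throw new RuntimeException("DB write failed", e);
        }
    }

    @Override
    public void close() {
        try { connection.close(); } catch (Exception ignored) { }
    }
}
```

It would be attached to the enriched stream with something like enrichedStream.process(DbWriterProcessor::new);.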

KafkaStreams EXACTLY_ONCE guarantee - skipping Kafka offsets

你。 submitted on 2019-12-24 00:59:32
Question: I'm using Spark 2.2.0 and the kafka 0.10 spark-streaming library to read from a topic filled by a Kafka Streams Scala application. The Kafka broker version is 0.11 and the Kafka Streams version is 0.11.0.2. When I set the EXACTLY_ONCE guarantee in the Kafka Streams app: p.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE) I get this error in Spark: java.lang.AssertionError: assertion failed: Got wrong record for spark-executor-<group.id> <topic> 0 even after seeking to offset 24 at scala
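For context, EXACTLY_ONCE makes the Streams application write transactionally, and the transaction commit markers it leaves in the topic occupy offsets, so the offset sequence a downstream reader sees is no longer consecutive. A plain consumer reading such a topic would normally set isolation.level=read_committed, as in this hedged standalone sketch (topic and group names are made up); whether the Spark 2.2 DStream integration can be made to accept the non-consecutive offsets is a separate question:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "downstream-reader");       // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // Only return records from committed transactions and skip aborted ones;
        // offsets will show gaps where the transaction markers sit.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("output-topic")); // hypothetical topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                                  record.offset(), record.key(), record.value());
            }
        }
    }
}
```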

Aggregation over a specific partition in Apache Kafka Streams

拥有回忆 submitted on 2019-12-24 00:22:35
Question: Let's say I have a Kafka topic named SensorData to which two sensors S1 and S2 are sending data (timestamp and value), each to a different partition, e.g. S1 -> P1 and S2 -> P2. Now I need to aggregate the values for these two sensors separately, say by calculating the average sensor value over a time window of 1 hour and writing it into a new topic SensorData1Hour. With this scenario, how can I select a specific topic partition using the KStreamBuilder#stream method? Is it possible to apply
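The DSL subscribes to whole topics rather than individual partitions, but since each sensor's records share a key, a per-key windowed aggregation produces per-sensor averages anyway. A rough sketch with the newer StreamsBuilder API, assuming String sensor ids and Double readings (the running sum/count pair is string-encoded only to keep the example short):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class SensorAverages {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Sensor id is the record key; readings arrive as Double values.
        KStream<String, Double> readings =
            builder.stream("SensorData", Consumed.with(Serdes.String(), Serdes.Double()));

        readings
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
            .windowedBy(TimeWindows.of(Duration.ofHours(1)))
            // Keep a running "sum,count" pair per sensor per one-hour window.
            .aggregate(
                () -> "0.0,0",
                (sensorId, value, agg) -> {
                    String[] parts = agg.split(",");
                    double sum = Double.parseDouble(parts[0]) + value;
                    long count = Long.parseLong(parts[1]) + 1;
                    return sum + "," + count;
                },
                Materialized.with(Serdes.String(), Serdes.String()))
            .toStream()
            // Turn sum/count into the average and strip the window wrapper from the key.
            .map((Windowed<String> windowedKey, String agg) -> {
                String[] parts = agg.split(",");
                double avg = Double.parseDouble(parts[0]) / Long.parseLong(parts[1]);
                return KeyValue.pair(windowedKey.key(), avg);
            })
            .to("SensorData1Hour", Produced.with(Serdes.String(), Serdes.Double()));

        // new KafkaStreams(builder.build(), props).start() would run this topology.
    }
}
```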

Kafka Streams Internal Data Management

天大地大妈咪最大 submitted on 2019-12-23 21:26:31
Question: In my company, we are using Kafka extensively, but we have been using a relational database to store the results of several intermediary transformations and aggregations for fault-tolerance reasons. Now we are exploring Kafka Streams as a more natural way to do this. Often, our needs are quite simple - one such case is: Listen to an input queue of <K1,V1>, <K2,V2>, <K1,V2>, <K1,V3>... For each record, perform some high-latency operation (call a remote service). If by the time <K1,V1> is processed,
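The question is cut off here, but the bookkeeping it describes (remembering the most recent value per key instead of keeping it in a relational table) maps naturally onto a fault-tolerant local state store. A rough sketch, with hypothetical topic and store names and String keys/values:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class LatestValueTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Fault-tolerant store (backed by a changelog topic) replacing the relational table.
        StoreBuilder<KeyValueStore<String, String>> latestStore =
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("latest-per-key"),
                Serdes.String(), Serdes.String());
        builder.addStateStore(latestStore);

        KStream<String, String> input =
            builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()));

        input.transform(() -> new Transformer<String, String, KeyValue<String, String>>() {
            private KeyValueStore<String, String> store;

            @Override
            @SuppressWarnings("unchecked")
            public void init(ProcessorContext context) {
                store = (KeyValueStore<String, String>) context.getStateStore("latest-per-key");
            }

            @Override
            public KeyValue<String, String> transform(String key, String value) {
                // Remember the most recent value per key; the slow operation's result can be
                // compared against this to detect records that were superseded in the meantime.
                String previous = store.get(key);
                store.put(key, value);
                // Forward only when the value for this key actually changed.
                return value.equals(previous) ? null : KeyValue.pair(key, value);
            }

            @Override
            public void close() { }
        }, "latest-per-key");
    }
}
```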

Filtering between topics

非 Y 不嫁゛ submitted on 2019-12-23 04:39:08
Question: I have 1,000 records in a topic. I am trying to filter records from the input topic into an output topic based on salary. For example, I want the records of people whose salary is higher than 30,000. I am trying to use Kafka Streams (KStreams) with Java for this. The records are in text format (comma-separated), for example: first_name, last_name, email, gender, ip_address, country, salary Redacted,Tranfield,user@example.com,Female,45.25.XXX.XXX,Russia,$12345.01 Redacted,Merck,user@example.com,Male,236
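A hedged sketch of such a filter, assuming the salary is the last comma-separated field and carries a leading "$" as in the sample rows; topic names are made up:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class SalaryFilter {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> people =
            builder.stream("people-input", Consumed.with(Serdes.String(), Serdes.String()));

        people
            .filter((key, line) -> {
                // first_name,last_name,email,gender,ip_address,country,salary
                String[] fields = line.split(",");
                if (fields.length < 7) {
                    return false; // malformed line, drop it
                }
                try {
                    // Salary is the last column, e.g. "$12345.01" -> 12345.01
                    double salary = Double.parseDouble(fields[6].trim().replace("$", ""));
                    return salary > 30_000;
                } catch (NumberFormatException e) {
                    return false;
                }
            })
            .to("people-high-salary", Produced.with(Serdes.String(), Serdes.String()));
    }
}
```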

kafka-streams join produces duplicates

六眼飞鱼酱① submitted on 2019-12-23 03:44:14
Question: I have two topics: // photos {'id': 1, 'user_id': 1, 'url': 'url#1'}, {'id': 2, 'user_id': 2, 'url': 'url#2'}, {'id': 3, 'user_id': 2, 'url': 'url#3'} // users {'id': 1, 'name': 'user#1'}, {'id': 1, 'name': 'user#1'}, {'id': 1, 'name': 'user#1'} I create a map of photos by user: KStream<Integer, Photo> photo_by_user = ... photo_by_user.to("photo_by_user") Then I try to join the two tables: KTable<Integer, User> users_table = builder.table("users"); KTable<Integer, Photo> photo_by_user_table = builder
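The snippet is cut off, but the topology it describes could look roughly like the sketch below. To keep it self-contained, the asker's Photo/User POJOs and serdes are replaced by raw JSON strings and a regex key extractor, which is an assumption, not the actual setup:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PhotoUserJoin {
    private static final Pattern USER_ID = Pattern.compile("'user_id':\\s*(\\d+)");

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Photos arrive keyed by photo id with the JSON document as the value.
        KStream<Integer, String> photos =
            builder.stream("photos", Consumed.with(Serdes.Integer(), Serdes.String()));

        // Rekey each photo by its user_id and write it out, as in the question.
        photos
            .selectKey((photoId, json) -> {
                Matcher m = USER_ID.matcher(json);
                return m.find() ? Integer.parseInt(m.group(1)) : -1;
            })
            .to("photo_by_user", Produced.with(Serdes.Integer(), Serdes.String()));

        // Read both sides back as tables and join them on user id.
        KTable<Integer, String> usersTable =
            builder.table("users", Consumed.with(Serdes.Integer(), Serdes.String()));
        KTable<Integer, String> photoByUserTable =
            builder.table("photo_by_user", Consumed.with(Serdes.Integer(), Serdes.String()));

        photoByUserTable
            .join(usersTable, (photoJson, userJson) -> photoJson + " + " + userJson)
            .toStream()
            .to("photo_with_user", Produced.with(Serdes.Integer(), Serdes.String()));
    }
}
```

Note that a KTable keeps a single value per key, so rekeying photos by user_id means a later photo overwrites an earlier one for the same user, which is worth keeping in mind when debugging unexpected or duplicated join output.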

state store may have migrated to another instance

痞子三分冷 submitted on 2019-12-23 03:22:50
Question: When I try to access the state store from the stream, I am getting the error below: "The state store, count-store, may have migrated to another instance". When I tried to access the ReadOnlyKeyValueStore, I got an error message saying the store had migrated to another server, but I have only one broker up and running. /** * */ package com.ms.kafka.com.ms.stream; import java.util.Properties; import java.util.stream.Stream; import org.apache.kafka.common.serialization.Serdes; import org.apache.kafka.streams.KafkaStreams
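With a single broker and a single application instance, this error usually means the store is being queried before the application has finished starting up (state REBALANCING rather than RUNNING). A common workaround is to retry until the store becomes queryable; a small sketch, assuming a String/Long count store named "count-store" as in the error message:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StoreQueryHelper {

    // Retry until the store is queryable; with a single instance this usually just means
    // waiting for the application to finish rebalancing/restoring and reach RUNNING.
    public static ReadOnlyKeyValueStore<String, Long> waitForStore(KafkaStreams streams,
                                                                   String storeName)
            throws InterruptedException {
        while (true) {
            try {
                return streams.store(storeName, QueryableStoreTypes.<String, Long>keyValueStore());
            } catch (InvalidStateStoreException notReadyYet) {
                Thread.sleep(200); // store not open yet, try again
            }
        }
    }
}
```

After streams.start(), the store would be read with something like waitForStore(streams, "count-store").get("some-key").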

Kafka KStream - topology to take n-second counts

你说的曾经没有我的故事 submitted on 2019-12-23 01:26:06
Question: I have a stream of JSON objects that I'm keying on a hash of a few values. I'm hoping to count by key in n-second (10? 60?) intervals and use these values to do some pattern analysis. My topology: K->aggregateByKey(n seconds)->process() In the process() - init() step I've called ProcessorContext.schedule(60 * 1000L) in hopes of having .punctuate() get called. From there I would loop through the values in an internal hash and act accordingly. I'm seeing values come through the aggregation step
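One thing to note: with the original schedule(long) API, punctuate() is driven by stream time, so it only fires when new records advance the stream's timestamps. Since Kafka 1.0 the Processor API lets you schedule a Punctuator explicitly, including on wall-clock time, as in this rough sketch (Kafka Streams 2.1+ Duration overload; the key/value types and the in-memory map are assumptions):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;

public class CountingProcessor extends AbstractProcessor<String, Long> {

    private final Map<String, Long> counts = new HashMap<>();

    @Override
    public void init(ProcessorContext context) {
        super.init(context);
        // WALL_CLOCK_TIME fires every 60 seconds regardless of traffic;
        // STREAM_TIME only advances when new records arrive.
        context.schedule(Duration.ofSeconds(60), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            counts.forEach((key, count) ->
                System.out.printf("interval ending %d: key=%s count=%d%n", timestamp, key, count));
            counts.clear();
        });
    }

    @Override
    public void process(String key, Long value) {
        counts.merge(key, 1L, Long::sum);
    }
}
```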

Is it OK to use Apache Kafka's “infinite retention policy” as a base for an event-sourced system with CQRS?

倾然丶 夕夏残阳落幕 submitted on 2019-12-22 11:37:33
Question: I'm currently evaluating options for designing/implementing an Event Sourcing + CQRS architectural approach to system design. Since we want to use Apache Kafka for other aspects (normal pub/sub messaging + stream processing), the next logical question is, "Can we use the Apache Kafka store as the event store for CQRS?", or more importantly, would that be a smart decision? Right now I'm unsure about this. This source seems to support it: https://www.confluent.io/blog/okay-store-data-apache-kafka
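For reference, "infinite retention" is a per-topic setting; a hedged sketch of creating such a topic with the AdminClient (broker address, topic name, partition and replication counts are all made up):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class EventStoreTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker

        try (AdminClient admin = AdminClient.create(props)) {
            // retention.ms = -1 disables time-based deletion ("infinite retention");
            // retention.bytes = -1 disables size-based deletion as well.
            NewTopic eventLog = new NewTopic("order-events", 6, (short) 3)
                .configs(Map.of("retention.ms", "-1",
                                "retention.bytes", "-1"));
            admin.createTopics(Collections.singletonList(eventLog)).all().get();
        }
    }
}
```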