apache-kafka-streams

How to change a Kafka committed consumer offset to a required offset

烂漫一生 submitted on 2019-12-24 06:33:31
Question: I have a Kafka Streams application. My application is processing events successfully. How can I change a committed consumer offset to a required offset in order to reprocess or skip events? I tried "How to change start offset for topic?", but I got a 'Node does not exist:' error. Please help me. Answer 1: The question/answer you are referring to is based on an older Kafka version. Since Kafka 0.9, offsets are not committed to ZooKeeper but stored in a special Kafka topic called the offset topic (topic name
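Since offsets now live in Kafka rather than ZooKeeper, one way to move a committed offset is to stop the Streams application and commit the desired offset with a plain consumer using the same group id (for Kafka Streams, the group id equals the application.id). A minimal sketch, with made-up broker address, topic, partition, and offset values:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class OffsetReset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-streams-app");          // must match the application.id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("input-topic", 0); // hypothetical topic/partition
            consumer.assign(Collections.singletonList(tp));
            // Commit offset 42 for this group; the Streams app will resume from there
            // (use a smaller offset to reprocess records, a larger one to skip them).
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(42L)));
        }
    }
}
```

Newer broker versions also ship a kafka-consumer-groups.sh --reset-offsets command that can do the same from the command line.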

How to process a Kafka KStream and write to a database directly instead of sending it to another topic

三世轮回 submitted on 2019-12-24 01:55:09
Question: I don't want to write the processed KStream to another topic; I want to write the enriched KStream directly to a database. How should I proceed? Answer 1: You can implement a custom Processor that opens a DB connection and apply it via KStream#process(). Cf. https://docs.confluent.io/current/streams/developer-guide.html#applying-processors-and-transformers-processor-api-integration Note that you will need to do synchronous writes into your DB to guard against data loss. Thus, not writing back to a topic has multiple
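A minimal sketch of the suggested approach, using the older org.apache.kafka.streams.processor API that the linked docs describe, String keys/values, and a hypothetical JDBC URL and table; the Processor opens the connection in init() and writes each record synchronously:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;

// Writes every enriched record synchronously to a relational table instead of an output topic.
public class DbWriterProcessor extends AbstractProcessor<String, String> {

    private Connection connection;

    @Override
    public void init(ProcessorContext context) {
        super.init(context);
        try {
            // hypothetical JDBC URL and credentials
            connection = DriverManager.getConnection("jdbc:postgresql://localhost/enriched", "user", "pw");
        } catch (Exception e) {
            throw new RuntimeException("Could not open DB connection", e);
        }
    }

    @Override
    public void process(String key, String value) {
        try (PreparedStatement stmt =
                 connection.prepareStatement("INSERT INTO enriched_events (k, v) VALUES (?, ?)")) {
            stmt.setString(1, key);
            stmt.setString(2, value);
            stmt.executeUpdate(); // synchronous write, as the answer recommends
        } catch (Exception e) {
            throw new RuntimeException("DB write failed", e);
        }
    }

    @Override
    public void close() {
        try { connection.close(); } catch (Exception ignored) { }
    }
}
```

It would be attached to the enriched stream with something like enrichedStream.process(DbWriterProcessor::new);.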

KafkaStreams EXACTLY_ONCE guarantee - skipping Kafka offsets

你。 submitted on 2019-12-24 00:59:32
Question: I'm using Spark 2.2.0 and the kafka 0.10 spark-streaming library to read from a topic filled by a Kafka Streams Scala application. The Kafka broker version is 0.11 and the Kafka Streams version is 0.11.0.2. When I set the EXACTLY_ONCE guarantee in the Kafka Streams app: p.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE) I get this error in Spark: java.lang.AssertionError: assertion failed: Got wrong record for spark-executor-<group.id> <topic> 0 even after seeking to offset 24 at scala
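For context, EXACTLY_ONCE makes the Streams application write transactionally, and the transaction commit markers it leaves in the topic occupy offsets, so the offset sequence a downstream reader sees is no longer consecutive. A plain consumer reading such a topic would normally set isolation.level=read_committed, as in this hedged standalone sketch (topic and group names are made up); whether the Spark 2.2 DStream integration can be made to accept the non-consecutive offsets is a separate question:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "downstream-reader");       // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // Only return records from committed transactions and skip aborted ones;
        // offsets will show gaps where the transaction markers sit.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("output-topic")); // hypothetical topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                                  record.offset(), record.key(), record.value());
            }
        }
    }
}
```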

Aggregation over a specific partition in Apache Kafka Streams

拥有回忆 submitted on 2019-12-24 00:22:35
Question: Let's say I have a Kafka topic named SensorData to which two sensors S1 and S2 are sending data (timestamp and value), each to a different partition, e.g. S1 -> P1 and S2 -> P2. Now I need to aggregate the values for these two sensors separately, say by calculating the average sensor value over a time window of 1 hour and writing it into a new topic SensorData1Hour. With this scenario, how can I select a specific topic partition using the KStreamBuilder#stream method? Is it possible to apply
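The DSL subscribes to whole topics rather than individual partitions, but since each sensor's records share a key, a per-key windowed aggregation produces per-sensor averages anyway. A rough sketch with the newer StreamsBuilder API, assuming String sensor ids and Double readings (the running sum/count pair is string-encoded only to keep the example short):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class SensorAverages {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Sensor id is the record key; readings arrive as Double values.
        KStream<String, Double> readings =
            builder.stream("SensorData", Consumed.with(Serdes.String(), Serdes.Double()));

        readings
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
            .windowedBy(TimeWindows.of(Duration.ofHours(1)))
            // Keep a running "sum,count" pair per sensor per one-hour window.
            .aggregate(
                () -> "0.0,0",
                (sensorId, value, agg) -> {
                    String[] parts = agg.split(",");
                    double sum = Double.parseDouble(parts[0]) + value;
                    long count = Long.parseLong(parts[1]) + 1;
                    return sum + "," + count;
                },
                Materialized.with(Serdes.String(), Serdes.String()))
            .toStream()
            // Turn sum/count into the average and strip the window wrapper from the key.
            .map((Windowed<String> windowedKey, String agg) -> {
                String[] parts = agg.split(",");
                double avg = Double.parseDouble(parts[0]) / Long.parseLong(parts[1]);
                return KeyValue.pair(windowedKey.key(), avg);
            })
            .to("SensorData1Hour", Produced.with(Serdes.String(), Serdes.Double()));

        // new KafkaStreams(builder.build(), props).start() would run this topology.
    }
}
```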

Kafka Streams Internal Data Management

天大地大妈咪最大 submitted on 2019-12-23 21:26:31
Question: In my company, we are using Kafka extensively, but we have been using a relational database to store the results of several intermediary transformations and aggregations for fault-tolerance reasons. Now we are exploring Kafka Streams as a more natural way to do this. Often, our needs are quite simple - one such case is: Listen to an input queue of <K1,V1>, <K2,V2>, <K1,V2>, <K1,V3>... For each record, perform some high-latency operation (call a remote service). If by the time <K1,V1> is processed,
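The question is cut off here, but the bookkeeping it describes (remembering the most recent value per key instead of keeping it in a relational table) maps naturally onto a fault-tolerant local state store. A rough sketch, with hypothetical topic and store names and String keys/values:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class LatestValueTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Fault-tolerant store (backed by a changelog topic) replacing the relational table.
        StoreBuilder<KeyValueStore<String, String>> latestStore =
            Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore("latest-per-key"),
                Serdes.String(), Serdes.String());
        builder.addStateStore(latestStore);

        KStream<String, String> input =
            builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()));

        input.transform(() -> new Transformer<String, String, KeyValue<String, String>>() {
            private KeyValueStore<String, String> store;

            @Override
            @SuppressWarnings("unchecked")
            public void init(ProcessorContext context) {
                store = (KeyValueStore<String, String>) context.getStateStore("latest-per-key");
            }

            @Override
            public KeyValue<String, String> transform(String key, String value) {
                // Remember the most recent value per key; the slow operation's result can be
                // compared against this to detect records that were superseded in the meantime.
                String previous = store.get(key);
                store.put(key, value);
                // Forward only when the value for this key actually changed.
                return value.equals(previous) ? null : KeyValue.pair(key, value);
            }

            @Override
            public void close() { }
        }, "latest-per-key");
    }
}
```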

Filtering between topics

非 Y 不嫁゛ submitted on 2019-12-23 04:39:08
Question: I have 1,000 records in a topic. I am trying to filter records from the input topic into an output topic based on salary. For example, I want the records of people whose salary is higher than 30,000. I am trying to use Kafka Streams (KStreams) with Java for this. The records are in text format (comma-separated), for example: first_name, last_name, email, gender, ip_address, country, salary Redacted,Tranfield,user@example.com,Female,45.25.XXX.XXX,Russia,$12345.01 Redacted,Merck,user@example.com,Male,236
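A hedged sketch of such a filter, assuming the salary is the last comma-separated field and carries a leading "$" as in the sample rows; topic names are made up:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class SalaryFilter {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> people =
            builder.stream("people-input", Consumed.with(Serdes.String(), Serdes.String()));

        people
            .filter((key, line) -> {
                // first_name,last_name,email,gender,ip_address,country,salary
                String[] fields = line.split(",");
                if (fields.length < 7) {
                    return false; // malformed line, drop it
                }
                try {
                    // Salary is the last column, e.g. "$12345.01" -> 12345.01
                    double salary = Double.parseDouble(fields[6].trim().replace("$", ""));
                    return salary > 30_000;
                } catch (NumberFormatException e) {
                    return false;
                }
            })
            .to("people-high-salary", Produced.with(Serdes.String(), Serdes.String()));
    }
}
```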

kafka-streams join produces duplicates

六眼飞鱼酱① submitted on 2019-12-23 03:44:14
Question: I have two topics: // photos {'id': 1, 'user_id': 1, 'url': 'url#1'}, {'id': 2, 'user_id': 2, 'url': 'url#2'}, {'id': 3, 'user_id': 2, 'url': 'url#3'} // users {'id': 1, 'name': 'user#1'}, {'id': 1, 'name': 'user#1'}, {'id': 1, 'name': 'user#1'} I create a map of photos by user: KStream<Integer, Photo> photo_by_user = ... photo_by_user.to("photo_by_user") Then I try to join the two tables: KTable<Integer, User> users_table = builder.table("users"); KTable<Integer, Photo> photo_by_user_table = builder
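The snippet is cut off, but the topology it describes could look roughly like the sketch below. To keep it self-contained, the asker's Photo/User POJOs and serdes are replaced by raw JSON strings and a regex key extractor, which is an assumption, not the actual setup:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PhotoUserJoin {
    private static final Pattern USER_ID = Pattern.compile("'user_id':\\s*(\\d+)");

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Photos arrive keyed by photo id with the JSON document as the value.
        KStream<Integer, String> photos =
            builder.stream("photos", Consumed.with(Serdes.Integer(), Serdes.String()));

        // Rekey each photo by its user_id and write it out, as in the question.
        photos
            .selectKey((photoId, json) -> {
                Matcher m = USER_ID.matcher(json);
                return m.find() ? Integer.parseInt(m.group(1)) : -1;
            })
            .to("photo_by_user", Produced.with(Serdes.Integer(), Serdes.String()));

        // Read both sides back as tables and join them on user id.
        KTable<Integer, String> usersTable =
            builder.table("users", Consumed.with(Serdes.Integer(), Serdes.String()));
        KTable<Integer, String> photoByUserTable =
            builder.table("photo_by_user", Consumed.with(Serdes.Integer(), Serdes.String()));

        photoByUserTable
            .join(usersTable, (photoJson, userJson) -> photoJson + " + " + userJson)
            .toStream()
            .to("photo_with_user", Produced.with(Serdes.Integer(), Serdes.String()));
    }
}
```

Note that a KTable keeps a single value per key, so rekeying photos by user_id means a later photo overwrites an earlier one for the same user, which is worth keeping in mind when debugging unexpected or duplicated join output.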

state store may have migrated to another instance

痞子三分冷 submitted on 2019-12-23 03:22:50
Question: When I try to access the state store from the stream, I am getting the error below: "The state store, count-store, may have migrated to another instance". When I tried to access the ReadOnlyKeyValueStore, I got an error message saying the store had migrated to another server, but I have only one broker up and running. /** * */ package com.ms.kafka.com.ms.stream; import java.util.Properties; import java.util.stream.Stream; import org.apache.kafka.common.serialization.Serdes; import org.apache.kafka.streams.KafkaStreams
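With a single broker and a single application instance, this error usually means the store is being queried before the application has finished starting up (state REBALANCING rather than RUNNING). A common workaround is to retry until the store becomes queryable; a small sketch, assuming a String/Long count store named "count-store" as in the error message:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.InvalidStateStoreException;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StoreQueryHelper {

    // Retry until the store is queryable; with a single instance this usually just means
    // waiting for the application to finish rebalancing/restoring and reach RUNNING.
    public static ReadOnlyKeyValueStore<String, Long> waitForStore(KafkaStreams streams,
                                                                   String storeName)
            throws InterruptedException {
        while (true) {
            try {
                return streams.store(storeName, QueryableStoreTypes.<String, Long>keyValueStore());
            } catch (InvalidStateStoreException notReadyYet) {
                Thread.sleep(200); // store not open yet, try again
            }
        }
    }
}
```

After streams.start(), the store would be read with something like waitForStore(streams, "count-store").get("some-key").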

Kafka KStream - topology to take n-second counts

你说的曾经没有我的故事 submitted on 2019-12-23 01:26:06
Question: I have a stream of JSON objects that I'm keying on a hash of a few values. I'm hoping to count by key in n-second (10? 60?) intervals and use these values to do some pattern analysis. My topology: K->aggregateByKey(n seconds)->process() In the process() - init() step I've called ProcessorContext.schedule(60 * 1000L) in hopes of having .punctuate() get called. From there I would loop through the values in an internal hash and act accordingly. I'm seeing values come through the aggregation step
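One thing to note: with the original schedule(long) API, punctuate() is driven by stream time, so it only fires when new records advance the stream's timestamps. Since Kafka 1.0 the Processor API lets you schedule a Punctuator explicitly, including on wall-clock time, as in this rough sketch (Kafka Streams 2.1+ Duration overload; the key/value types and the in-memory map are assumptions):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;

public class CountingProcessor extends AbstractProcessor<String, Long> {

    private final Map<String, Long> counts = new HashMap<>();

    @Override
    public void init(ProcessorContext context) {
        super.init(context);
        // WALL_CLOCK_TIME fires every 60 seconds regardless of traffic;
        // STREAM_TIME only advances when new records arrive.
        context.schedule(Duration.ofSeconds(60), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            counts.forEach((key, count) ->
                System.out.printf("interval ending %d: key=%s count=%d%n", timestamp, key, count));
            counts.clear();
        });
    }

    @Override
    public void process(String key, Long value) {
        counts.merge(key, 1L, Long::sum);
    }
}
```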

Is it OK to use Apache Kafka's “infinite retention policy” as a base for an event-sourced system with CQRS?

倾然丶 夕夏残阳落幕 submitted on 2019-12-22 11:37:33
Question: I'm currently evaluating options for designing/implementing an Event Sourcing + CQRS architectural approach to system design. Since we want to use Apache Kafka for other aspects (normal pub/sub messaging + stream processing), the next logical question is, "Can we use the Apache Kafka store as the event store for CQRS?", or more importantly, would that be a smart decision? Right now I'm unsure about this. This source seems to support it: https://www.confluent.io/blog/okay-store-data-apache-kafka
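For reference, "infinite retention" is a per-topic setting; a hedged sketch of creating such a topic with the AdminClient (broker address, topic name, partition and replication counts are all made up):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class EventStoreTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker

        try (AdminClient admin = AdminClient.create(props)) {
            // retention.ms = -1 disables time-based deletion ("infinite retention");
            // retention.bytes = -1 disables size-based deletion as well.
            NewTopic eventLog = new NewTopic("order-events", 6, (short) 3)
                .configs(Map.of("retention.ms", "-1",
                                "retention.bytes", "-1"));
            admin.createTopics(Collections.singletonList(eventLog)).all().get();
        }
    }
}
```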