apache-kafka-streams

Tombstone messages not removing record from KTable state store?

Submitted on 2019-12-30 10:33:49
Question: I am creating a KTable by processing data from a KStream, but when I send a tombstone message (a key with a null payload), the record is not removed from the KTable. Sample:

    public KStream<String, GenericRecord> processRecord(@Input(Channel.TEST) KStream<GenericRecord, GenericRecord> testStream) {
        KTable<String, GenericRecord> table = testStream
            .map((genericRecord, genericRecord2) -> KeyValue.pair(genericRecord.get("field1") + "", genericRecord2))
            .groupByKey()
            .reduce((genericRecord, v1) -> v1,
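This is expected DSL behavior: KStream aggregations such as groupByKey().reduce() silently drop records whose value is null, so the tombstone never reaches the state store. A minimal sketch of one workaround, assuming Kafka Streams 2.5+ (where KStream#toTable is available) and a placeholder valueSerde for the Avro values:

    // toTable() treats a null value as a delete, unlike groupByKey().reduce(),
    // which filters out null-valued records before they reach the store.
    KTable<String, GenericRecord> table = testStream
        .map((k, v) -> KeyValue.pair(k.get("field1") + "", v))      // v == null marks a delete
        .toTable(Materialized.with(Serdes.String(), valueSerde));   // valueSerde: placeholder Avro serde

On older versions, the usual alternative is to map tombstones to a surrogate "delete" marker value before the aggregation and have the reducer or aggregator return null when it sees the marker, since a null aggregation result removes the entry from the resulting KTable.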

Kafka Streams and RPC: is calling REST service in map() operator considered an anti-pattern?

Submitted on 2019-12-29 06:58:27
Question: The naive approach to enriching an incoming stream of events stored in Kafka with reference data is to call, in a map() operator, an external REST service that provides this reference data, once for each incoming event. eventStream.map((key, event) -> /* query the external service here, then return the enriched event */) Another approach is to have a second event stream with the reference data and store it in a KTable, which will be a lightweight embedded "database", then
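With the second approach, enrichment becomes a local stream-table join instead of a remote call per event. A rough sketch (the topic name and the enrich() helper are hypothetical):

    // reference data kept in a (typically log-compacted) topic, materialized as a KTable
    KTable<String, GenericRecord> refTable = builder.table("reference-topic");
    KStream<String, GenericRecord> enriched = eventStream
        .join(refTable, (event, ref) -> enrich(event, ref)); // enrich() merges the two records (placeholder)

The join is served from the local state store, so there is no network hop on the hot path; the trade-off is that the reference data must be available as a Kafka topic, and since joins are keyed, the event stream may need repartitioning by the reference key.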

Kafka streams word count application

Submitted on 2019-12-25 12:25:13
Question: I'm playing around with the Kafka Streams API (Kafka version: 0.10.2.0), trying to make a simple word-count example work: Wordcount App gist. I'm running both the producer and the console consumer:

    ./kafka-console-producer.sh --topic input-topic --broker-list localhost:9092
    ./kafka-console-consumer.sh --topic output-topic --bootstrap-server localhost:9092 --from-beginning

I start the application and everything seems to be working fine, but when I type some strings into the console producer, the
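For reference, a typical word-count topology in the current DSL looks roughly like this (a sketch; the 0.10.2-era API used KStreamBuilder and differed slightly, and the topic names are placeholders):

    KStream<String, String> lines = builder.stream("input-topic");
    lines.flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+"))) // split each line into words
         .groupBy((key, word) -> word)   // repartition by word
         .count()                        // KTable<String, Long> of word counts
         .toStream()
         .to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));

Two common reasons for "nothing shows up": the counts are Longs, so the console consumer needs --property print.key=true and a LongDeserializer for values, and record caching can delay output until the next commit interval.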

How to write only final output of KStreams windowed operation?

Submitted on 2019-12-25 09:27:35
Question: Say I need to do word-count-like processing, but for every 5 minutes. So I am using tumbling windows, but the output also contains the intermediate changelog counts. I want to see only the final counts for each window in the output. Is there a way to achieve this? Source: https://stackoverflow.com/questions/44921281/how-to-write-only-final-output-of-kstreams-windowed-operation
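Since Kafka Streams 2.1 (which postdates this question) the DSL has suppress() for exactly this; a sketch:

    stream.groupByKey()
          .windowedBy(TimeWindows.of(Duration.ofMinutes(5)).grace(Duration.ZERO))
          .count()
          .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded())) // emit once, when the window closes
          .toStream((windowedKey, count) -> windowedKey.key()) // unwrap the windowed key (sketch)
          .to("final-counts", Produced.with(Serdes.String(), Serdes.Long())); // hypothetical output topic

Note that untilWindowCloses requires a bounded grace period, and nothing is emitted until stream time passes the window end.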

Kafka streams - First example WordCount doesn't count correctly the first lap

Submitted on 2019-12-25 09:09:02
Question: I'm studying Kafka Streams and I have a problem with the first WordCount example in Java 8, taken from the documentation. I'm using the latest available versions of Kafka Streams, Kafka Connect, and the WordCount lambda-expressions example. I follow these steps: I create an input topic in Kafka, and an output one. I start the streaming app, then load the input topic with some words from a .txt file. On the first count, in the output topic I see the words grouped correctly, but
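When early counts look wrong or collapsed, the usual cause is the record cache, which deduplicates successive updates to the same key before forwarding them downstream: only the latest value per key is emitted at each commit or flush. Disabling it makes every intermediate update visible (a sketch of the two relevant settings):

    Properties props = new Properties();
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0); // forward every update immediately
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100);      // flush state and commit more often

With caching on, the output is still correct, just batched: some intermediate counts are skipped, which can make the first lap look different from later ones.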

2 step windowed aggregation with Kafka Streams DSL

Submitted on 2019-12-25 08:59:07
Question: Suppose I have a stream "stream-1" consisting of one data point every second, and I'd like to calculate a derived stream "stream-5" that contains the sum over a hopping window of 5 seconds, and another stream "stream-10" based on "stream-5" that contains the sum over a hopping window of 10 seconds. The aggregation needs to be done per key, and I'd like to be able to run each step in a different process. It is not a problem in itself if stream-5 and stream-10 contain
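One way to decouple the two steps is to have the first process write its windowed sums to an intermediate topic that the second process consumes as an ordinary keyed stream. A sketch of step one, assuming Kafka Streams 2.1+ and Long values (topic names and serdes are placeholders; tumbling windows are shown for brevity, hopping windows would add .advanceBy()):

    // process 1: 5-second sums from stream-1, re-keyed back to the plain key, into stream-5
    builder.stream("stream-1", Consumed.with(Serdes.String(), Serdes.Long()))
           .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
           .windowedBy(TimeWindows.of(Duration.ofSeconds(5)))
           .reduce(Long::sum)
           .toStream((windowedKey, sum) -> windowedKey.key()) // unwrap the windowed key
           .to("stream-5", Produced.with(Serdes.String(), Serdes.Long()));

Process 2 then reads "stream-5" and applies the same pattern with a 10-second window, writing to "stream-10". Note that each window update is forwarded downstream, so stream-5 carries intermediate results unless they are suppressed.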

Is it possible to get exactly once processing with Spring Cloud Stream?

Submitted on 2019-12-25 01:50:26
Question: Currently I'm using SCS with an almost default configuration for sending and receiving messages between microservices. I've read https://www.confluent.io/blog/enabling-exactly-kafka-streams and wonder whether it would work if we just set the property "processing.guarantee" to "exactly-once" through the properties of a Spring Boot application. Answer 1: In the context of your question you should look at Spring Cloud Stream as just a delegate between target system (e
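The property in question is a native Kafka Streams setting; in plain Kafka Streams it would be set like this (sketch):

    Properties props = new Properties();
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE); // literal value "exactly_once"

With Spring Cloud Stream it should reach StreamsConfig through the binder's configuration passthrough, e.g. (assuming the Kafka Streams binder) a property along the lines of spring.cloud.stream.kafka.streams.binder.configuration.processing.guarantee=exactly_once; note that exactly-once is a Kafka Streams guarantee and does not apply to the plain channel-based binder.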

Is scalability applicable with Kafka stream if each topic has single partition

Submitted on 2019-12-25 01:27:56
Question: My understanding, per the Kafka Streams documentation, is that the maximum number of parallel tasks equals the maximum number of partitions of a topic among all topics in a cluster. I have around 60 topics in the Kafka cluster, and each topic has a single partition. Is it possible to achieve scalability/parallelism with Kafka Streams for my cluster? Answer 1: Do you want to do the same computation over all topics? For this, I would recommend introducing an extra topic with many partitions that you use to scale
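A sketch of that pattern (topic names are placeholders; "work-topic" must be created up front with the desired partition count):

    // fan the single-partition input topics into one wide intermediate topic
    builder.stream(Arrays.asList("topic-1", "topic-2" /* , ... all 60 */))
           .to("work-topic"); // pre-created with e.g. 60 partitions

    // do the real processing on the wide topic, at up to 60 parallel tasks
    builder.stream("work-topic")
           .mapValues(value -> value /* actual processing goes here */)
           .to("output-topic");

Because the number of tasks is derived from the input partitions, the second stage can now be spread across as many threads or instances as "work-topic" has partitions.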

Delete entire topic from State Store

Submitted on 2019-12-25 00:05:26
Question: I'm trying to delete an entire topic, and also a single record in a topic, from the state store (not at the same time), based on the key provided. So if the key I pass in equals a key in the stream, turn that key's value to null.

    StreamsBuilder streamsBuilder = new StreamsBuilder();
    KStream kStream = streamsBuilder
        .stream(inputTopic, Consumed.with(
            Serdes.String(),    /* key serde */
            Serdes.ByteArray()  /* value serde */))
        .map((key1, value) -> {
            if (key1.equals(key)) {
                return new
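Completing the idea as a sketch (targetKey and outputTopic are placeholders): the mapper forwards a null value for the matching key, which a downstream KTable, and log compaction, treat as a delete.

    kStream
        .map((key1, value) -> key1.equals(targetKey)
                ? KeyValue.pair(key1, (byte[]) null) // tombstone: a null value marks the key deleted
                : KeyValue.pair(key1, value))
        .to(outputTopic, Produced.with(Serdes.String(), Serdes.ByteArray()));

Deleting an entire topic's worth of state, by contrast, is an operational action (for example resetting the application and its state stores), not something a single tombstone can express.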

Cannot deserialize instance Kafka Streams

Submitted on 2019-12-24 22:42:42
Question: What am I doing wrong? My Kafka Streams program below raises this error while streaming the data: "Cannot deserialize instance of com.kafka.productiontest.models.TimeOff out of START_ARRAY token". I have a topic timeOffs2 that contains time-off information with key timeOffID, and a value of object type that contains an employeeId. I just want to group all time offs by employee key and write them to the store. For the store, the key will be employeeId and the value will be the list of time offs. Program properties and
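That Jackson error means the bytes on the topic hold a JSON array, while the configured deserializer targets a single TimeOff. The fix is to deserialize into a list type. A sketch of such a serde, assuming kafka-clients 2.1+ (where Serializer/Deserializer have default configure()/close(), so lambdas work) and Jackson on the classpath:

    // imports: org.apache.kafka.common.serialization.*,
    //          com.fasterxml.jackson.databind.ObjectMapper,
    //          com.fasterxml.jackson.core.type.TypeReference
    ObjectMapper mapper = new ObjectMapper();
    Serde<List<TimeOff>> timeOffListSerde = Serdes.serdeFrom(
        (topic, list) -> {                         // Serializer<List<TimeOff>>
            try { return mapper.writeValueAsBytes(list); }
            catch (Exception e) { throw new RuntimeException(e); }
        },
        (topic, bytes) -> {                        // Deserializer<List<TimeOff>>
            try { return mapper.readValue(bytes, new TypeReference<List<TimeOff>>() {}); }
            catch (Exception e) { throw new RuntimeException(e); }
        });

The same type mismatch applies on the aggregation side: the state store's value serde must describe List<TimeOff>, not TimeOff.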