apache-kafka-streams

Is it a good practice to do sync database query or restful call in Kafka streams jobs?

∥☆過路亽.° submitted on 2019-12-19 05:00:45
Question: I use Kafka Streams to process real-time data. In the Kafka Streams tasks I need to query MySQL and also call another RESTful service, and all of these operations are synchronous. I'm afraid the synchronous calls will reduce the processing capacity of the streams tasks. Is this a good practice, or is there a better way to do this?

Answer 1: A better way to do it would be to stream your MySQL table(s) into Kafka, and access the data there. This has the advantage of decoupling your streams app …
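
A minimal sketch of the pattern the answer describes, assuming the MySQL table is already mirrored into a compacted Kafka topic (for example via a Kafka Connect source connector; the topic names and String serdes here are illustrative). The per-record lookup then becomes a local stream-table join instead of a blocking remote call:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();

// events to enrich, keyed by the same id that keys the MySQL table
KStream<String, String> events = builder.stream("events");

// the MySQL table, ingested into a compacted topic by a connector
KTable<String, String> reference = builder.table("mysql-reference-table");

// enrichment runs against the locally materialized table, with no remote call per record
KStream<String, String> enriched = events.leftJoin(reference,
        (event, row) -> row == null ? event : event + " | " + row);

enriched.to("enriched-events");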

End-of-window outer join with KafkaStreams

自闭症网瘾萝莉.ら submitted on 2019-12-19 03:38:16
Question: I have a Kafka topic where I expect messages with two different key types: old and new, i.e. "1-new", "1-old", "2-new", "2-old". Keys are unique, but some might be missing. Now, using Kotlin and the KafkaStreams API, I can log those messages that have the same key id from new and old:

val windows = JoinWindows.of(Duration.of(2, MINUTES).toMillis())
val newStream = stream.filter({ key, _ -> isNew(key) })
    .map({ key, value -> KeyValue(key.replace(NEW_PREFIX, ""), value) })
val oldStream = stream …
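
For the join itself, a hedged sketch in Java of what the windowed outer join could look like, assuming String keys/values and an illustrative "-new"/"-old" key convention matching the question. Note that, depending on the Kafka Streams version, outerJoin may emit unmatched results eagerly rather than strictly at the end of the window:

import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

// split the topic into two streams and strip the type suffix from the key
KStream<String, String> newStream = stream
        .filter((key, value) -> key.endsWith("-new"))
        .map((key, value) -> KeyValue.pair(key.replace("-new", ""), value));

KStream<String, String> oldStream = stream
        .filter((key, value) -> key.endsWith("-old"))
        .map((key, value) -> KeyValue.pair(key.replace("-old", ""), value));

// pair records with the same id that arrive within two minutes of each other;
// a null on either side means the counterpart never showed up in the window
KStream<String, String> joined = newStream.outerJoin(
        oldStream,
        (newValue, oldValue) -> "new=" + newValue + ", old=" + oldValue,
        JoinWindows.of(Duration.ofMinutes(2)));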

How to output result of windowed aggregation only when window is finished? [duplicate]

我们两清 submitted on 2019-12-19 02:20:36
Question: This question already has answers here: How to send final kafka-streams aggregation result of a time windowed KTable? (2 answers). Closed 12 months ago. I have a KStream in which I want to count some dimension of the events. I do it as follows:

KTable<Windowed<Long>, Counter> ret = input.groupByKey()
    .windowedBy(TimeWindows.of(Duration.of(10, SECONDS)))
    .aggregate(Counter::new, (k, v, c) -> new Counter(c.count + v.getDimension()));

I want to have a new KStream with those aggregations as …
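
For reference, Kafka Streams 2.1+ exposes suppress() for exactly this: forwarding a window's result only once the window has closed. A minimal sketch, assuming the same Counter aggregation as above, default serdes, and an illustrative grace period:

import java.time.Duration;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

KStream<Windowed<Long>, Counter> finalResults = input
        .groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofSeconds(10)).grace(Duration.ofSeconds(30)))
        .aggregate(Counter::new, (k, v, c) -> new Counter(c.count + v.getDimension()))
        // emit each window's result once, after the window plus grace period has closed
        .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
        .toStream();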

Kafka Streams thread number

风流意气都作罢 submitted on 2019-12-18 08:56:28
Question: I am new to Kafka Streams and I am currently confused about the maximum parallelism of a Kafka Streams application. I went through the following link and did not find the answer I am looking for: https://docs.confluent.io/current/streams/faq.html#streams-faq-scalability-maximum-parallelism If I have 2 input topics, one with 10 partitions and the other with 5 partitions, and only one Kafka Streams application instance is running to process these two input topics, what is the maximum thread number …
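
For context, the thread count per instance is set with num.stream.threads, and the useful upper bound is the number of tasks, which Kafka Streams derives from the partition counts of the input topics per sub-topology. A minimal configuration sketch (the application id, bootstrap servers, and thread count are illustrative):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// one instance may run many threads, but threads beyond the task count will sit idle
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 15);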

UnsatisfiedLinkError on Lib rocks DB dll when developing with Kafka Streams

删除回忆录丶 submitted on 2019-12-18 05:03:16
Question: I'm writing a Kafka Streams application on my development Windows machine. If I try to use the leftJoin and branch features of Kafka Streams, I get the error below when executing the jar application:

Exception in thread "StreamThread-1" java.lang.UnsatisfiedLinkError: C:\Users\user\AppData\Local\Temp\librocksdbjni325337723194862275.dll: Can't find dependent libraries
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
    at java …

Kafka KStreams - processing timeouts

淺唱寂寞╮ submitted on 2019-12-18 04:09:42
Question: I am attempting to use <KStream>.process() with a TimeWindows.of("name", 30000) to batch up some KTable values and send them on. It seems that 30 seconds exceeds the consumer timeout interval, after which Kafka considers said consumer to be defunct and releases the partition. I've tried upping the frequency of polling and the commit interval to avoid this:

config.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "5000");
config.put(StreamsConfig.POLL_MS_CONFIG, "5000");

Unfortunately these errors are …
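
A sketch of the consumer-side settings that are usually tuned for this symptom, assuming a reasonably recent Kafka Streams version (the values are illustrative, not recommendations); consumer configs can be passed through the streams configuration with a consumer prefix:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties config = new Properties();
// allow more time between poll() calls before the broker considers the thread dead
config.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG), "300000");
// fetch fewer records per poll so each batch is processed well within that budget
config.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), "100");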

What are the differences between KTable vs GlobalKTable and leftJoin() vs outerJoin()?

て烟熏妆下的殇ゞ submitted on 2019-12-18 03:32:36
Question: In the Kafka Streams library, I want to know the difference between KTable and GlobalKTable. Also, the KStream class has two methods, leftJoin() and outerJoin(); what is the difference between these two methods as well? I read KStream.leftJoin, but did not manage to find an exact difference.

Answer 1: KTable vs GlobalKTable: A KTable shards the data between all running Kafka Streams instances, while a GlobalKTable has a full copy of all data on each instance. The disadvantage of GlobalKTable is that it …
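
A minimal sketch of how the two table types are created and joined with the DSL (topic names, String serdes, and the join logic are illustrative). A KStream-KTable join requires co-partitioned input keyed the same way, while a KStream-GlobalKTable join can map each stream record to the table's key:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> orders = builder.stream("orders");

// partitioned: each instance holds only the shards for its assigned partitions
KTable<String, String> customers = builder.table("customers");
KStream<String, String> withCustomer = orders.leftJoin(customers,
        (order, customer) -> order + " / " + customer);

// replicated: every instance materializes the whole topic locally
GlobalKTable<String, String> products = builder.globalTable("products");
KStream<String, String> withProduct = orders.join(products,
        (orderKey, orderValue) -> orderKey,           // map a stream record to the table key
        (order, product) -> order + " / " + product);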

Kafka: Consumer API vs Streams API

北城余情 submitted on 2019-12-17 15:07:10
Question: I recently started learning Kafka and ended up with these questions. What is the difference between a Consumer and a Stream? For me, any tool/application that consumes messages from Kafka is a consumer in the Kafka world. How is Streams different, given that it also consumes messages from, or produces messages to, Kafka? And why is it needed, since we can write our own consumer application using the Consumer API and process the messages as needed, or send them to Spark from the consumer application? I did Google this, but did not get …
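
To make the contrast concrete, a hedged sketch in Java (topic names, properties, and the transformation are illustrative): the Consumer API hands you a raw poll loop in which you manage processing, offsets, and state yourself, while the Streams API lets you declare the dataflow as a topology and runs it for you, adding stateful operations such as joins, windows, and aggregations on top of the same consumer and producer clients:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

class ConsumerVsStreams {

    // Consumer API: you own the loop, the threading, and any processing state.
    static void plainConsumer(Properties consumerProps) {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("input-topic"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // hand-rolled processing: transform, call out, produce elsewhere, ...
                }
            }
        }
    }

    // Streams API: declare the dataflow; the library handles threads, state stores,
    // and rebalancing, and can express joins/windows that would be manual work above.
    static void streamsApp(Properties streamsProps) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(value -> value.toUpperCase()).to("output-topic");
        new KafkaStreams(builder.build(), streamsProps).start();
    }
}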