apache-kafka-streams

Can I run a Kafka Streams application on the same machine as a Kafka broker?

Submitted by 。_饼干妹妹 on 2020-01-23 13:28:09

Question: I have a Kafka Streams application that takes data from a few topics, joins the data, and puts it in another topic. Kafka configuration: 5 Kafka brokers; Kafka topics with 15 partitions and a replication factor of 3. Note: I am running the Kafka Streams applications on the same machines where my Kafka brokers are running. A few million records are consumed/produced every hour. Whenever I take any Kafka broker down, the application goes into rebalancing, and it takes approx. 30 minutes or sometimes even more for…
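Long recoveries after a broker bounce are often mitigated with standby replicas, which keep a warm copy of each task's state on another instance. A minimal sketch of such a Streams configuration (application id and broker address are illustrative):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-app");         // hypothetical app id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // hypothetical address
    // Keep one warm replica of each task's state on another instance, so a
    // failed instance's tasks can resume without a full state restore.
    props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);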

Difference between idempotence and exactly-once in Kafka Streams

Submitted by China☆狼群 on 2020-01-23 05:21:08

Question: I was going through the documentation, and what I understood is that we can achieve an exactly-once transaction by enabling idempotence=true. idempotence: The idempotent producer enables exactly-once delivery for a producer against a single topic. Basically, each single message send has stronger guarantees and will not be duplicated in case there's an error. So if we already have idempotence, why do we need another exactly-once property in Kafka Streams? What exactly is the difference between idempotence and exactly-once? Why exactly…
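For context, the two settings live at different layers. A minimal sketch contrasting them, using the standard client and Streams config keys:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    // Plain producer: idempotence deduplicates retries of a single send,
    // so one message is not written twice to a partition after an error.
    Properties producerProps = new Properties();
    producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

    // Kafka Streams: exactly-once additionally wraps the whole
    // consume-process-produce cycle (including offset commits) in a transaction.
    Properties streamsProps = new Properties();
    streamsProps.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);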

Kafka Streams - updating aggregations on KTable

Submitted by 断了今生、忘了曾经 on 2020-01-22 15:04:34

Question: I have a KTable with data that looks like this (key => value), where keys are customer IDs and values are small JSON objects containing some customer data:

    1 => { "name": "John", "age_group": "25-30" }
    2 => { "name": "Alice", "age_group": "18-24" }
    3 => { "name": "Susie", "age_group": "18-24" }
    4 => { "name": "Jerry", "age_group": "18-24" }

I'd like to do some aggregations on this KTable, and basically keep a count of the number of records for each age_group. The desired KTable data…
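A per-age_group count is typically done by re-keying the table and counting. A minimal sketch, assuming string-serialized values and a hypothetical extractAgeGroup() JSON helper:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();
    KTable<Integer, String> customers = builder.table("customers",   // topic name is illustrative
            Consumed.with(Serdes.Integer(), Serdes.String()));

    // Re-key each record by its age_group, then count records per group.
    // KTable.groupBy handles updates correctly: an old value is subtracted
    // from its previous group before the new value is added to its new one.
    KTable<String, Long> countsByAgeGroup = customers
            .groupBy((id, json) -> KeyValue.pair(extractAgeGroup(json), json),  // extractAgeGroup() is hypothetical
                     Grouped.with(Serdes.String(), Serdes.String()))
            .count();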

How to process a KStream in a batch of max size or fall back to a time window?

Submitted by 大兔子大兔子 on 2020-01-21 22:16:06

Question: I would like to create a Kafka Streams-based application that processes a topic and takes messages in batches of size X (e.g. 50), but if the stream has low flow, gives me whatever the stream has within Y seconds (e.g. 5). So, instead of processing messages one by one, I process a List[Record] where the size of the list is 50 (or maybe less). This is to make some I/O-bound processing more efficient. I know that this can be implemented with the classic Kafka API, but I was looking for a stream…
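One common approach is the Processor API with a size trigger plus a wall-clock punctuator. A minimal sketch, where MAX_BATCH, FLUSH_INTERVAL, and doBulkWrite() are illustrative placeholders; note the in-memory buffer is not fault tolerant, and a production version would back it with a state store:

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.PunctuationType;

    public class BatchingProcessor implements Processor<String, String> {
        private static final int MAX_BATCH = 50;
        private static final Duration FLUSH_INTERVAL = Duration.ofSeconds(5);
        private final List<String> buffer = new ArrayList<>();

        @Override
        public void init(ProcessorContext context) {
            // Flush whatever has accumulated every FLUSH_INTERVAL of wall-clock time.
            context.schedule(FLUSH_INTERVAL, PunctuationType.WALL_CLOCK_TIME, ts -> flush());
        }

        @Override
        public void process(String key, String value) {
            buffer.add(value);
            if (buffer.size() >= MAX_BATCH) {   // size trigger wins if traffic is high
                flush();
            }
        }

        private void flush() {
            if (!buffer.isEmpty()) {
                doBulkWrite(new ArrayList<>(buffer));  // hypothetical I/O-bound sink
                buffer.clear();
            }
        }

        @Override
        public void close() {
            flush();
        }

        private void doBulkWrite(List<String> batch) { /* illustrative stub */ }
    }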

Kafka Streams groupBy behavior: many intermediate outputs/updates for an aggregation

Submitted by 天大地大妈咪最大 on 2020-01-21 19:31:30

Question: I'm trying to play with Kafka Streams to aggregate some attributes of People. I have a Kafka Streams test like this:

    new ConsumerRecordFactory[Array[Byte], Character]("input",
        new ByteArraySerializer(), new CharacterSerializer())
    var i = 0
    while (i != 5) {
      testDriver.pipeInput(factory.create("input", Character(123, 12), 15 * 10000L))
      i += 1
    }
    val output = testDriver.readOutput....

I'm trying to group the values by key like this:

    streamBuilder.stream[Array[Byte], Character](inputKafkaTopic)
      .filter…
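The many intermediate updates named in the title usually come from KTable caching and commit behavior rather than from groupBy itself. A minimal sketch of the relevant Streams settings (values illustrative):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    // A larger record cache plus a longer commit interval lets the aggregation
    // coalesce consecutive per-key updates before forwarding them downstream.
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024);
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10_000);

In unit tests the TopologyTestDriver can still surface one update per input record, so the coalescing is best observed against a real cluster.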

Kafka Streams RoundRobinPartitioner

Submitted by 跟風遠走 on 2020-01-21 18:57:51

Question: I wrote Kafka Streams code that uses the Kafka 2.4 client version and the Kafka 2.2 server version. I have 50 partitions on my topic and internal topics. My Kafka Streams code has a selectKey() DSL operation, and I have 2 million records using the same key. In the stream configuration, I have done:

    props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class);

so that I am able to use different partitions with the exact same key. If I don't use round robin, as expected all my messages…
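One detail worth checking in a Streams app: producer-level settings are usually namespaced with StreamsConfig.producerPrefix() so they are unambiguously routed to the internal producer. A minimal sketch:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.RoundRobinPartitioner;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    // producerPrefix() targets the setting at the Streams app's internal producer.
    props.put(StreamsConfig.producerPrefix(ProducerConfig.PARTITIONER_CLASS_CONFIG),
              RoundRobinPartitioner.class);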

How to send final kafka-streams aggregation result of a time windowed KTable?

Submitted by 有些话、适合烂在心里 on 2020-01-18 02:20:40

Question: What I'd like to do is this:

- Consume records from a numbers topic (Longs)
- Aggregate (count) the values for each 5-second window
- Send the FINAL aggregation result to another topic

My code looks like this:

    KStream<String, Long> longs = builder.stream(
        Serdes.String(), Serdes.Long(), "longs");

    // In one KTable, count by key, on a five-second tumbling window.
    KTable<Windowed<String>, Long> longCounts =
        longs.countByKey(TimeWindows.of("longCounts", 5000L));

    // Finally, sink to the long-avgs topic. …
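The code above uses the old 0.10-era DSL. In the newer DSL (Kafka 2.1+), suppress() can hold results back until the window closes, which matches the "FINAL result only" requirement. A minimal sketch with illustrative topic names:

    import java.time.Duration;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.*;

    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("longs", Consumed.with(Serdes.String(), Serdes.Long()))
           .groupByKey()
           .windowedBy(TimeWindows.of(Duration.ofSeconds(5)).grace(Duration.ZERO))
           .count()
           // Emit nothing until the window closes, then emit exactly one final result.
           .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
           .toStream((windowedKey, count) -> windowedKey.key())
           .to("long-counts", Produced.with(Serdes.String(), Serdes.Long()));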

Reading a peek repartition topic from Kafka Streams

Submitted by 混江龙づ霸主 on 2020-01-16 07:58:42

Question: I have a topic named push-processing-KSTREAM-PEEK-0000000014-repartition, and this is an internal topic created by Kafka. I did not create this topic; I am using the .peek() method after a repartition, and I use the peek method 3-4 times. My question is: I can read from the topic with "topic read push-processing-KSTREAM-PEEK-0000000014-repartition", but I cannot read when I say "topic read push-processing-KSTREAM-PEEK-0000000014-repartition --from-beginning". This internal topic is created because of the peek method…
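For reference, a -repartition internal topic usually comes from a key-changing operation followed by a stateful one, not from peek() itself, which only observes records. A minimal sketch of such a topology, with the topic name and deriveKey() helper as hypothetical placeholders:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;

    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("push-events", Consumed.with(Serdes.String(), Serdes.String()))
           .selectKey((k, v) -> deriveKey(v))                    // key change marks the stream for repartitioning
           .peek((k, v) -> System.out.println(k + " -> " + v))   // peek only observes records
           .groupByKey()                                         // the stateful op triggers the -repartition topic
           .count();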

How to implement Generic Kafka Streams Deserializer

Submitted by 扶醉桌前 on 2020-01-15 03:23:46

Question: I like Kafka, but hate having to write lots of serializers/deserializers, so I tried to create a GenericDeserializer<T> that can deserialize a generic type T. Here's my attempt:

    class GenericDeserializer<T> implements Deserializer<T> {
        static final ObjectMapper objectMapper = new ObjectMapper();

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) { }

        @Override
        public T deserialize(String topic, byte[] data) {
            T result = null;
            try {
                result = (T) (objectMapper…
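Because Java erases T at runtime, Jackson cannot infer the target type from the type parameter alone. A minimal sketch of one common fix, passing the concrete Class<T> in explicitly; this is a hypothetical completion, not the asker's original code:

    import java.io.IOException;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.kafka.common.errors.SerializationException;
    import org.apache.kafka.common.serialization.Deserializer;

    // T is erased at runtime, so the concrete Class<T> must be supplied for Jackson.
    class GenericDeserializer<T> implements Deserializer<T> {
        private static final ObjectMapper objectMapper = new ObjectMapper();
        private final Class<T> type;

        GenericDeserializer(Class<T> type) {
            this.type = type;
        }

        @Override
        public T deserialize(String topic, byte[] data) {
            if (data == null) {
                return null;   // tombstones and absent values pass through as null
            }
            try {
                return objectMapper.readValue(data, type);
            } catch (IOException e) {
                throw new SerializationException("Failed to deserialize from topic " + topic, e);
            }
        }
    }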