apache-kafka-streams

Can I run a Kafka Streams application on the same machine as a Kafka broker?

Submitted by 。_饼干妹妹 on 2020-01-23 13:28:09

Question: I have a Kafka Streams application that takes data from a few topics, joins the data, and puts it in another topic. Kafka configuration: 5 Kafka brokers; Kafka topics with 15 partitions and a replication factor of 3. Note: I am running the Kafka Streams applications on the same machines where my Kafka brokers are running. A few million records are consumed/produced every hour. Whenever I take any Kafka broker down, the application goes into rebalancing, and it takes approx. 30 minutes or sometimes even more for…
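Long recoveries after a broker bounce are often mitigated with standby replicas, which keep a warm copy of each task's state on another instance. A minimal sketch of such a Streams configuration (application id and broker address are illustrative):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-app");         // hypothetical app id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // hypothetical address
    // Keep one warm replica of each task's state on another instance, so a
    // failed instance's tasks can resume without a full state restore.
    props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);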

Difference between idempotence and exactly-once in Kafka Streams

Submitted by China☆狼群 on 2020-01-23 05:21:08

Question: I was going through the documentation, and what I understood is that we can achieve an exactly-once transaction by enabling idempotence=true. idempotence: The idempotent producer enables exactly-once delivery for a producer against a single topic. Basically, each single message send has stronger guarantees and will not be duplicated in case there's an error. So if we already have idempotence, why do we need another exactly-once property in Kafka Streams? What exactly is the difference between idempotence and exactly-once? Why exactly…
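For context, the two settings live at different layers. A minimal sketch contrasting them, using the standard client and Streams config keys:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    // Plain producer: idempotence deduplicates retries of a single send,
    // so one message is not written twice to a partition after an error.
    Properties producerProps = new Properties();
    producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

    // Kafka Streams: exactly-once additionally wraps the whole
    // consume-process-produce cycle (including offset commits) in a transaction.
    Properties streamsProps = new Properties();
    streamsProps.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);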

Kafka Streams - updating aggregations on KTable

Submitted by 断了今生、忘了曾经 on 2020-01-22 15:04:34

Question: I have a KTable with data that looks like this (key => value), where keys are customer IDs and values are small JSON objects containing some customer data:

    1 => { "name": "John", "age_group": "25-30" }
    2 => { "name": "Alice", "age_group": "18-24" }
    3 => { "name": "Susie", "age_group": "18-24" }
    4 => { "name": "Jerry", "age_group": "18-24" }

I'd like to do some aggregations on this KTable, and basically keep a count of the number of records for each age_group. The desired KTable data…
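A per-age_group count is typically done by re-keying the table and counting. A minimal sketch, assuming string-serialized values and a hypothetical extractAgeGroup() JSON helper:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();
    KTable<Integer, String> customers = builder.table("customers",   // topic name is illustrative
            Consumed.with(Serdes.Integer(), Serdes.String()));

    // Re-key each record by its age_group, then count records per group.
    // KTable.groupBy handles updates correctly: an old value is subtracted
    // from its previous group before the new value is added to its new one.
    KTable<String, Long> countsByAgeGroup = customers
            .groupBy((id, json) -> KeyValue.pair(extractAgeGroup(json), json),  // extractAgeGroup() is hypothetical
                     Grouped.with(Serdes.String(), Serdes.String()))
            .count();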

How to process a KStream in a batch of max size or fall back to a time window?

Submitted by 大兔子大兔子 on 2020-01-21 22:16:06

Question: I would like to create a Kafka Streams-based application that processes a topic and takes messages in batches of size X (e.g. 50), but if the stream has low flow, gives me whatever the stream has within Y seconds (e.g. 5). So, instead of processing messages one by one, I process a List[Record] where the size of the list is 50 (or maybe less). This is to make some I/O-bound processing more efficient. I know that this can be implemented with the classic Kafka API, but I was looking for a stream…
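One common approach is the Processor API with a size trigger plus a wall-clock punctuator. A minimal sketch, where MAX_BATCH, FLUSH_INTERVAL, and doBulkWrite() are illustrative placeholders; note the in-memory buffer is not fault tolerant, and a production version would back it with a state store:

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.PunctuationType;

    public class BatchingProcessor implements Processor<String, String> {
        private static final int MAX_BATCH = 50;
        private static final Duration FLUSH_INTERVAL = Duration.ofSeconds(5);
        private final List<String> buffer = new ArrayList<>();

        @Override
        public void init(ProcessorContext context) {
            // Flush whatever has accumulated every FLUSH_INTERVAL of wall-clock time.
            context.schedule(FLUSH_INTERVAL, PunctuationType.WALL_CLOCK_TIME, ts -> flush());
        }

        @Override
        public void process(String key, String value) {
            buffer.add(value);
            if (buffer.size() >= MAX_BATCH) {   // size trigger wins if traffic is high
                flush();
            }
        }

        private void flush() {
            if (!buffer.isEmpty()) {
                doBulkWrite(new ArrayList<>(buffer));  // hypothetical I/O-bound sink
                buffer.clear();
            }
        }

        @Override
        public void close() {
            flush();
        }

        private void doBulkWrite(List<String> batch) { /* illustrative stub */ }
    }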

Kafka Streams groupBy behavior: many intermediate outputs/updates for an aggregation

Submitted by 天大地大妈咪最大 on 2020-01-21 19:31:30

Question: I'm trying to play with Kafka Streams to aggregate some attributes of People. I have a Kafka Streams test like this:

    new ConsumerRecordFactory[Array[Byte], Character]("input",
        new ByteArraySerializer(), new CharacterSerializer())
    var i = 0
    while (i != 5) {
      testDriver.pipeInput(factory.create("input", Character(123, 12), 15 * 10000L))
      i += 1
    }
    val output = testDriver.readOutput....

I'm trying to group the values by key like this:

    streamBuilder.stream[Array[Byte], Character](inputKafkaTopic)
      .filter…
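The many intermediate updates named in the title usually come from KTable caching and commit behavior rather than from groupBy itself. A minimal sketch of the relevant Streams settings (values illustrative):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    // A larger record cache plus a longer commit interval lets the aggregation
    // coalesce consecutive per-key updates before forwarding them downstream.
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024);
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10_000);

In unit tests the TopologyTestDriver can still surface one update per input record, so the coalescing is best observed against a real cluster.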

Kafka Streams RoundRobinPartitioner

Submitted by 跟風遠走 on 2020-01-21 18:57:51

Question: I wrote Kafka Streams code that uses the Kafka 2.4 client version and the Kafka 2.2 server version. I have 50 partitions on my topic and internal topics. My Kafka Streams code has a selectKey() DSL operation, and I have 2 million records using the same key. In the stream configuration, I have done:

    props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class);

so that I am able to use different partitions with the exact same key. If I don't use round robin, as expected all my messages…
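One detail worth checking in a Streams app: producer-level settings are usually namespaced with StreamsConfig.producerPrefix() so they are unambiguously routed to the internal producer. A minimal sketch:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.RoundRobinPartitioner;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    // producerPrefix() targets the setting at the Streams app's internal producer.
    props.put(StreamsConfig.producerPrefix(ProducerConfig.PARTITIONER_CLASS_CONFIG),
              RoundRobinPartitioner.class);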

How to send final kafka-streams aggregation result of a time windowed KTable?

Submitted by 有些话、适合烂在心里 on 2020-01-18 02:20:40

Question: What I'd like to do is this:

- Consume records from a numbers topic (Longs)
- Aggregate (count) the values for each 5-second window
- Send the FINAL aggregation result to another topic

My code looks like this:

    KStream<String, Long> longs = builder.stream(
        Serdes.String(), Serdes.Long(), "longs");

    // In one KTable, count by key, on a five-second tumbling window.
    KTable<Windowed<String>, Long> longCounts =
        longs.countByKey(TimeWindows.of("longCounts", 5000L));

    // Finally, sink to the long-avgs topic. …
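The code above uses the old 0.10-era DSL. In the newer DSL (Kafka 2.1+), suppress() can hold results back until the window closes, which matches the "FINAL result only" requirement. A minimal sketch with illustrative topic names:

    import java.time.Duration;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.*;

    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("longs", Consumed.with(Serdes.String(), Serdes.Long()))
           .groupByKey()
           .windowedBy(TimeWindows.of(Duration.ofSeconds(5)).grace(Duration.ZERO))
           .count()
           // Emit nothing until the window closes, then emit exactly one final result.
           .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
           .toStream((windowedKey, count) -> windowedKey.key())
           .to("long-counts", Produced.with(Serdes.String(), Serdes.Long()));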

Reading a peek repartition topic from Kafka Streams

Submitted by 混江龙づ霸主 on 2020-01-16 07:58:42

Question: I have a topic named push-processing-KSTREAM-PEEK-0000000014-repartition, and this is an internal topic created by Kafka. I did not create this topic; I am using the .peek() method after a repartition, and I use the peek method 3-4 times. My question is: I can read from the topic with "topic read push-processing-KSTREAM-PEEK-0000000014-repartition", but I cannot read when I say "topic read push-processing-KSTREAM-PEEK-0000000014-repartition --from-beginning". This internal topic is created because of the peek method…
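For reference, a -repartition internal topic usually comes from a key-changing operation followed by a stateful one, not from peek() itself, which only observes records. A minimal sketch of such a topology, with the topic name and deriveKey() helper as hypothetical placeholders:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;

    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("push-events", Consumed.with(Serdes.String(), Serdes.String()))
           .selectKey((k, v) -> deriveKey(v))                    // key change marks the stream for repartitioning
           .peek((k, v) -> System.out.println(k + " -> " + v))   // peek only observes records
           .groupByKey()                                         // the stateful op triggers the -repartition topic
           .count();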

How to implement Generic Kafka Streams Deserializer

Submitted by 扶醉桌前 on 2020-01-15 03:23:46

Question: I like Kafka, but hate having to write lots of serializers/deserializers, so I tried to create a GenericDeserializer<T> that can deserialize a generic type T. Here's my attempt:

    class GenericDeserializer<T> implements Deserializer<T> {
        static final ObjectMapper objectMapper = new ObjectMapper();

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) { }

        @Override
        public T deserialize(String topic, byte[] data) {
            T result = null;
            try {
                result = (T) (objectMapper…
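Because Java erases T at runtime, Jackson cannot infer the target type from the type parameter alone. A minimal sketch of one common fix, passing the concrete Class<T> in explicitly; this is a hypothetical completion, not the asker's original code:

    import java.io.IOException;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.kafka.common.errors.SerializationException;
    import org.apache.kafka.common.serialization.Deserializer;

    // T is erased at runtime, so the concrete Class<T> must be supplied for Jackson.
    class GenericDeserializer<T> implements Deserializer<T> {
        private static final ObjectMapper objectMapper = new ObjectMapper();
        private final Class<T> type;

        GenericDeserializer(Class<T> type) {
            this.type = type;
        }

        @Override
        public T deserialize(String topic, byte[] data) {
            if (data == null) {
                return null;   // tombstones and absent values pass through as null
            }
            try {
                return objectMapper.readValue(data, type);
            } catch (IOException e) {
                throw new SerializationException("Failed to deserialize from topic " + topic, e);
            }
        }
    }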