Question
I come seeking knowledge of the arcane.
First, I have two pairs of topics, with one topic in each pair feeding into the other. Two KTables are built from the latter topics and used in a KTable+KTable leftJoin. The problem is that the leftJoin produces THREE records when I produce a single record to each KTable. I would expect two records in the form (A-null, A-B), but instead I get (A-null, A-B, A-null). I have confirmed that the KTables are receiving a single record each.
I have fiddled with the CACHE_MAX_BYTES_BUFFERING_CONFIG to enable/disable state store caching. The behavior above is with CACHE_MAX_BYTES_BUFFERING_CONFIG set to 0. When I use the default value for CACHE_MAX_BYTES_BUFFERING_CONFIG I see the following records output from the join: (A-B, A-B, A-null)
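To make the expectation of (A-null, A-B) concrete, here is a toy model in plain Java, with two maps standing in for the KTables. It is only a sketch of the left-join behavior I expect, not Kafka Streams code; the class and method names (LeftJoinSketch, updateLeft, updateRight) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain maps stand in for the two KTables. No Kafka involved.
public class LeftJoinSketch {
    private final Map<Long, String> left = new HashMap<>();
    private final Map<Long, String> right = new HashMap<>();
    private final List<String> emitted = new ArrayList<>();

    // An update to the left table always emits, paired with the
    // current right-side value for the key (or null if there is none).
    public void updateLeft(long key, String value) {
        left.put(key, value);
        emitted.add(value + "-" + right.get(key));
    }

    // An update to the right table emits only when the left table
    // already holds a value for the same key.
    public void updateRight(long key, String value) {
        right.put(key, value);
        String leftValue = left.get(key);
        if (leftValue != null) {
            emitted.add(leftValue + "-" + value);
        }
    }

    public List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        LeftJoinSketch join = new LeftJoinSketch();
        join.updateLeft(1L, "A");   // emits A-null
        join.updateRight(1L, "B");  // emits A-B
        System.out.println(join.emitted()); // prints [A-null, A-B]
    }
}
```

In this model, one record per side yields at most two join results, which is why the third record (the trailing A-null) surprises me.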
Here are the configurations for streams, consumers, producers:
properties.put(StreamsConfig.APPLICATION_ID_CONFIG, appName);
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapUrls);
properties.put(StreamsConfig.STATE_DIR_CONFIG, String.format("/tmp/kafka-streams/%s/%s",
properties.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0); // fiddled with
properties.put(StreamsConfig.CLIENT_ID_CONFIG, appName);
properties.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
properties.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 1);
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.put(ConsumerConfig.GROUP_ID_CONFIG, appName);
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.cla
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
The Kafka Streams DSL code (sanitized) that exhibits this behavior is below; notice the paired topics [A1, A2] and [B1, B2]:
KTable<Long, Value> kTableA =
    kstreamBuilder.table(longSerde, valueSerde, topicA2);

kstreamBuilder.stream(keySerde, envelopeSerde, topicA1)
    .to(longSerde, valueSerde, topicA2);

kstreamBuilder.stream(keySerde, envelopeSerde, topicB1)
    .to(longSerde, valueSerde, topicB2.topicName);

KTable<Long, Value> kTableB =
    kstreamBuilder.table(longSerde, valueSerde, topicB2.topicName);

KTable<Long, Result> joinTable = kTableA.leftJoin(kTableB, (a, b) -> {
    // value joiner called three times with only a single record input
    // into topicA1 and topicB1
});

joinTable.groupBy(...)
    .aggregate(...)
    .to(longSerde, aggregateSerde, outputTopic);
Thanks in advance for any and all help, oh benevolent ones.
Update: I was running with one Kafka server and one partition per topic when I saw this behavior. When I increased the number of servers to 2 and the number of partitions to 3, my output became (A-null).
It seems to me I need to spend some more time with the Kafka manual...
Source: https://stackoverflow.com/questions/51407542/embedded-kafka-ktablektable-leftjoin-produces-duplicate-records