Embedded Kafka: KTable+KTable leftJoin produces duplicate records

Posted by 五迷三道 on 2019-12-02 08:16:10

Question


I come seeking knowledge of the arcane.

First, I have two pairs of topics; in each pair, one topic feeds into the other. Two KTables are built from the downstream topics and are used in a KTable+KTable leftJoin. The problem is that the leftJoin produces THREE records when I produce a single record to each KTable. I would expect two records in the form (A-null, A-B), but instead I get (A-null, A-B, A-null). I have confirmed that the KTables are receiving a single record each.
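To make the expectation concrete, here is a minimal plain-Java sketch of the table-table left join semantics I am assuming: every update to either side re-evaluates the join for that key, and a missing right-hand value reaches the joiner as null. It uses no Kafka classes; `LeftJoinSketch` and its methods are hypothetical names for illustration only, not how Kafka Streams implements the join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of table-table left join semantics; no Kafka classes are used.
class LeftJoinSketch {
    private final Map<Long, String> left = new HashMap<>();
    private final Map<Long, String> right = new HashMap<>();
    private final List<String> emitted = new ArrayList<>();

    // An update to the left table always re-evaluates the join for that key.
    void updateLeft(long key, String value) {
        left.put(key, value);
        emit(key);
    }

    // An update to the right table only produces output if a left value exists.
    void updateRight(long key, String value) {
        right.put(key, value);
        if (left.containsKey(key)) {
            emit(key);
        }
    }

    private void emit(long key) {
        // A missing right-hand value is passed to the joiner as null.
        emitted.add(left.get(key) + "-" + right.get(key));
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        LeftJoinSketch join = new LeftJoinSketch();
        join.updateLeft(1L, "A");   // one record for table A
        join.updateRight(1L, "B");  // one record for table B
        System.out.println(join.emitted()); // prints [A-null, A-B]
    }
}
```

Under this model, one record per table yields exactly two join results, which is why the third (A-null) surprises me.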

I have fiddled with CACHE_MAX_BYTES_BUFFERING_CONFIG to enable and disable state store caching. The behavior above occurs with CACHE_MAX_BYTES_BUFFERING_CONFIG set to 0. When I use the default value for CACHE_MAX_BYTES_BUFFERING_CONFIG, I see the following records output from the join: (A-B, A-B, A-null).

Here are the configurations for streams, consumers, producers:

    properties.put(StreamsConfig.APPLICATION_ID_CONFIG, appName);
    properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapUrls);
    properties.put(StreamsConfig.STATE_DIR_CONFIG, String.format("/tmp/kafka-streams/%s/%s",
    properties.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0); // fiddled with
    properties.put(StreamsConfig.CLIENT_ID_CONFIG, appName);
    properties.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
    properties.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 1);
    properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    properties.put(ConsumerConfig.GROUP_ID_CONFIG, appName);
    properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class
    properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.cla
    properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
    properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);

The Kafka Streams DSL code (sanitized) that exhibits this behavior is below; note the paired topics [A1, A2] and [B1, B2]:

    KTable<Long, Value> kTableA =
        kstreamBuilder.table(longSerde, valueSerde, topicA2);

    kstreamBuilder.stream(keySerde, envelopeSerde, topicA1)
        .to(longSerde, valueSerde, topicA2);

    kstreamBuilder.stream(keySerde, envelopeSerde, topicB1)
        .to(longSerde, valueSerde, topicB2.topicName);

    KTable<Long, Value> kTableB =
        kstreamBuilder.table(longSerde, valueSerde, topicB2.topicName);

    KTable<Long, Result> joinTable = kTableA.leftJoin(kTableB, (a,b) -> {
        // value joiner called three times with only a single record input
        // into topicA1 and topicB1
    });

    joinTable.groupBy(...)
        .aggregate(...)
        .to(longSerde, aggregateSerde, outputTopic);

Thanks in advance for any and all help, oh benevolent ones.

Update: I was running with one Kafka server and one partition per topic when I experienced this behavior. When I increased the number of servers to 2 and the number of partitions to 3, my output became (A-null).
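Since the output changed with the partition count, one thing that may be worth ruling out is co-partitioning: a KTable+KTable join expects both input topics to have the same number of partitions and the same key to land in the same partition on both sides. The plain-Java sketch below only illustrates the idea and is not a diagnosis. `partitionFor` and its hash are hypothetical stand-ins (Kafka's default partitioner hashes the serialized key bytes with murmur2), and the two serializations of the key are assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrates that the partition depends on the serialized key bytes
// and on the partition count. The hash here is NOT Kafka's murmur2.
class CoPartitioningSketch {
    // Hypothetical partitioner: non-negative hash of the key bytes modulo partition count.
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    // The key serialized as an 8-byte big-endian long.
    static byte[] asLongBytes(long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
    }

    // The same logical key serialized as UTF-8 text.
    static byte[] asStringBytes(long key) {
        return Long.toString(key).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long key = 42L;
        for (int partitions : new int[] {1, 3}) {
            // With a single partition both encodings trivially map to partition 0;
            // with more partitions, different encodings may map to different partitions.
            System.out.printf("partitions=%d long-bytes=%d string-bytes=%d%n",
                    partitions,
                    partitionFor(asLongBytes(key), partitions),
                    partitionFor(asStringBytes(key), partitions));
        }
    }
}
```

With one partition per topic, every key necessarily ends up in partition 0 on both sides, so a mismatch of this kind could stay hidden until the partition count grows.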

It seems to me I need to spend some more time with the Kafka manual...

Source: https://stackoverflow.com/questions/51407542/embedded-kafka-ktablektable-leftjoin-produces-duplicate-records
