KSQL table-table left outer join emits the same join result more than once

一个人的身影 2021-01-22 09:05

Using KSQL to perform a left outer join, I can see that the result of my join is sometimes emitted more than once.

In other words, the same join result is emitted more than once.

1 answer
  • 2021-01-22 09:50

    the general answer is yes. kafka is an at-least-once system. more specifically, a few scenarios can result in duplication:

    1. consumers only periodically checkpoint their positions. a consumer crash can result in duplicate processing of some range of records.
    2. producers have client-side timeouts. this means the producer may think a request timed out and re-transmit while broker-side it actually succeeded.
    3. if you mirror data between kafka clusters, that's usually done with a producer + consumer pair of some sort, which can lead to more duplication.
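
To make scenario 1 concrete, here is a minimal simulation in plain Java with no Kafka client involved. The class name, method name and parameters are all hypothetical; it only models "commit the position every N records, crash, restart from the last committed position" and is not how a real consumer is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointReplaySketch {

    // Processes offsets [0, total) and checkpoints the position every
    // commitEvery records. A single crash is simulated right after the record
    // at offset crashAt has been processed; the "consumer" then restarts from
    // the last committed position. Returns the offsets in processing order.
    static List<Integer> run(int total, int commitEvery, int crashAt) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;      // position to resume from after a restart
        boolean crashed = false;
        int offset = 0;
        while (offset < total) {
            processed.add(offset);
            offset++;
            if (!crashed && offset - 1 == crashAt) {
                // Crash before the next checkpoint: uncommitted progress is lost.
                crashed = true;
                offset = committed;
                continue;
            }
            if (offset % commitEvery == 0) {
                committed = offset; // periodic checkpoint
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Checkpoint every 4 records, crash after processing offset 5.
        System.out.println(run(10, 4, 5));
    }
}
```

With a checkpoint every 4 records and a crash after offset 5, offsets 4 and 5 are processed twice, because the last committed position was 4. A downstream join fed by such a consumer would see those inputs again and could emit the same result again.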

    are you seeing any such crashes/timeouts in your logs?

    there are a few kafka features you could try using to reduce the likelihood of this happening to you:

    1. set enable.idempotence to true in your producer configs (see https://kafka.apache.org/documentation/#producerconfigs) - incurs some overhead
    2. use transactions when producing - incurs overhead and adds latency
    3. set transactional.id on the producer in case you fail over across machines - gets complicated to manage at scale
    4. set isolation.level to read_committed on the consumer - adds latency (needs to be done in combination with 2 above)
    5. shorten auto.commit.interval.ms on the consumer - just reduces the window of duplication, doesn't really solve anything. incurs overhead at really low values.
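
As a rough sketch, the settings listed above could be collected like this. The property keys are the real Kafka config names from the list; the broker address, group id, transactional id and the 1000 ms interval are placeholder assumptions, and the class and method names are hypothetical. Only java.util.Properties is used here, so nothing is actually sent to a broker.

```java
import java.util.Properties;

public class KafkaDedupConfigSketch {

    // Producer side: items 1-3 of the list above.
    static Properties producerProps(String transactionalId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // assumed address
        p.setProperty("enable.idempotence", "true");          // item 1
        p.setProperty("acks", "all");                         // required by idempotence
        p.setProperty("transactional.id", transactionalId);   // items 2 and 3
        return p;
    }

    // Consumer side: items 4-5 of the list above.
    static Properties consumerProps(String groupId) {
        Properties c = new Properties();
        c.setProperty("bootstrap.servers", "localhost:9092"); // assumed address
        c.setProperty("group.id", groupId);
        c.setProperty("isolation.level", "read_committed");   // item 4
        c.setProperty("auto.commit.interval.ms", "1000");     // item 5, default is 5000
        return c;
    }

    public static void main(String[] args) {
        // "demo-tx-1" and "demo-group" are made-up example values.
        producerProps("demo-tx-1")
            .forEach((k, v) -> System.out.println("producer " + k + "=" + v));
        consumerProps("demo-group")
            .forEach((k, v) -> System.out.println("consumer " + k + "=" + v));
    }
}
```

These Properties objects would be passed to the KafkaProducer and KafkaConsumer constructors, together with the serializer and deserializer settings omitted here. Setting transactional.id alone is not enough for item 2: the producing code also has to call initTransactions(), beginTransaction() and commitTransaction() around its sends.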