Kafka Streams stateStores fault tolerance exactly once?

Submitted by 南笙酒味 on 2019-12-13 01:43:50

Question


We're trying to build a deduplication service using Kafka Streams. The big picture is that it will use its RocksDB state store to check for existing keys during processing.

Please correct me if I'm wrong, but to make those state stores fault tolerant too, the Kafka Streams API will transparently copy the values in the state store into a Kafka topic (called the changelog). That way, if our service crashes, another instance will be able to rebuild its state store from the changelog found in Kafka.

But it raises a question in my mind: is this "StateStore --> changelog" step itself exactly-once? I mean, when the service updates its state store, will it update the changelog in an exactly-once fashion too? If the service crashes, another one will take over the load, but can we be sure it won't miss a state store update from the crashed service?

Regards,

Yannick


Answer 1:


The short answer is yes.

Using transactions (atomic multi-partition writes), Kafka Streams ensures that when an offset commit is performed, the state store has also been flushed to the changelog topic on the brokers. These operations are atomic, so if one of them fails, the application will reprocess messages from the previous offset position.
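As a toy illustration of that commit semantics (this is not Kafka's internal code; the class and method names are made up), the key point is that the committed offset and the state update become visible together or not at all, so a crash before the commit simply causes reprocessing from the old offset:

```java
import java.util.Map;
import java.util.TreeMap;

// Toy simulation of the atomic-commit idea: the offset commit and the
// state flush succeed or fail together, so a crash never leaves the
// state store ahead of (or behind) the committed offset.
public class AtomicCommitSketch {
    long committedOffset = 0;
    Map<String, Integer> committedState = new TreeMap<>();

    // Process a batch of keys, then commit offset and state in one atomic step.
    void processBatch(String[] keys, long upToOffset, boolean crashBeforeCommit) {
        Map<String, Integer> pendingState = new TreeMap<>(committedState);
        for (String k : keys) pendingState.merge(k, 1, Integer::sum);
        if (crashBeforeCommit) return; // crash: neither offset nor state changes
        committedState = pendingState; // "atomic": both become visible together
        committedOffset = upToOffset;
    }

    public static void main(String[] args) {
        AtomicCommitSketch app = new AtomicCommitSketch();
        app.processBatch(new String[]{"k1", "k2"}, 2, true);  // crash before commit
        System.out.println(app.committedOffset + " " + app.committedState); // 0 {}
        app.processBatch(new String[]{"k1", "k2"}, 2, false); // reprocess from offset 0
        System.out.println(app.committedOffset + " " + app.committedState); // 2 {k1=1, k2=1}
    }
}
```

After the simulated crash, nothing was published, so reprocessing the same batch yields each key counted exactly once.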

You can read more about exactly-once semantics in the following blog post: https://www.confluent.io/blog/enabling-exactly-kafka-streams/. See in particular the section "How Kafka Streams Guarantees Exactly-Once Processing".




Answer 2:


But it raises a question in my mind: is this "StateStore --> changelog" step itself exactly-once?

Yes -- as others have already said here. You must of course configure your application to use exactly-once semantics via the configuration parameter processing.guarantee, see https://kafka.apache.org/21/documentation/streams/developer-guide/config-streams.html#processing-guarantee (this link is for Apache Kafka 2.1).
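As a minimal sketch (the application id and bootstrap servers below are placeholders), enabling exactly-once processing comes down to a single configuration entry. Newer Kafka versions also accept "exactly_once_v2", but "exactly_once" is the value documented for Kafka 2.1:

```java
import java.util.Properties;

public class ExactlyOnceConfig {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put("application.id", "dedup-service");      // placeholder app id
        props.put("bootstrap.servers", "localhost:9092");  // placeholder brokers
        // The key setting: defaults to "at_least_once" if not set.
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig().getProperty("processing.guarantee"));
    }
}
```

These Properties would then be passed to the KafkaStreams constructor along with the topology.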

We're trying to build a deduplication service using Kafka Streams. The big picture is that it will use its RocksDB state store to check for existing keys during processing.

There's also an event de-duplication example application available at https://github.com/confluentinc/kafka-streams-examples/blob/5.1.0-post/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java. This link points to the repo branch for Confluent Platform 5.1.0, which uses Apache Kafka 2.1.0, the latest Kafka version at the time of writing.
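If it helps, the core dedup logic can be sketched outside of Kafka Streams with a plain in-memory map standing in for the RocksDB store. The linked example uses a Transformer backed by a windowed state store; everything below is a simplified, self-contained illustration of the check-then-remember pattern only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the dedup logic: a plain Map stands in for the RocksDB
// state store. In Kafka Streams this would live in a Transformer or
// Processor backed by a persistent (changelog-backed) KeyValueStore.
public class DedupSketch {
    private final Map<String, Long> seen = new HashMap<>();

    // Returns the value if the key is new, or null if it is a duplicate.
    public String deduplicate(String key, String value, long timestampMs) {
        if (seen.containsKey(key)) {
            return null;                // duplicate: drop the record
        }
        seen.put(key, timestampMs);     // remember the key (store update)
        return value;                   // first occurrence: forward it
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        List<String> forwarded = new ArrayList<>();
        String[][] events = {{"k1", "a"}, {"k2", "b"}, {"k1", "a-again"}};
        long t = 0;
        for (String[] e : events) {
            String v = dedup.deduplicate(e[0], e[1], t++);
            if (v != null) forwarded.add(v);
        }
        System.out.println(forwarded); // prints [a, b]
    }
}
```

With exactly-once enabled, the "remember the key" store update and the forwarded output record are committed atomically, which is what makes the dedup decision safe across crashes.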



Source: https://stackoverflow.com/questions/54592072/kafka-streams-statestores-fault-tolerance-exactly-once
