Manually setting Kafka consumer offset

匆匆过客 提交于 2021-02-10 07:14:14

问题


In our project, there are Active Kafka servers( PR) and Passive Kafka servers (DR), both Kafka brokers are configured with the same group name, topic name and partition in our project. When switching from PR to DR the _consumer_offsets is manually set on DR.

My question here is, would the Kafka consumer be able to seamlessly consume the messages from where it was last read?


回答1:


When replicating messages across 2 clusters, it's not possible to ensure offsets stay in sync.

For example, if a topic exists for a little while on the Active cluster the log start offset for some partitions may not be 0 (some records have been deleted by the retention policies). Hence when replicating this topic, offsets between both clusters will not be the same. This can also happen when messages are lost or duplicated as you can't have exactly once semantics when replicating between 2 clusters.

So you can't just replicate the __consumer_offsets topic, this will not work. Consumer group positions have to be explicitly "translated" between both clusters. While it's possible to reset them "manually" by directly committing, it's not recommended as finding the new positions is not obvious.

Instead, you should use a replication tool that supports "offset translation" to ensure consumers can seamlessly switch from 1 cluster to the other.

For example, Mirror Maker 2, the official Kafka tool for mirroring clusters, supports offset translation via RemoteClusterUtils. You can find the details in the KIP.




回答2:


In itself, relying on the fact that both clusters will have the same offset is faulty.

Offset - is relative characteristic. It's not a part of a message. It's literally a position inside the file. And those files, Kafka log files, also rotate and have retentions. There's no guarantee that those log files are identical at any given point in time. Kafka doesn't claim to solve such an issue.
Besides, it's tricky to solve from CAP point of view.
And it's also pointless unless you want strict physical replication.

That's why Kafka multi-cluster tools are usually about logical replication. I have not used Mirror Maker(MM) but I've used Replicator(which is a more advanced commercial tool by Confluent) and it has a feature for that called, who would have guessed, just like the MM one - offset translation. Replicator does the following:

  • Reads the consumer offset and timestamp information from the __consumer_timestamps topic in the origin cluster to understand a consumer group’s progress.
  • Translates the committed offsets in the origin datacenter to the corresponding offsets in the destination datacenter.
  • Writes the translated offsets to the __consumer_offsets topic in the destination cluster, as long as no consumers in that group are connected to the destination cluster.

Note: You do need to add an interceptor to your Kafka Consumers.



来源:https://stackoverflow.com/questions/60935626/manually-setting-kafka-consumer-offset

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!