Question
AFAIK, max.poll.interval.ms was introduced in Kafka 0.10.1. However, it is still unclear to me when both session.timeout.ms and max.poll.interval.ms come into play. Consider the use case in which the heartbeat thread is not responding, but my processing thread, which has the higher timeout value, is still processing records. Since the heartbeat thread is down, what exactly happens once session.timeout.ms is exceeded? In my POC I observed that the consumer rebalance doesn't happen until max.poll.interval.ms is reached, so session.timeout.ms seems redundant to me. A similar question has been posted, but it doesn't answer this one.
Answer 1:
session.timeout.ms is used to detect consumer failures via the heartbeat mechanism. The consumer's heartbeat thread must send a heartbeat to the broker before session.timeout.ms expires. Otherwise, Kafka considers the consumer dead and triggers a rebalance.
heartbeat.interval.ms: The expected time between heartbeats to the consumer coordinator when using Kafka's group management facilities. Heartbeats are used to ensure that the consumer's session stays active and to facilitate rebalancing when new consumers join or leave the group.
session.timeout.ms: The timeout used to detect client failures when using Kafka's group management facility. The client sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove this client from the group and initiate a rebalance.
Polling is another mechanism for checking consumer health. A consumer is expected to call poll() again before max.poll.interval.ms expires. If this interval expires (typically because of long-running processing), the consumer is again considered dead and a rebalance is triggered.
max.poll.interval.ms: The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member.
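To make the three settings concrete, here is a minimal sketch that collects them in a plain java.util.Properties object, the form in which they are usually handed to a consumer. The numeric values, the broker address and the group name are illustrative assumptions, not recommendations; the class name ConsumerTimeoutConfig is hypothetical.

```java
import java.util.Properties;

// Sketch only: gathers the timeout settings discussed above in one place.
final class ConsumerTimeoutConfig {

    static Properties build() {
        Properties props = new Properties();
        // Hypothetical connection details, replace with your own.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "demo-group");

        // Liveness of the process: no heartbeat within 10 s means the member is considered dead.
        props.setProperty("session.timeout.ms", "10000");
        // How often the background thread sends heartbeats; kept well below the session timeout.
        props.setProperty("heartbeat.interval.ms", "3000");
        // Liveness of the processing loop: poll() must be called again within 5 minutes.
        props.setProperty("max.poll.interval.ms", "300000");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The point of keeping the two timeouts separate is that a short session.timeout.ms detects a crashed process quickly, while a long max.poll.interval.ms still leaves room for slow record processing.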
Another important point (from version 0.10.1.0 onward) is that:
rebalance.timeout = max.poll.interval.ms
Since we give the client as much as max.poll.interval.ms to handle a batch of records, this is also the maximum time before a consumer can be expected to rejoin the group in the worst case. We therefore propose to set the rebalance timeout in the Java client to the same value configured with max.poll.interval.ms. When a rebalance begins, the background thread will continue sending heartbeats. The consumer will not rejoin the group until processing completes and the user calls poll(). From the coordinator's perspective, the consumer will not be removed from the group until either 1) their session timeout expires without receiving a heartbeat, or 2) the rebalance timeout expires.
So in your case, if session.timeout.ms expires without a heartbeat from a consumer, a rebalance is started in that consumer group. Once the rebalance starts, partitions are revoked from all consumers in the group, and Kafka waits for every consumer that is still sending heartbeats to call poll() (polling is what sends the JoinGroup request) until the rebalance timeout expires, which is equal to max.poll.interval.ms.
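The two removal conditions described above can be written down as a toy model. This is only a sketch of the rule as stated in this answer, not the coordinator's actual code; the class MemberLivenessSketch, its enum and its parameters are all hypothetical names.

```java
// Toy model of the coordinator-side rule: a member is removed when either
// 1) its session timeout expires without a heartbeat, or
// 2) a rebalance is in progress and the member has not rejoined before the rebalance timeout.
final class MemberLivenessSketch {

    enum Outcome { STAYS_IN_GROUP, REMOVED_SESSION_TIMEOUT, REMOVED_REBALANCE_TIMEOUT }

    static Outcome evaluate(long msSinceLastHeartbeat, long sessionTimeoutMs,
                            boolean rebalanceInProgress, boolean hasRejoined,
                            long msSinceRebalanceStart, long rebalanceTimeoutMs) {
        if (msSinceLastHeartbeat > sessionTimeoutMs) {
            return Outcome.REMOVED_SESSION_TIMEOUT;
        }
        if (rebalanceInProgress && !hasRejoined && msSinceRebalanceStart > rebalanceTimeoutMs) {
            return Outcome.REMOVED_REBALANCE_TIMEOUT;
        }
        return Outcome.STAYS_IN_GROUP;
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 10_000;    // assumed session.timeout.ms
        long rebalanceTimeoutMs = 300_000; // assumed to equal max.poll.interval.ms

        // Heartbeats stopped 12 s ago: removed regardless of any rebalance.
        System.out.println(evaluate(12_000, sessionTimeoutMs, false, false, 0, rebalanceTimeoutMs));
        // Heartbeats are fine, rebalance started 60 s ago, member is still processing: it stays.
        System.out.println(evaluate(1_000, sessionTimeoutMs, true, false, 60_000, rebalanceTimeoutMs));
        // Heartbeats are fine, but the member has not called poll() within the rebalance timeout.
        System.out.println(evaluate(1_000, sessionTimeoutMs, true, false, 301_000, rebalanceTimeoutMs));
    }
}
```

Under this model the second case is the one from the question: a consumer that keeps heartbeating but is slow to poll stays in the group until the rebalance timeout runs out.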
During a rebalance you can still process the messages you already have, but you cannot commit them; the commit fails with a CommitFailedException carrying this message:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
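The second remedy in that message, reducing max.poll.records, comes down to simple arithmetic: the whole batch has to be processed before max.poll.interval.ms runs out. A rough sketch of that calculation follows; the helper PollBatchBudget, the per-record cost and the safety margin are assumptions chosen for illustration, and you would measure your own worst-case processing time.

```java
// Sketch: estimate an upper bound for max.poll.records from a processing-time budget.
final class PollBatchBudget {

    static long maxRecordsPerPoll(long maxPollIntervalMs, long worstCaseMsPerRecord, int safetyPercent) {
        if (maxPollIntervalMs <= 0 || worstCaseMsPerRecord <= 0
                || safetyPercent <= 0 || safetyPercent > 100) {
            throw new IllegalArgumentException("all arguments must be positive, safetyPercent at most 100");
        }
        // Only spend part of the interval on processing, leaving headroom for spikes.
        long budgetMs = maxPollIntervalMs * safetyPercent / 100;
        // At least one record has to be fetched per poll, even if it alone exceeds the budget.
        return Math.max(1, budgetMs / worstCaseMsPerRecord);
    }

    public static void main(String[] args) {
        // 300 s interval, 800 ms per record in the worst case, use half of the interval.
        System.out.println(maxRecordsPerPoll(300_000, 800, 50)); // prints 187
    }
}
```

If the result comes out as 1 and a single record can still exceed the interval, lowering max.poll.records no longer helps, and raising max.poll.interval.ms is the remaining option.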
For more information you can check this.
Source: https://stackoverflow.com/questions/60051936/difference-between-session-timeout-ms-and-max-poll-interval-ms-for-kafka