问题
I'm running an Kafka Streams app with version 2.1.0. I found after running for some time, my app (63 nodes) will enter ERROR state one by one. Eventually, all 63 nodes are down. The exception is :
ERROR o.a.k.s.p.i.ProcessorStateManager - task [2_2] Failed to
flush state store KSTREAM-REDUCE-STATE-STORE-0000000014:
org.apache.kafka.streams.errors.StreamsException: task [2_2]
Abort sending since an error caught with a previous record
(key 110646599468 value InterimMessage [sessionStart=1567150872690,count=1]
timestamp 1567154490411) to topic item.interim due to
org.apache.kafka.common.errors.TimeoutException: Failed to update
metadata after 60000 ms.
You can increase producer parameter `retries` and `retry.backoff.ms`
to avoid this error.
I enabled the DEBUG logging and found out that the exception happens when the KStream ask for metadata update for internal topics only but not the destination topic. (item.interim
is the destination topic)
Normally,
[Producer clientId=client-autocreate-StreamThread-1-producer] Sending metadata
request (type=MetadataRequest,
topics=item.interim,test-KSTREAM-REDUCE-STATE-STORE-0000000014-changelog)
to node XXX:9092 (id: 7 rack: XXX)
But before the exception, it was
[Producer clientId=client-autocreate-StreamThread-1-producer] Sending metadata
request (type=MetadataRequest,
topics=test-KSTREAM-REDUCE-STATE-STORE-0000000014-changelog)
to node XXX:9092 (id: 7 rack: XXX)
Config I have changed:
max.request.size=14000000
receive.buffer.bytes=32768
auto.offset.reset=latest
enable.auto.commit=false
default.api.timeout.ms=180000
cache.max.bytes.buffering=10485760
retries=20
retry.backoff.ms=80000
request.timeout.ms=120000
commit.interval.ms=100
num.stream.threads=1
session.timeout.ms=30000
I'm really confused. Could anyone help me to understand, why producer will send different Metadata request? And any possible way to solve the problem? Thanks a lot!
来源:https://stackoverflow.com/questions/57732339/kafka-streams-meatadata-request-only-contains-internal-topics