Question
When our Kafka Streams application attempts to recover state from the changelog topic, the RocksDB state store directory grows continually (10 GB+) until we run out of disk space, and the store never actually recovers.
How to reproduce:
- I start up our application with a brand new changelog topic.
- I push a few hundred thousand records through and note that the RocksDB state store is around 100 MB.
- I gracefully shut down the application and restart it.
- I see the restore consumers log that they are rebuilding the state store from the beginning. I then watch the RocksDB state store directory grow until I run out of disk space (tens of GB). (A minimal sketch of the topology is shown after this list.)
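For reference, here is a minimal sketch of the kind of topology involved. The application id, topic name, and store name are hypothetical placeholders, not our actual configuration:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;

public class StoreRecoveryRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "store-recovery-repro"); // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // A table materialized as a persistent RocksDB store; every update is
        // also written to the store's compacted changelog topic.
        builder.table("input-topic",                  // hypothetical input topic
                Materialized.as("my-state-store"));   // backed by RocksDB on disk

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // On restart with an empty local state directory, the restore consumer
        // replays the changelog from the beginning into a fresh RocksDB
        // instance -- this is the phase where the on-disk size balloons.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```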
How does a RocksDB state store in the hundreds of MB balloon past 10 GB when it is rebuilt from the changelog topic? Is there some compression/compaction that happens during normal operation but not during recovery? Or is my changelog topic not set up properly? (We have to create the topic ahead of time due to security requirements; cleanup.policy is set to compact.)
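For completeness, this is roughly how we pre-create the changelog topic, sketched with the Kafka AdminClient. The topic name follows the `<application.id>-<store-name>-changelog` convention that Kafka Streams expects; the name shown, the partition count, and the replication factor are illustrative, not our production values:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateChangelogTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Changelog topics must be compacted so that only the latest value
            // per key is retained once log compaction has run.
            NewTopic changelog = new NewTopic(
                    "store-recovery-repro-my-state-store-changelog", // hypothetical name
                    1, (short) 1)
                    .configs(Map.of(
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(changelog)).all().get();
        }
    }
}
```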
I will note that we have relatively few keys compared to the number of records we pass into our Streams application; most records are updates to existing keys.
Source: https://stackoverflow.com/questions/56726224/kafka-streams-state-store-unrecoverable-from-change-log-topic