Flink checkpoints keeps failing
问题 we are trying to setup a Flink stateful job using RocksDB backend. We are using session window, with 30mins gap. We use aggregateFunction, so not using any Flink state variables. With sampling, we have less than 20k events/s, 20 - 30 new sessions/s. Our session basically gather all the events. the size of the session accumulator would go up along time. We are using 10G memory in total with Flink 1.9, 128 containers. Following's the settings: state.backend: rocksdb state.checkpoints.dir: hdfs: