Question
May I know what happens to the state stored in a Flink Task Manager when that Task Manager crashes? Say the state backend is RocksDB: would that data be transferred to the other running Task Managers so that the complete state is ready for data processing?
Answer 1:
Flink does not (yet) support dynamic rescaling of state, so the failed task manager must be recovered, and the job will be restarted from a checkpoint.
Exactly what that involves depends on how your cluster is configured and on whether the job failed because of an exception or because the machine or container running the task manager failed.
If you are using RocksDB and local recovery is enabled, and the job died because of an exception, then the task managers will all be able to restart the job more or less immediately from their local copy of the state. On the other hand, if a new task manager has to be spun up, then once it is running it will fetch what it needs from the latest checkpoint (from whatever distributed file system, or DFS, is used), and then the job will resume.
Without local recovery, every task manager will have to fetch the relevant portions of the checkpoint from the DFS.
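To make the two recovery paths concrete, here is a minimal toy model in Python, assuming a cluster where each task manager either still has its local copy of the state or is a freshly started replacement with an empty disk. The `TaskManager` class and the `restore_source` and `plan_recovery` functions are hypothetical names for illustration only, not Flink APIs. In a real Flink 1.x deployment, local recovery is switched on with the `state.backend.local-recovery` option, and Flink itself decides where each task restores from.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy model of the recovery paths described in the answer.
# None of these names are part of Flink's API.


@dataclass
class TaskManager:
    name: str
    local_recovery: bool   # assumption: mirrors "local recovery is enabled"
    has_local_copy: bool   # False for a freshly spun-up replacement


def restore_source(tm: TaskManager) -> str:
    """Return where this task manager reads its state from on restart."""
    if tm.local_recovery and tm.has_local_copy:
        # The job failed with an exception but the machine/container survived:
        # restore from the copy of the state kept on local disk.
        return "local"
    # Local recovery disabled, or a new task manager with no local state:
    # fetch the relevant portion of the latest checkpoint from the DFS.
    return "dfs"


def plan_recovery(tms: list, crashed: Optional[str] = None) -> dict:
    """Map each task manager (after replacing a crashed one) to its restore source."""
    plan = {}
    for tm in tms:
        if tm.name == crashed:
            # The crashed process is replaced; its local state is gone.
            tm = TaskManager(tm.name + "-replacement", tm.local_recovery, False)
        plan[tm.name] = restore_source(tm)
    return plan


cluster = [TaskManager("tm-1", True, True), TaskManager("tm-2", True, True)]
print(plan_recovery(cluster))          # {'tm-1': 'local', 'tm-2': 'local'}
print(plan_recovery(cluster, "tm-2"))  # {'tm-1': 'local', 'tm-2-replacement': 'dfs'}
```

In this model no state is ever copied from one surviving task manager to another: every restore reads either a local copy or the checkpoint in the DFS, which matches the answer's point that state is not redistributed on the fly.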
In some cases it is possible to do something less expensive than a full recovery. See the Flink documentation on fine-grained recovery for details.
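As a rough illustration of what "fine-grained" means, the sketch below models the idea behind Flink's region failover strategy (selected with `jobmanager.execution.failover-strategy: region`): tasks connected by pipelined data exchanges form a region, blocking exchanges act as region boundaries, and only the region containing the failed task is restarted. This is a simplified, hypothetical Python model with made-up task names; the real strategy can also restart additional regions (for example, consumers of a restarted region) and is not implemented like this.

```python
from collections import defaultdict

# Hypothetical toy model of region-based ("fine-grained") failover.
# Task names and edges are illustrative only; this is not Flink code.


def pipelined_regions(tasks, edges):
    """Group tasks into regions connected by pipelined exchanges.

    edges: iterable of (producer, consumer, kind), where kind is
    "pipelined" or "blocking". Blocking exchanges split regions.
    """
    neighbours = defaultdict(set)
    for src, dst, kind in edges:
        if kind == "pipelined":
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    region_of = {}
    for start in tasks:
        if start in region_of:
            continue
        region_of[start] = start  # label each region by its first task
        stack = [start]
        while stack:  # depth-first walk over pipelined edges
            t = stack.pop()
            for n in neighbours[t]:
                if n not in region_of:
                    region_of[n] = start
                    stack.append(n)
    return region_of


def tasks_to_restart(tasks, edges, failed):
    """Simplified: restart only the region that contains the failed task."""
    region_of = pipelined_regions(tasks, edges)
    return {t for t in tasks if region_of[t] == region_of[failed]}


# Two independent pipelines: a failure in map_b leaves pipeline "a" running.
tasks = ["source_a", "map_a", "sink_a", "source_b", "map_b", "sink_b"]
edges = [
    ("source_a", "map_a", "pipelined"), ("map_a", "sink_a", "pipelined"),
    ("source_b", "map_b", "pipelined"), ("map_b", "sink_b", "pipelined"),
]
print(sorted(tasks_to_restart(tasks, edges, "map_b")))  # ['map_b', 'sink_b', 'source_b']
```

In a typical streaming job every exchange is pipelined, so a job graph containing a shuffle usually forms a single region and a failure still restarts the whole job; the savings show up mainly in embarrassingly parallel pipelines or in batch jobs with blocking exchanges.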
Source: https://stackoverflow.com/questions/54149134/what-happen-to-state-in-flink-task-manager-when-crash