What happens to state in a Flink Task Manager when it crashes?

Question

What happens to the state stored in a Flink Task Manager when that Task Manager crashes? Say the state backend is RocksDB: would the data be transferred to the other running Task Managers so that the complete state is available for processing?

Answer 1:

Flink does not (yet) support dynamic rescaling of state, so the failed task manager must be recovered, and the job will be restarted from a checkpoint.

Exactly what that involves depends on how your cluster is configured, and whether the job failed because of an exception or because the machine/container running the task manager failed.

If you are using RocksDB and local recovery is enabled, then if the job died because of an exception, the task managers will all be able to restart the job more-or-less immediately from their local copy of the state. On the other hand, if a new task manager has to be spun up, then once it is running it will fetch what it needs from the latest checkpoint (from whatever distributed file system is used) and then the job will resume.

Without local recovery, every task manager will have to fetch the relevant portions of the checkpoint from the DFS.
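The two cases above boil down to a small decision rule, sketched below in plain Java. This is only an illustrative model of the reasoning in this answer, not Flink code: the class `RecoverySketch`, the enum `StateSource`, and the method `restoreSource` are hypothetical names made up for this sketch.

```java
public class RecoverySketch {

    /** Where a task manager reads its state from when the job restarts (hypothetical enum). */
    enum StateSource { LOCAL_COPY, DISTRIBUTED_CHECKPOINT }

    /**
     * Toy model of the rule described in the answer, not a Flink API.
     * A task manager can reuse its local copy only if local recovery is
     * enabled AND the task manager process itself survived the failure.
     * In every other case it fetches its share of the latest checkpoint
     * from the distributed file system.
     */
    static StateSource restoreSource(boolean localRecoveryEnabled, boolean taskManagerSurvived) {
        if (localRecoveryEnabled && taskManagerSurvived) {
            return StateSource.LOCAL_COPY;
        }
        return StateSource.DISTRIBUTED_CHECKPOINT;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        // Print the outcome for all four combinations of the two conditions.
        for (boolean local : flags) {
            for (boolean survived : flags) {
                System.out.println("localRecovery=" + local
                        + ", taskManagerSurvived=" + survived
                        + " -> " + restoreSource(local, survived));
            }
        }
    }
}
```

In either case the job restarts from the latest completed checkpoint; the only difference is whether each task manager reads that state from its local disk or from the DFS.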

In some cases it is possible to do something less expensive than a full recovery. See fine-grained recovery for details.

Source: https://stackoverflow.com/questions/54149134/what-happen-to-state-in-flink-task-manager-when-crash

Tags

apache-flink

flink-streaming
