I came across the following statement in the Flink documentation:
"Depending on your state backend, Flink can also manage the state for the application, meaning Flink deals with the memory management (possibly spilling to disk if necessary) to allow applications to hold very large state."
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/state_backends.html
Does this mean that only when the state backend is configured as RocksDBStateBackend is the state kept in memory and possibly spilled to disk if necessary?
And that if it is configured as MemoryStateBackend or FsStateBackend, the state is kept only in memory and is never spilled to disk?
Yes, in general you are right. Only RocksDBStateBackend spills data to disk.
With both MemoryStateBackend and FsStateBackend, the state is always kept in the TaskManager's memory and therefore must fit there. The difference between these two backends is how they checkpoint data.
With MemoryStateBackend, the checkpoint data is sent to the JobManager and is also kept in memory there. FsStateBackend writes the checkpoint data to a file system and sends only a small amount of metadata to the JobManager (or, in an HA setup, stores it in the metadata folder).
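To keep the three backends apart, here is a minimal sketch that just encodes the comparison above as a plain Java enum. The class StateBackendSummary and everything in it are hypothetical names made up for illustration; none of it is Flink API. The checkpoint destination listed for RocksDB (a file system) is not stated in the answer above and is an assumption based on how that backend is normally configured.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative summary only: these names are hypothetical and are NOT Flink classes.
public class StateBackendSummary {

    public enum Backend {
        // Arguments: where working state lives, whether it can spill to disk,
        // and where checkpoint data goes.
        MEMORY("TaskManager memory", false, "JobManager memory"),
        FS("TaskManager memory", false, "file system (small metadata to JobManager)"),
        // Assumption: the RocksDB checkpoint destination is not covered by the answer.
        ROCKSDB("TaskManager memory plus local disk", true, "file system");

        private final String workingState;
        private final boolean spillsToDisk;
        private final String checkpointTarget;

        Backend(String workingState, boolean spillsToDisk, String checkpointTarget) {
            this.workingState = workingState;
            this.spillsToDisk = spillsToDisk;
            this.checkpointTarget = checkpointTarget;
        }

        public String workingState() { return workingState; }

        public boolean spillsToDisk() { return spillsToDisk; }

        public String checkpointTarget() { return checkpointTarget; }

        // If a backend cannot spill, the whole state has to fit in memory.
        public boolean stateMustFitInMemory() { return !spillsToDisk; }
    }

    // Builds one human-readable line per backend.
    public static List<String> describeAll() {
        List<String> lines = new ArrayList<>();
        for (Backend b : Backend.values()) {
            lines.add(b.name()
                    + ": working state in " + b.workingState()
                    + ", spills to disk = " + b.spillsToDisk()
                    + ", checkpoints to " + b.checkpointTarget());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeAll().forEach(System.out::println);
    }
}
```

Running main prints one line per backend. The only row where stateMustFitInMemory() is false is ROCKSDB, which is the whole point of the answer.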
Therefore, RocksDBStateBackend is highly encouraged for any production use case. You can find more in-depth information here.
Source: https://stackoverflow.com/questions/50548776/apache-flink-where-do-state-backends-keep-the-state