flink-streaming

Flink state empty (reinitialized) after rerun

北城以北 · Submitted on 2020-12-13 11:32:02
Question: I'm trying to connect two streams; the first persists into a MapValueState. RocksDB saves data in the checkpoint folder, but after a new run the state is empty. I run it both locally and on a Flink cluster, cancelling the submitted job on the cluster and then simply rerunning it locally:

env.setStateBackend(new RocksDBStateBackend(..)
env.enableCheckpointing(1000)
...
val productDescriptionStream: KeyedStream[ProductDescription, String] = env.addSource(..)
  .keyBy(_.id)
val productStockStream: KeyedStream[ProductStock, String] = env
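The behaviour described is Flink's default: completed checkpoints are deleted when a job is cancelled, so a rerun starts with empty state. A minimal configuration sketch, assuming the Flink 1.x DataStream API (the checkpoint path is a placeholder), that retains checkpoints on cancellation so a later run can restore from them with `flink run -s <path>`:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetainedCheckpointsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        String checkpointDir = "file:///tmp/flink-checkpoints"; // placeholder path
        env.setStateBackend(new RocksDBStateBackend(checkpointDir));
        env.enableCheckpointing(1000);

        // Keep completed checkpoints after `flink cancel` instead of
        // deleting them (the default), so a later submission can restore:
        //   flink run -s <checkpointDir>/<job-id>/chk-<n> myJob.jar
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ... build the job graph and call env.execute(...)
    }
}
```

Alternatively, taking a savepoint before cancelling (`flink stop -p <dir> <jobId>`) and resuming with `-s` achieves the same effect and is the recommended path for planned restarts.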

keyBy data distribution in Apache Flink: logical or physical operator?

与世无争的帅哥 · Submitted on 2020-12-13 04:41:13
Question: According to the Apache Flink documentation, the keyBy transformation logically partitions a stream into disjoint partitions; all records with the same key are assigned to the same partition. Is keyBy a 100% logical transformation? Doesn't it include physical data partitioning for distribution across the cluster nodes? If so, how can it guarantee that all records with the same key are assigned to the same partition? For instance, assuming that we are getting a distributed data stream from
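keyBy is logical in the API but implies a physical hash repartition at runtime: each record is routed to a key group by hashing its key, and each parallel subtask owns a contiguous range of key groups, so equal keys deterministically reach the same subtask no matter which node produced them. A simplified sketch of that assignment (Flink's real KeyGroupRangeAssignment additionally murmur-hashes the key's hashCode(); the class and method names here are illustrative):

```java
// Simplified sketch of how keyBy routes a record to a physical subtask.
public class KeyGroupSketch {

    // Map a key to one of maxParallelism key groups
    // (Flink's default maxParallelism is 128).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Each parallel operator instance owns a contiguous range of key
    // groups, so the subtask index follows from the key group alone.
    static int operatorIndex(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxPar = 128, par = 4;
        int kg = assignToKeyGroup("product-42", maxPar);
        System.out.println("key 'product-42' -> key group " + kg
                + " -> subtask " + operatorIndex(kg, maxPar, par));
    }
}
```

Because the mapping is a pure function of the key, the routing needs no coordination between nodes; this is also why rescaling works, since key groups (not raw keys) are what get redistributed across subtasks.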