apache-flink

Can I use the Flink RocksDB state backend with a local file system?

梦想与她 submitted on 2021-01-27 22:13:41
Question: I am exploring the Flink RocksDB state backend. The documentation seems to imply I can use a regular file system such as file:///data/flink/checkpoints, but the code Javadoc only mentions HDFS or S3 options here. I am wondering if it's possible to use the local file system with the Flink RocksDB backend, thanks! Flink docs: https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend Flink code: https://github.com/apache/flink/blob/master/flink-state
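A minimal sketch of pointing the RocksDB backend at a local path, assuming the RocksDBStateBackend(checkpointDataUri, enableIncrementalCheckpointing) constructor; the path and the incremental flag are illustrative, not from the original post:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Any supported FileSystem scheme can be used for the checkpoint directory, including a
// local file:// path, as long as that directory is reachable from every task manager.
env.setStateBackend(new RocksDBStateBackend("file:///data/flink/checkpoints", true))
env.enableCheckpointing(1000)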

Dynamic table name in the Cassandra POJO sink in Flink

混江龙づ霸主 submitted on 2021-01-27 07:06:06
Question: I am a newbie to Apache Flink. I am using the POJO sink to load data into Cassandra. Right now, I specify the table and keyspace names with the @Table annotation. Now, I want to pass the table name and keyspace name dynamically at run time so that I can load data into tables specified by the user. Is there any way to achieve this? Answer 1: @Table is a CQL annotation that defines which table this class entity maps to. AFAIK, currently there is no way to make it dynamically mapped to any table at
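For reference, a sketch of the static mapping the answer refers to, assuming the DataStax mapper annotations used by Flink's Cassandra POJO sink; the class, keyspace, and column names below are made up:

import com.datastax.driver.mapping.annotations.{Column, Table}

// The keyspace and table are fixed at compile time by the annotation, which is why they
// cannot be chosen per record at run time.
@Table(keyspace = "shop", name = "product_events")
class ProductEvent {
  @Column(name = "id") var id: String = _
  @Column(name = "payload") var payload: String = _
}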

Exception when trying to upgrade to Flink 1.3.1

大兔子大兔子 submitted on 2021-01-27 06:30:10
Question: I tried to upgrade the Flink version in my cluster to 1.3.1 (and 1.3.2 as well) and I got the following exception in my task managers:
2018-02-28 12:57:27,120 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask - Error during disposal of stream operator.
org.apache.kafka.common.KafkaException: java.lang.InterruptedException
    at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:424)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.close

What is Apache Flink's detached mode?

怎甘沉沦 submitted on 2021-01-27 05:35:21
Question: I saw this line in the Flink documentation but can't figure out what 'detached mode' means. Please help. Thanks. Run example program in detached mode: ./bin/flink run -d ./examples/batch/WordCount.jar Answer 1: The Flink CLI runs jobs in either blocking or detached mode. In blocking mode, the CliFrontend (client) process keeps running, blocked, waiting for the job to complete -- after which it prints out some information. In the example below I ran a streaming job, which I cancelled from the WebUI
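To make the contrast concrete, the two invocations can be compared like this (using the WordCount jar from the question; the described behaviour follows the answer above):

./bin/flink run ./examples/batch/WordCount.jar        (blocking: the CLI waits for the job to finish, then prints a summary)
./bin/flink run -d ./examples/batch/WordCount.jar     (detached: the CLI submits the job, prints its job ID, and returns immediately)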

Elasticsearch Connector as Source in Flink

馋奶兔 submitted on 2021-01-07 04:15:33
Question: I used the Elasticsearch connector as a sink to insert data into Elasticsearch (see: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/elasticsearch.html). But I did not find any connector to get data from Elasticsearch as a source. Is there any connector or example for using Elasticsearch documents as a source in a Flink pipeline? Regards, Ali Answer 1: I finally defined a simple read-from-Elasticsearch function public static class ElasticsearchFunction extends ProcessFunction
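The answer is cut off above; as a rough sketch of the same idea (querying Elasticsearch from inside a user-defined function), one could use a RichSourceFunction with the Elasticsearch high-level REST client instead of the ProcessFunction shown. Host, port, and index here are placeholders:

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction}
import org.apache.http.HttpHost
import org.elasticsearch.action.search.SearchRequest
import org.elasticsearch.client.{RequestOptions, RestClient, RestHighLevelClient}
import org.elasticsearch.index.query.QueryBuilders
import org.elasticsearch.search.builder.SearchSourceBuilder

// One-shot source that fetches up to maxDocs documents from an index and emits their JSON.
class ElasticsearchSource(host: String, port: Int, index: String, maxDocs: Int)
    extends RichSourceFunction[String] {

  @transient private var client: RestHighLevelClient = _

  override def open(parameters: Configuration): Unit =
    client = new RestHighLevelClient(RestClient.builder(new HttpHost(host, port, "http")))

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    val request = new SearchRequest(index)
      .source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(maxDocs))
    val response = client.search(request, RequestOptions.DEFAULT)
    response.getHits.getHits.foreach(hit => ctx.collect(hit.getSourceAsString))
  }

  override def cancel(): Unit = ()

  override def close(): Unit = if (client != null) client.close()
}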

How to count the number of records processed by Apache Flink in a given time window

僤鯓⒐⒋嵵緔 submitted on 2021-01-01 04:29:58
Question: After defining a time window in Flink as follows: val lines = socket.timeWindowAll(Time.seconds(5)) how can I compute the number of records in that particular 5-second window? Answer 1: The most efficient way to perform a count aggregation is a ReduceFunction. However, reduce has the restriction that the input and output types must be identical. So you would have to convert the input to an Int before applying the window: val socket: DataStream[(String)] = ??? val cnts: DataStream[Int] = socket
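The answer's snippet is cut off above; a plausible completion of the same approach (map each record to 1, then sum within the window; the socket source is assumed) might be:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

val env = StreamExecutionEnvironment.getExecutionEnvironment
val socket: DataStream[String] = env.socketTextStream("localhost", 9999) // assumed source

val counts: DataStream[Int] = socket
  .map(_ => 1)                      // replace each record with the constant 1
  .timeWindowAll(Time.seconds(5))   // 5-second tumbling window over the whole stream
  .reduce(_ + _)                    // summing the 1s yields the record count per window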

Flink state empty (reinitialized) after rerun

北城以北 submitted on 2020-12-13 11:32:02
Question: I'm trying to connect two streams; the first is persisted in MapValueState. RocksDB saves data in the checkpoint folder, but after a new run the state is empty. I run it locally and on a Flink cluster, cancelling the submitted job on the cluster and simply rerunning locally. env.setStateBackend(new RocksDBStateBackend(..) env.enableCheckpointing(1000) ... val productDescriptionStream: KeyedStream[ProductDescription, String] = env.addSource(..) .keyBy(_.id) val productStockStream: KeyedStream[ProductStock, String] = env
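The question's code is truncated; a rough sketch of the kind of connect it describes (caching the description stream in MapState inside a KeyedCoProcessFunction) could look like the following. The case classes and field names are illustrative only, not from the original post:

import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction
import org.apache.flink.util.Collector

case class ProductDescription(id: String, text: String)
case class ProductStock(id: String, amount: Int)

class EnrichStock
    extends KeyedCoProcessFunction[String, ProductDescription, ProductStock, (ProductStock, String)] {

  // Keyed state held by the configured backend (RocksDB here) and snapshotted at each checkpoint.
  @transient private var descriptions: MapState[String, String] = _

  override def open(parameters: Configuration): Unit =
    descriptions = getRuntimeContext.getMapState(
      new MapStateDescriptor[String, String]("descriptions", classOf[String], classOf[String]))

  override def processElement1(
      desc: ProductDescription,
      ctx: KeyedCoProcessFunction[String, ProductDescription, ProductStock, (ProductStock, String)]#Context,
      out: Collector[(ProductStock, String)]): Unit =
    descriptions.put(desc.id, desc.text)

  override def processElement2(
      stock: ProductStock,
      ctx: KeyedCoProcessFunction[String, ProductDescription, ProductStock, (ProductStock, String)]#Context,
      out: Collector[(ProductStock, String)]): Unit =
    if (descriptions.contains(stock.id)) out.collect((stock, descriptions.get(stock.id)))
}

// Usage: productDescriptionStream.connect(productStockStream).process(new EnrichStock)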

KeyBy data distribution in Apache Flink: logical or physical operator?

与世无争的帅哥 submitted on 2020-12-13 04:41:13
Question: According to the Apache Flink documentation, the KeyBy transformation logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition. Is KeyBy a 100% logical transformation? Doesn't it include physical data partitioning for distribution across the cluster nodes? If so, then how can it guarantee that all records with the same key are assigned to the same partition? For instance, assuming that we are getting a distributed data stream from
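For illustration, a minimal sketch of the behaviour the question asks about (the elements and key function are made up): keyBy is declared on the logical stream, and at run time records are hash-partitioned over the network so that every record with a given key reaches the same parallel subtask and its local keyed state.

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

val words: DataStream[(String, Int)] =
  env.fromElements(("a", 1), ("b", 1), ("a", 1))

val perKeyCounts: DataStream[(String, Int)] = words
  .keyBy(_._1)   // logical declaration; physically, records are hash-partitioned by key
  .sum(1)        // all ("a", _) records reach the same subtask, so the running sum is consistent

perKeyCounts.print()
env.execute("keyBy sketch")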