apache-flink

Can I use the Flink RocksDB state backend with a local file system?

梦想与她 submitted on 2021-01-27 22:13:41
Question: I am exploring the Flink RocksDB state backend. The documentation seems to imply I can use a regular file system such as file:///data/flink/checkpoints, but the code Javadoc only mentions HDFS or S3 options here. I am wondering if it's possible to use the local file system with the Flink RocksDB backend, thanks! Flink docs: https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend Flink code: https://github.com/apache/flink/blob/master/flink-state
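A minimal sketch of pointing the RocksDB backend at a local path, assuming the RocksDBStateBackend(checkpointDataUri, enableIncrementalCheckpointing) constructor; the path and the incremental flag are illustrative, not from the original post:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
// Any supported FileSystem scheme can be used for the checkpoint directory, including a
// local file:// path, as long as that directory is reachable from every task manager.
env.setStateBackend(new RocksDBStateBackend("file:///data/flink/checkpoints", true))
env.enableCheckpointing(1000)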

Dynamic table name in the Cassandra POJO sink in Flink

混江龙づ霸主 submitted on 2021-01-27 07:06:06
Question: I am a newbie to Apache Flink. I am using the POJO sink to load data into Cassandra. Right now, I specify the table and keyspace names with the @Table annotation. Now, I want to pass the table name and keyspace name dynamically at run time so that I can load data into tables specified by the user. Is there any way to achieve this? Answer 1: @Table is a CQL annotation that defines which table this class entity maps to. AFAIK, currently there is no way to make it dynamically mapped to any table at
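For reference, a sketch of the static mapping the answer refers to, assuming the DataStax mapper annotations used by Flink's Cassandra POJO sink; the class, keyspace, and column names below are made up:

import com.datastax.driver.mapping.annotations.{Column, Table}

// The keyspace and table are fixed at compile time by the annotation, which is why they
// cannot be chosen per record at run time.
@Table(keyspace = "shop", name = "product_events")
class ProductEvent {
  @Column(name = "id") var id: String = _
  @Column(name = "payload") var payload: String = _
}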

Exception when trying to upgrade to Flink 1.3.1

大兔子大兔子 submitted on 2021-01-27 06:30:10
Question: I tried to upgrade the Flink version in my cluster to 1.3.1 (and 1.3.2 as well) and I got the following exception in my task managers:
2018-02-28 12:57:27,120 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask - Error during disposal of stream operator.
org.apache.kafka.common.KafkaException: java.lang.InterruptedException
    at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:424)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.close

What is Apache Flink's detached mode?

怎甘沉沦 submitted on 2021-01-27 05:35:21
Question: I saw this line in the Flink documentation but can't figure out what 'detached mode' means. Please help. Thanks. Run example program in detached mode: ./bin/flink run -d ./examples/batch/WordCount.jar Answer 1: The Flink CLI runs jobs in either blocking or detached mode. In blocking mode, the CliFrontend (client) process keeps running, blocked, waiting for the job to complete -- after which it prints out some information. In the example below I ran a streaming job, which I cancelled from the WebUI
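To make the contrast concrete, the two invocations can be compared like this (using the WordCount jar from the question; the described behaviour follows the answer above):

./bin/flink run ./examples/batch/WordCount.jar        (blocking: the CLI waits for the job to finish, then prints a summary)
./bin/flink run -d ./examples/batch/WordCount.jar     (detached: the CLI submits the job, prints its job ID, and returns immediately)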

Elasticsearch Connector as Source in Flink

馋奶兔 submitted on 2021-01-07 04:15:33
Question: I used the Elasticsearch connector as a sink to insert data into Elasticsearch (see: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/elasticsearch.html). But I did not find any connector to get data from Elasticsearch as a source. Is there any connector or example for using Elasticsearch documents as a source in a Flink pipeline? Regards, Ali Answer 1: I finally defined a simple read-from-Elasticsearch function public static class ElasticsearchFunction extends ProcessFunction
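The answer is cut off above; as a rough sketch of the same idea (querying Elasticsearch from inside a user-defined function), one could use a RichSourceFunction with the Elasticsearch high-level REST client instead of the ProcessFunction shown. Host, port, and index here are placeholders:

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction}
import org.apache.http.HttpHost
import org.elasticsearch.action.search.SearchRequest
import org.elasticsearch.client.{RequestOptions, RestClient, RestHighLevelClient}
import org.elasticsearch.index.query.QueryBuilders
import org.elasticsearch.search.builder.SearchSourceBuilder

// One-shot source that fetches up to maxDocs documents from an index and emits their JSON.
class ElasticsearchSource(host: String, port: Int, index: String, maxDocs: Int)
    extends RichSourceFunction[String] {

  @transient private var client: RestHighLevelClient = _

  override def open(parameters: Configuration): Unit =
    client = new RestHighLevelClient(RestClient.builder(new HttpHost(host, port, "http")))

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    val request = new SearchRequest(index)
      .source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(maxDocs))
    val response = client.search(request, RequestOptions.DEFAULT)
    response.getHits.getHits.foreach(hit => ctx.collect(hit.getSourceAsString))
  }

  override def cancel(): Unit = ()

  override def close(): Unit = if (client != null) client.close()
}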

How to count the number of records processed by Apache Flink in a given time window

僤鯓⒐⒋嵵緔 submitted on 2021-01-01 04:29:58
Question: After defining a time window in Flink as follows: val lines = socket.timeWindowAll(Time.seconds(5)) how can I compute the number of records in that particular 5-second window? Answer 1: The most efficient way to perform a count aggregation is a ReduceFunction. However, reduce has the restriction that the input and output types must be identical. So you would have to convert the input to an Int before applying the window: val socket: DataStream[(String)] = ??? val cnts: DataStream[Int] = socket
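The answer's snippet is cut off above; a plausible completion of the same approach (map each record to 1, then sum within the window; the socket source is assumed) might be:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

val env = StreamExecutionEnvironment.getExecutionEnvironment
val socket: DataStream[String] = env.socketTextStream("localhost", 9999) // assumed source

val counts: DataStream[Int] = socket
  .map(_ => 1)                      // replace each record with the constant 1
  .timeWindowAll(Time.seconds(5))   // 5-second tumbling window over the whole stream
  .reduce(_ + _)                    // summing the 1s yields the record count per window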

Flink state empty (reinitialized) after rerun

北城以北 submitted on 2020-12-13 11:32:02
Question: I'm trying to connect two streams; the first is persisted in MapValueState. RocksDB saves data in the checkpoint folder, but after a new run the state is empty. I run it locally and on a Flink cluster, cancelling the submitted job on the cluster and simply rerunning locally. env.setStateBackend(new RocksDBStateBackend(..) env.enableCheckpointing(1000) ... val productDescriptionStream: KeyedStream[ProductDescription, String] = env.addSource(..) .keyBy(_.id) val productStockStream: KeyedStream[ProductStock, String] = env
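The question's code is truncated; a rough sketch of the kind of connect it describes (caching the description stream in MapState inside a KeyedCoProcessFunction) could look like the following. The case classes and field names are illustrative only, not from the original post:

import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction
import org.apache.flink.util.Collector

case class ProductDescription(id: String, text: String)
case class ProductStock(id: String, amount: Int)

class EnrichStock
    extends KeyedCoProcessFunction[String, ProductDescription, ProductStock, (ProductStock, String)] {

  // Keyed state held by the configured backend (RocksDB here) and snapshotted at each checkpoint.
  @transient private var descriptions: MapState[String, String] = _

  override def open(parameters: Configuration): Unit =
    descriptions = getRuntimeContext.getMapState(
      new MapStateDescriptor[String, String]("descriptions", classOf[String], classOf[String]))

  override def processElement1(
      desc: ProductDescription,
      ctx: KeyedCoProcessFunction[String, ProductDescription, ProductStock, (ProductStock, String)]#Context,
      out: Collector[(ProductStock, String)]): Unit =
    descriptions.put(desc.id, desc.text)

  override def processElement2(
      stock: ProductStock,
      ctx: KeyedCoProcessFunction[String, ProductDescription, ProductStock, (ProductStock, String)]#Context,
      out: Collector[(ProductStock, String)]): Unit =
    if (descriptions.contains(stock.id)) out.collect((stock, descriptions.get(stock.id)))
}

// Usage: productDescriptionStream.connect(productStockStream).process(new EnrichStock)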

KeyBy data distribution in Apache Flink: logical or physical operator?

与世无争的帅哥 submitted on 2020-12-13 04:41:13
Question: According to the Apache Flink documentation, the KeyBy transformation logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition. Is KeyBy a 100% logical transformation? Doesn't it include physical data partitioning for distribution across the cluster nodes? If so, then how can it guarantee that all records with the same key are assigned to the same partition? For instance, assuming that we are getting a distributed data stream from
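For illustration, a minimal sketch of the behaviour the question asks about (the elements and key function are made up): keyBy is declared on the logical stream, and at run time records are hash-partitioned over the network so that every record with a given key reaches the same parallel subtask and its local keyed state.

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

val words: DataStream[(String, Int)] =
  env.fromElements(("a", 1), ("b", 1), ("a", 1))

val perKeyCounts: DataStream[(String, Int)] = words
  .keyBy(_._1)   // logical declaration; physically, records are hash-partitioned by key
  .sum(1)        // all ("a", _) records reach the same subtask, so the running sum is consistent

perKeyCounts.print()
env.execute("keyBy sketch")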