flink-streaming

flink cluster startup error [ERROR] Could not get JVM parameters properly

。_饼干妹妹 submitted on 2020-03-25 16:01:26
Question: Running $ bin/start-cluster.sh gives: Starting cluster. [INFO] 1 instance(s) of standalonesession are already running on centos1. Starting standalonesession daemon on host centos1. [ERROR] Could not get JVM parameters properly. [ERROR] Could not get JVM parameters properly. I have $JAVA_HOME set on the master and all slaves: $ echo $JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/ Below are the config file settings: jobmanager.rpc.address: 10.0.2.4 # The RPC port where the …
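
A frequent cause of "Could not get JVM parameters properly" is that $JAVA_HOME, although exported in the interactive shell, is not visible to the non-interactive SSH sessions that start-cluster.sh opens on each host (in Flink 1.10+ it can also come from a broken memory configuration). A minimal sketch of a workaround, reusing the JDK path from the question, is to pin the JVM explicitly in the Flink configuration:

    # conf/flink-conf.yaml on every node (JDK path taken from the question)
    env.java.home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64
    jobmanager.rpc.address: 10.0.2.4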

How does Flink decide when to take a checkpoint?

扶醉桌前 submitted on 2020-03-23 08:22:13
Question: I'd like to understand what determines when checkpoints are taken. How does this relate to the checkpointing interval? Answer 1: To a first approximation, the Checkpoint Coordinator (part of the Job Manager) uses the checkpoint interval to determine when to start a new checkpoint. This interval is passed when you enable checkpointing; for example, here it is set to wait 10 seconds between checkpoints: env.enableCheckpointing(10000L); It can also be set via execution.checkpointing.interval. However, …
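
A short sketch of this, plus the minimum-pause setting that can also delay when the coordinator actually starts the next checkpoint (the class name is just illustrative):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointIntervalExample {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Ask the checkpoint coordinator to start a checkpoint roughly every 10 s,
            // equivalent to setting execution.checkpointing.interval: 10s.
            env.enableCheckpointing(10_000L);

            // Also require at least 5 s between the end of one checkpoint and the
            // start of the next; this can push the next checkpoint past the interval.
            env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5_000L);
        }
    }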

Kinesis Streams and Flink

浪尽此生 submitted on 2020-03-05 00:25:26
Question: I have a question regarding sharding data in a Kinesis stream. I would like to use a random partition key when sending user data to my Kinesis stream so that the data in the shards is evenly distributed. To keep this question simple, I would then like to aggregate the user data by keying off of a userId in my Flink application. My question is this: if the shards are randomly partitioned so that data for one userId is spread across multiple Kinesis shards, can Flink handle …
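
For context, a sketch of the shape such a job typically takes: the keyBy repartitions records by userId inside Flink regardless of which shard they arrived on, so a random Kinesis partition key does not prevent per-user aggregation. The stream name, region, and CSV record layout below are assumptions, not taken from the question:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
    import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

    public class KinesisKeyByExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties consumerConfig = new Properties();
            consumerConfig.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1"); // assumed region

            // Records arrive from whichever shard the random partition key sent them to.
            DataStream<String> rawEvents = env.addSource(
                    new FlinkKinesisConsumer<>("user-events", new SimpleStringSchema(), consumerConfig));

            // keyBy redistributes records by userId inside Flink, so all events for a
            // given user end up in the same parallel task even if they were read from
            // different Kinesis shards. The userId is assumed to be the text before
            // the first comma of a CSV-style record.
            rawEvents
                .keyBy(record -> record.split(",")[0])
                .print();

            env.execute("kinesis-keyby-sketch");
        }
    }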

Apache Flink: How to apply multiple counting window functions?

假装没事ソ submitted on 2020-02-26 10:05:27
Question: I have a keyed stream of data and need to compute counts over tumbling windows of different lengths (1 minute, 5 minutes, 1 day, 1 week). Is it possible to compute all four window counts in a single application? Answer 1: Yes, that's possible. If you are using event time, you can simply cascade the windows with increasing time intervals. So you do: DataStream<String> data = ... // append a Long 1 to each record to count it. DataStream<Tuple2<String, Long>> withOnes = data.map(new AppendOne); …
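
A sketch of the first two cascade steps, assuming the (key, 1L) tuples produced by the AppendOne mapper from the answer; the 1-minute counts are rolled up into 5-minute counts, and the same pattern would continue for 1 day and 1 week:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class CascadedCounts {
        // withOnes carries (key, 1L) records, as produced by the AppendOne mapper.
        public static DataStream<Tuple2<String, Long>> countPerMinuteAndFiveMinutes(
                DataStream<Tuple2<String, Long>> withOnes) {

            // 1-minute counts per key.
            DataStream<Tuple2<String, Long>> oneMinuteCounts = withOnes
                    .keyBy(0)  // key by the tuple's first field (Flink 1.x index form)
                    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                    .sum(1);

            // Roll the per-minute partial counts up into 5-minute counts instead of
            // recounting every raw record.
            return oneMinuteCounts
                    .keyBy(0)
                    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
                    .sum(1);
        }
    }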

Flink job distribution over cluster nodes

Deadly submitted on 2020-02-24 11:11:28
Question: We have 4 jobs running on 3 nodes with 4 slots each. On Flink 1.3.2 the jobs were evenly distributed across the nodes. After upgrading to Flink 1.5, each job runs on a single node (spilling over to another node only when no slots are left). Is there a way to return to an even distribution? The jobs are not even in load, which causes some nodes to work harder than others. Answer 1: An answer I received from the Flink mailing list, Re: Flink 1.5 job distribution over cluster nodes: Hi Shachar, …
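
Not part of the quoted mailing-list reply, but for reference: later 1.x releases added a scheduler option that restores the spread-out placement. A sketch of the configuration, assuming a Flink version (roughly 1.9.2/1.10 or later) that supports it:

    # conf/flink-conf.yaml — prefer spreading tasks across all registered
    # TaskManagers instead of filling one TaskManager's slots first.
    cluster.evenly-spread-out-slots: true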

Processing time windows don't work on finite data sources in Apache Flink

两盒软妹~` submitted on 2020-02-04 05:50:07
Question: I'm trying to apply a very simple window function to a finite data stream in Apache Flink (locally, no cluster). Here's the example: val env = StreamExecutionEnvironment.getExecutionEnvironment env .fromCollection(List("a", "b", "c", "d", "e")) .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(1))) .trigger(ProcessingTimeTrigger.create) .process(new ProcessAllWindowFunction[String, String, TimeWindow] { override def process(context: Context, elements: Iterable[String], out: Collector …
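
The usual explanation is that the job reaches the end of the finite input and shuts down before the 1-second processing-time timer fires, so the window result is never emitted. A common workaround, sketched in Java below (the original snippet is Scala) and assuming a recent Flink version where event time is the default, is to switch to event-time windows: when a bounded source finishes, Flink emits a final MAX_WATERMARK, which fires any pending event-time windows.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class FiniteSourceWindowExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements("a", "b", "c", "d", "e")
                // Give each element an event-time timestamp; when the finite source
                // ends, Flink emits a MAX_VALUE watermark that fires the open window.
                .assignTimestampsAndWatermarks(
                    WatermarkStrategy.<String>forMonotonousTimestamps()
                        .withTimestampAssigner((element, ts) -> System.currentTimeMillis()))
                .windowAll(TumblingEventTimeWindows.of(Time.seconds(1)))
                .process(new ProcessAllWindowFunction<String, String, TimeWindow>() {
                    @Override
                    public void process(Context context, Iterable<String> elements, Collector<String> out) {
                        for (String e : elements) {
                            out.collect(e);
                        }
                    }
                })
                .print();

            env.execute("finite-source-window-sketch");
        }
    }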

Flink Kinesis Consumer not storing last successfully processed sequence nos

*爱你&永不变心* submitted on 2020-02-03 16:45:11
Question: We are using the Flink Kinesis Consumer to consume data from a Kinesis stream into our Flink application. The KCL library uses a DynamoDB table to store the last successfully processed Kinesis sequence numbers, so that the next time the application starts it resumes from where it left off. But it seems that the Flink Kinesis Consumer does not maintain any such sequence numbers in a persistent store. As a result, we need to rely on the ShardIteratorType (TRIM_HORIZON, LATEST, etc.) to decide where to resume Flink …
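
For context: the FlinkKinesisConsumer keeps its per-shard sequence numbers in Flink's own checkpointed state rather than in DynamoDB, so resuming from where the job left off requires checkpointing to be enabled and the job to be restored from a checkpoint or savepoint; ShardIteratorType only applies when there is no state to restore. A minimal sketch, with the interval and retention policy as illustrative choices:

    import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class KinesisCheckpointSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // With checkpointing on, the Kinesis consumer snapshots its per-shard
            // sequence numbers into each checkpoint (instead of DynamoDB).
            env.enableCheckpointing(60_000L);

            // Retain the latest checkpoint after cancellation so the job can be
            // restarted from it, e.g. flink run -s <checkpoint-path> ...
            env.getCheckpointConfig().enableExternalizedCheckpoints(
                    ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        }
    }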

Flink - how to solve error This job is not stoppable

99封情书 submitted on 2020-01-25 11:11:18
Question: I tried to stop a job with flink stop: flink stop [jobid] However, the CLI throws an error and does not let me stop the job; I can cancel it. What could be the reason? Stopping job c7196bb1d21d679efed73770a4e4f9ed. ------------------------------------------------------------ The program finished with the following exception: org.apache.flink.util.FlinkException: Could not stop the job c7196bb1d21d679efed73770a4e4f9ed. at org.apache.flink.client.cli.CliFrontend.lambda$stop$5 …
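
The usual explanation for this error on pre-1.9 Flink is that flink stop only works for jobs whose sources implement the StoppableFunction interface; all other jobs can only be cancelled, optionally with a savepoint. A sketch of the cancel-with-savepoint alternative (the savepoint directory is an assumption):

    # Take a savepoint and then cancel, instead of stopping.
    flink cancel -s hdfs:///flink/savepoints c7196bb1d21d679efed73770a4e4f9ed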
