apache-flink

Flink window function getResult not fired

半世苍凉 submitted on 2020-06-01 06:14:25
Question: I am trying to use event time in my Flink job, using BoundedOutOfOrdernessTimestampExtractor to extract timestamps and generate watermarks. However, some of my Kafka input is a sparse stream that can carry no data for a long time, which means getResult in my AggregateFunction is never called at all, even though I can see data going into the add function. I have set getEnv().getConfig().setAutoWatermarkInterval(1000L); I tried eventsWithKey .keyBy(entry -> (String) entry.get(key)) .window(TumblingEventTimeWindows
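The behavior described above follows from how a bounded-out-of-orderness watermark works: it only advances when new elements arrive, so on a sparse stream the watermark stalls and event-time windows never close. A minimal plain-Java sketch of the mechanism (not Flink's actual implementation; names are illustrative):

```java
// Sketch of a bounded-out-of-orderness watermark generator: the watermark
// trails the largest timestamp seen so far by a fixed bound, and only
// moves forward when new elements arrive.
public class BoundedWatermark {
    private final long maxOutOfOrderness;
    private long maxTimestampSeen = Long.MIN_VALUE;

    public BoundedWatermark(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Called for every incoming element.
    public void onEvent(long eventTimestamp) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestamp);
    }

    // The emitted watermark lags the maximum timestamp by the bound.
    public long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeen - maxOutOfOrderness;
    }

    public static void main(String[] args) {
        BoundedWatermark wm = new BoundedWatermark(50);
        wm.onEvent(100);
        wm.onEvent(250);
        wm.onEvent(180); // a late element does not move the watermark back
        System.out.println(wm.currentWatermark()); // prints 200
        // With no further events, the watermark stays at 200 forever,
        // so a window ending at, say, 300 never fires getResult.
    }
}
```

This is why setAutoWatermarkInterval alone does not help: it controls how often the watermark is emitted, not whether it advances on an idle source.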

[flink]Task manager initialization failed

孤人 submitted on 2020-05-28 07:27:20
Question: I am new to Flink. I am trying to run the Flink example on my local PC (Windows). However, after I run start-cluster.bat and log in to the dashboard, it shows zero task managers. I checked the log, and it seems initialization fails: 2020-02-21 23:03:14,202 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - TaskManager initialization failed. org.apache.flink.configuration.IllegalConfigurationException: Failed to create TaskExecutorResourceSpec at org.apache.flink.runtime
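On Flink 1.10, where TaskExecutorResourceSpec was introduced, this exception commonly means the TaskManager has no usable memory configuration. A hedged flink-conf.yaml fragment (the value shown is the 1.10 distribution's default and is illustrative, not a requirement):

```yaml
# conf/flink-conf.yaml — assumed fix: give the TaskExecutor an explicit
# total process memory so TaskExecutorResourceSpec can be derived.
taskmanager.memory.process.size: 1728m
```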

Flink: DataSet.count() is bottleneck - How to count parallel?

送分小仙女□ submitted on 2020-05-27 12:07:12
Question: I am learning MapReduce using Flink and have a question about how to count the elements in a DataSet efficiently. What I have so far is this: DataSet<MyClass> ds = ...; long num = ds.count(); When executing this, my Flink log says 12/03/2016 19:47:27 DataSink (count())(1/1) switched to RUNNING So only one CPU is used (I have four, and other operations such as reduce use all of them). I think count() internally collects the DataSet from all four CPUs and counts it sequentially instead of
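The usual way to parallelize this in Flink's DataSet API is a mapPartition that emits one local count per partition, followed by a sum, rather than ds.count(). A plain-Java sketch of that two-phase idea, using parallel streams to stand in for partition-parallel tasks (names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of partition-wise counting: each partition counts its own elements
// in parallel, then the per-partition counts are summed. In Flink's DataSet
// API this corresponds to ds.mapPartition(emit local count).reduce(sum)
// instead of the single-threaded sink behind ds.count().
public class PartitionCount {
    static long parallelCount(List<List<Integer>> partitions) {
        return partitions.parallelStream()  // one task per partition
                .mapToLong(List::size)      // local count, no data movement
                .sum();                     // combine the small partial counts
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5),
                Arrays.asList(6, 7, 8, 9));
        System.out.println(parallelCount(partitions)); // prints 9
    }
}
```

Only the tiny per-partition counts are shuffled to the final sum, so the expensive scan runs on all cores.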

Manage state with huge memory usage - querying from storage

主宰稳场 submitted on 2020-05-26 09:18:20
Question: Apologies if this sounds dumb! We are working with Flink to make async IO calls. A lot of the time, the IO calls are repeated (same set of parameters), and about 80% of the APIs that we call return the same response for the same parameters, so we would like to avoid making those calls again. We thought we could use state to store previous responses and reuse them. The issue is that although our responses can be reused, the number of such responses is huge and therefore requires a lot of
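When the full response set is too large for memory, one common compromise is to keep only a bounded, recently-used subset cached and fall back to the real async IO call (or disk-backed state such as RocksDB) on a miss. A minimal LRU cache sketch built on LinkedHashMap; class and method names are illustrative, not from Flink:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal bounded LRU cache: keeps only the hottest maxEntries responses in
// memory; anything evicted must be re-fetched via the real IO call.
public class ResponseCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public ResponseCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU eviction order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict least-recently-used beyond the cap
    }

    public static void main(String[] args) {
        ResponseCache<String, String> cache = new ResponseCache<>(2);
        cache.put("a", "resp-a");
        cache.put("b", "resp-b");
        cache.get("a");           // touch "a", so "b" is now least recent
        cache.put("c", "resp-c"); // exceeds capacity, evicts "b"
        System.out.println(cache.containsKey("b")); // prints false
        System.out.println(cache.containsKey("a")); // prints true
    }
}
```

With an 80% repeat rate, even a modest cache capacity can absorb most of the repeated calls while keeping memory bounded.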

How do I run Beam Python pipelines using Flink deployed on Kubernetes?

家住魔仙堡 submitted on 2020-05-26 06:44:26
Question: Does anybody know how to run Beam Python pipelines with Flink when Flink is running as pods in Kubernetes? I have successfully managed to run a Beam Python pipeline using the portable runner and a job service pointing to a local Flink server running in Docker containers. I was able to achieve that by mounting the Docker socket in my Flink containers and running Flink as the root process, so the DockerEnvironmentFactory class can create the Python harness container. Unfortunately, I can't use the
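A commonly suggested way to avoid DockerEnvironmentFactory entirely on Kubernetes is Beam's EXTERNAL environment, where the Python SDK harness runs as a worker-pool sidecar container in the TaskManager pod instead of being launched via Docker-in-Docker. A hedged sketch of the pipeline options as a config fragment (the module name and endpoints are placeholders, not real values from this setup):

```shell
python -m my_pipeline \
  --runner=PortableRunner \
  --job_endpoint=flink-jobservice:8099 \
  --environment_type=EXTERNAL \
  --environment_config=localhost:50000   # address of the SDK worker-pool sidecar
```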

Dynamic flink window creation by reading the details from kafka

a 夏天 submitted on 2020-05-24 03:33:20
Question: Let's say a Kafka message contains the Flink window size configuration. I want to read the message from Kafka and create a global window in Flink. Problem statement: Can we handle the above scenario using BroadcastStream? Or is there any other approach that supports this case? Answer 1: Flink's window API does not support dynamically changing window sizes. What you'll need to do is implement your own windowing using a process function, in this case a KeyedBroadcastProcessFunction, where the window
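The core of that custom windowing can be sketched in plain Java: the broadcast side updates the window size, and the keyed side assigns each event to a tumbling bucket derived from the current size. Class and method names here are illustrative, not Flink API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of what a KeyedBroadcastProcessFunction would do for dynamic
// windows: processBroadcastElement updates the window size, and
// processElement buckets each event by the size currently in effect.
public class DynamicWindows {
    private long windowSizeMs = 1000;                 // default window size
    private final Map<String, Map<Long, List<Long>>> buffers = new HashMap<>();

    // Mirrors processBroadcastElement: a Kafka config message changes the size.
    public void onConfig(long newWindowSizeMs) {
        windowSizeMs = newWindowSizeMs;
    }

    // Mirrors processElement: assign the event to its tumbling window bucket.
    public long onEvent(String key, long timestampMs) {
        long windowStart = timestampMs - (timestampMs % windowSizeMs);
        buffers.computeIfAbsent(key, k -> new HashMap<>())
               .computeIfAbsent(windowStart, w -> new ArrayList<>())
               .add(timestampMs);
        return windowStart; // in Flink, also register an event-time timer
                            // for windowStart + windowSizeMs to fire the window
    }

    public static void main(String[] args) {
        DynamicWindows dw = new DynamicWindows();
        System.out.println(dw.onEvent("user-1", 2500)); // prints 2000 (1s windows)
        dw.onConfig(5000);                              // broadcast: 5s windows
        System.out.println(dw.onEvent("user-1", 7500)); // prints 5000
    }
}
```

In the real function, the buffered events live in keyed state and the window size in broadcast state, and an event-time timer flushes each bucket when its window ends.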
