flink-streaming

“No Metrics” in Flink web UI

被刻印的时光 ゝ submitted on 2019-12-11 14:26:07
Question: I started a local Flink cluster (./bin/start-cluster.sh) and submitted a job. I have the following code to define a metric:

    .map(new RichMapFunction<String, String>() {
        private transient Counter counter;

        @Override
        public void open(Configuration config) {
            this.counter = getRuntimeContext()
                .getMetricGroup()
                .counter("myCounter");
        }

        @Override
        public String map(String value) throws Exception {
            this.counter.inc();
            return value;
        }
    })

but when I run the job and send some data, I cannot see any
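The counter registration above follows the documented Flink metrics API, so a useful next step is to check how metrics are reported. As a hedged sketch (assuming a Flink 1.x setup), a reporter can be enabled in flink-conf.yaml so counters also show up in the TaskManager logs, which separates a web UI problem from a registration problem:

    # flink-conf.yaml: log all metrics once a minute via the built-in Slf4j reporter
    metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
    metrics.reporter.slf4j.interval: 60 SECONDS

In the web UI itself, task-level metrics only appear after selecting the task and adding the metric in its "Metrics" tab, and the counter name is prefixed with the operator's scope.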

Apache Flink: Performance issue when running many jobs

不问归期 submitted on 2019-12-11 13:45:04
Question: With a high number of Flink SQL queries (100 of the query below), the Flink command-line client fails with "JobManager did not respond within 600000 ms" on a YARN cluster, i.e. the job is never started on the cluster. The JobManager log contains nothing after the last TaskManager start except DEBUG entries saying "job with ID 5cd95f89ed7a66ec44f2d19eca0592f7 not found in JobManager", indicating it is likely stuck (creating the ExecutionGraph?). The same works as a standalone Java program locally (high CPU initially
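For context, here is a hedged sketch of how such a batch of queries is typically registered in a single job with the Flink 1.9-era Table API (the actual query string is not shown in the excerpt, so queries is a placeholder):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
    for (String query : queries) {
        tEnv.sqlUpdate(query); // each INSERT INTO statement grows the same job graph
    }
    env.execute("many-sql-queries");

With 100 queries in one graph, plan compilation and ExecutionGraph construction on the JobManager can dominate submission time, which would be consistent with the timeout described above.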

Flink: What is the best way to summarize the result from all partitions

青春壹個敷衍的年華 submitted on 2019-12-11 13:16:45
Question: The datastream is partitioned and distributed to each slot for processing. Now I can get the result of each partitioned task. What is the best approach to apply some function to the results of the different partitions and get a global summary result? Updated: I want to implement a data summarization algorithm such as Misra-Gries in Flink. It will maintain k counters and update them as data arrives. Because the data may be large, it's better that each partition has its own k counters and process
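One common pattern for this (a sketch, not taken from the question's answer; all class and method names are hypothetical) is two-level aggregation: each parallel partition pre-aggregates its own summary, and a parallelism-1 window merges the partials into the global result:

    stream
        .keyBy(value -> value.getKey())          // hypothetical key extractor; per-partition work
        .timeWindow(Time.minutes(1))
        .aggregate(new PartialSummary())         // hypothetical: builds a local k-counter summary
        .timeWindowAll(Time.minutes(1))          // runs with parallelism 1, sees every partial summary
        .reduce((a, b) -> Summary.merge(a, b));  // hypothetical merge of two k-counter summaries

Misra-Gries in particular supports such a merge step: matching counters are summed and the result is trimmed back to at most k counters, so per-partition summaries can be combined without reprocessing the data.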

Flink source for periodic updates

百般思念 submitted on 2019-12-11 10:26:18
Question: I'm trying to implement an external config for a long-running Flink job. My idea is to create a custom source that periodically (every 5 minutes) polls a JSON-encoded config from an external service over HTTP. How do I create a source that performs an action every N minutes? How can I rebroadcast this config to all executors? Answer 1: First, you need to create an event class that defines all the attributes your event stream has, along with getters, setters and other methods. An example of this class
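A minimal sketch of the polling source itself (the fetchConfig() helper is hypothetical; the 5-minute interval comes from the question):

    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    public class ConfigPollingSource extends RichSourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String configJson = fetchConfig();
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(configJson);
                }
                Thread.sleep(5 * 60 * 1000L); // poll every 5 minutes
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        private String fetchConfig() {
            // hypothetical: replace with a real HTTP client call to the config service
            return "{}";
        }
    }

To make each polled config visible on all parallel instances of a downstream operator, the resulting stream can be broadcast, e.g. env.addSource(new ConfigPollingSource()).broadcast(), or connected as broadcast state via stream.broadcast(descriptor) in newer Flink versions.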

Change source function in Flink without interrupting the execution

余生长醉 submitted on 2019-12-11 08:42:32
Question: I am looking for a solution for how I can change a source function in Flink while the execution is in progress:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    SourceFunction<String> mySource = ...; // a function that I want to change during runtime
    DataStream<String> stream = env.addSource(mySource);
    stream.map(...).print(); // creating my stream
    env.execute("sample");

I am thinking about creating a wrapper around a real implementation of SourceFunction
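Building on the wrapper idea from the question, here is a hedged sketch (all names hypothetical): the wrapper owns the run() loop and pulls records from whichever delegate supplier is currently installed, so the producer can be swapped without cancelling the job:

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.Supplier;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    public class SwitchableSource extends RichSourceFunction<String> {
        private volatile boolean running = true;
        // swapped at runtime through some external control path (assumption)
        private final AtomicReference<Supplier<String>> producer =
                new AtomicReference<>(() -> null);

        public void setProducer(Supplier<String> p) {
            producer.set(p);
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String value = producer.get().get();
                if (value == null) {
                    Thread.sleep(50); // nothing to emit yet
                    continue;
                }
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(value);
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

One caveat: calling a setter on the driver-side object will not reach the deserialized copies running on the TaskManagers, so in practice the "switch" signal has to arrive through something the running instances can observe, such as the polled config stream from the previous question.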

Flink keyed stream key is null

自闭症网瘾萝莉.ら submitted on 2019-12-11 07:39:04
Question: I am trying to perform a map operation on a KeyedStream in Flink:

    stream.map(new JsonToMessageObjectMapper())
          .keyBy("keyfield")
          .map(new MessageProcessorStateful())

The output of the JsonToMessageObjectMapper operator is a POJO of class MessageObject, which has a String field 'keyfield'. The stream is then keyed on this field. MessageProcessorStateful is a RichMapFunction like this:

    public class MessageAdProcessorStateful extends RichMapFunction<MessageObject, Tuple2<String, String>> {
        private
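A detail worth checking here (an assumption, since the full question is cut off): keyBy("keyfield") with a field expression requires MessageObject to be a valid Flink POJO (public no-argument constructor plus a public field or getter/setter for keyfield), and the field must never be null when the record is keyed. An explicit KeySelector avoids the reflective lookup entirely; a sketch with a hypothetical getter:

    stream.map(new JsonToMessageObjectMapper())
          .keyBy(new KeySelector<MessageObject, String>() {
              @Override
              public String getKey(MessageObject msg) {
                  return msg.getKeyfield(); // hypothetical getter; must not return null
              }
          })
          .map(new MessageProcessorStateful());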

Find count in WindowedStream - Flink

柔情痞子 submitted on 2019-12-11 05:00:05
Question: I am pretty new to the world of streams and I am facing some issues in my first try. More specifically, I am trying to implement count and groupBy functionality in a sliding window using Flink. I've done it on a normal DataStream but I am not able to make it work on a WindowedStream. Do you have any suggestions on how I can do it?

    val parsedStream: DataStream[(String, Response)] = stream
      .mapWith(_.decodeOption[Response])
      .filter(_.isDefined)
      .map { record =>
        (
          s"${record.get.group.group
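As a sketch of the count-per-group-in-a-sliding-window pattern, shown in Java to match the rest of the snippets in this digest (window sizes are placeholders, and the stream is assumed to carry Tuple2<String, Response> pairs with timestamps/watermarks already assigned):

    parsedStream
        .keyBy(value -> value.f0) // the group string
        .window(SlidingEventTimeWindows.of(Time.minutes(10), Time.minutes(1)))
        .aggregate(new AggregateFunction<Tuple2<String, Response>, Long, Long>() {
            @Override public Long createAccumulator() { return 0L; }
            @Override public Long add(Tuple2<String, Response> value, Long acc) { return acc + 1; }
            @Override public Long getResult(Long acc) { return acc; }
            @Override public Long merge(Long a, Long b) { return a + b; }
        });

The AggregateFunction keeps only a running count per key and window, which avoids buffering all elements the way an apply() on the WindowedStream would.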

Flink Job suddenly crashed with error: Encountered error while consuming partitions

落花浮王杯 submitted on 2019-12-10 22:50:08
Question: I have a streaming job that failed after running for 1 day and 10 hours. One of the subtasks suddenly failed and crashed the whole job. Since I had set up a restart_strategy, the job automatically restarted but crashed again with the same error. I found the log of the Task Manager the failed task was running on, but it is not very helpful for debugging this. Can anyone suggest a better way? Thank you. JobManager log around the failure:

    2019-05-09 19:50:59,230 INFO org.apache.flink.runtime.checkpoint
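"Encountered error while consuming partitions" typically surfaces in Flink's network stack when a downstream task loses the connection to an upstream task's result partition, so besides the failing task's log it is worth pulling the log of the upstream TaskManager it was consuming from. As a side note, the restart strategy mentioned above can also be set in code; a minimal sketch (attempt count and delay are placeholders):

    // retry up to 3 times, waiting 10 seconds between attempts
    env.setRestartStrategy(
        RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));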

Flink dynamic scaling

℡╲_俬逩灬. submitted on 2019-12-10 15:51:31
Question: I am currently studying scalability in Flink. Starting with version 1.2.0, dynamic rescaling was introduced. I am looking at scaling a long-running job which reads data from a Kafka source. Questions regarding dynamic rescaling: To scale out my Flink application, for example by adding new task managers, must I restart the job / YARN session to use the newly added resources? I think it's possible to write a YARN client that deploys new task managers and makes them talk to the job manager; is that already
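In the Flink versions current when this was asked, the standard way to change a job's parallelism was a savepoint-and-restart cycle rather than in-place rescaling. A hedged sketch of that CLI flow (paths, job ID and parallelism are placeholders):

    # take a savepoint and cancel the job
    bin/flink cancel -s hdfs:///savepoints <jobId>

    # resubmit with a new parallelism, restoring from the savepoint
    bin/flink run -s hdfs:///savepoints/savepoint-xxxx -p 8 my-job.jar

Newly started TaskManagers register with the JobManager on their own; the restart is only needed for the running job to actually use the extra slots.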

Apache Flink: NullPointerException caused by TupleSerializer

本秂侑毒 submitted on 2019-12-10 13:31:43
Question: When I execute my Flink application it gives me this NullPointerException:

    2017-08-08 13:21:57,690 INFO  com.datastax.driver.core.Cluster - New Cassandra host /127.0.0.1:9042 added
    2017-08-08 13:22:02,427 INFO  org.apache.flink.runtime.taskmanager.Task - TriggerWindow(TumblingEventTimeWindows(30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@15d1c80b}, EventTimeTrigger(), WindowedStream.apply(CoGroupedStreams.java:302)) -> Filter -> Flat Map ->
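A frequent cause of a NullPointerException in TupleSerializer (an assumption here, since the stack trace is cut off) is a null field inside a Flink Tuple: unlike POJOs, Flink's Tuple types do not support null fields, so the serializer fails when such a record is shipped or checkpointed. A guard like this avoids it, e.g. inside a FlatMapFunction:

    // Flink tuples must not carry null fields; map nulls to a sentinel before emitting
    String safeValue = (value != null) ? value : "";
    out.collect(Tuple2.of(key, safeValue));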