flink-streaming

“No Metrics” in Flink web UI

被刻印的时光 ゝ submitted on 2019-12-11 14:26:07
Question: I started a local Flink cluster (./bin/start-cluster.sh) and submitted a job. I have the following code to define a metric:

    .map(new RichMapFunction<String, String>() {
        private transient Counter counter;

        @Override
        public void open(Configuration config) {
            this.counter = getRuntimeContext()
                .getMetricGroup()
                .counter("myCounter");
        }

        @Override
        public String map(String value) throws Exception {
            this.counter.inc();
            return value;
        }
    })

but when I run the job and send some data, I cannot see any
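The counter registration above follows the documented Flink metrics API, so a useful next step is to check how metrics are reported. As a hedged sketch (assuming a Flink 1.x setup), a reporter can be enabled in flink-conf.yaml so counters also show up in the TaskManager logs, which separates a web UI problem from a registration problem:

    # flink-conf.yaml: log all metrics once a minute via the built-in Slf4j reporter
    metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
    metrics.reporter.slf4j.interval: 60 SECONDS

In the web UI itself, task-level metrics only appear after selecting the task and adding the metric in its "Metrics" tab, and the counter name is prefixed with the operator's scope.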

Apache Flink: Performance issue when running many jobs

不问归期 submitted on 2019-12-11 13:45:04
Question: With a high number of Flink SQL queries (100 of the query below), the Flink command-line client fails with "JobManager did not respond within 600000 ms" on a YARN cluster, i.e. the job is never started on the cluster. The JobManager log contains nothing after the last TaskManager start except DEBUG entries saying "job with ID 5cd95f89ed7a66ec44f2d19eca0592f7 not found in JobManager", indicating it is likely stuck (creating the ExecutionGraph?). The same works as a standalone Java program locally (high CPU initially
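For context, here is a hedged sketch of how such a batch of queries is typically registered in a single job with the Flink 1.9-era Table API (the actual query string is not shown in the excerpt, so queries is a placeholder):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
    for (String query : queries) {
        tEnv.sqlUpdate(query); // each INSERT INTO statement grows the same job graph
    }
    env.execute("many-sql-queries");

With 100 queries in one graph, plan compilation and ExecutionGraph construction on the JobManager can dominate submission time, which would be consistent with the timeout described above.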

Flink: What is the best way to summarize the result from all partitions

青春壹個敷衍的年華 submitted on 2019-12-11 13:16:45
Question: The datastream is partitioned and distributed to each slot for processing. Now I can get the result of each partitioned task. What is the best approach to apply some function to the results of the different partitions and get a global summary result? Updated: I want to implement a data summarization algorithm such as Misra-Gries in Flink. It will maintain k counters and update them as data arrives. Because the data may be large, it's better that each partition has its own k counters and process
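One common pattern for this (a sketch, not taken from the question's answer; all class and method names are hypothetical) is two-level aggregation: each parallel partition pre-aggregates its own summary, and a parallelism-1 window merges the partials into the global result:

    stream
        .keyBy(value -> value.getKey())          // hypothetical key extractor; per-partition work
        .timeWindow(Time.minutes(1))
        .aggregate(new PartialSummary())         // hypothetical: builds a local k-counter summary
        .timeWindowAll(Time.minutes(1))          // runs with parallelism 1, sees every partial summary
        .reduce((a, b) -> Summary.merge(a, b));  // hypothetical merge of two k-counter summaries

Misra-Gries in particular supports such a merge step: matching counters are summed and the result is trimmed back to at most k counters, so per-partition summaries can be combined without reprocessing the data.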

Flink source for periodic updates

百般思念 submitted on 2019-12-11 10:26:18
Question: I'm trying to implement an external config for a long-running Flink job. My idea is to create a custom source that periodically (every 5 minutes) polls a JSON-encoded config from an external service over HTTP. How do I create a source that performs an action every N minutes? How can I rebroadcast this config to all executors? Answer 1: First, you need to create an event class that defines all the attributes your event stream has, along with getters, setters and other methods. An example of this class
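A minimal sketch of the polling source itself (the fetchConfig() helper is hypothetical; the 5-minute interval comes from the question):

    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    public class ConfigPollingSource extends RichSourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String configJson = fetchConfig();
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(configJson);
                }
                Thread.sleep(5 * 60 * 1000L); // poll every 5 minutes
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        private String fetchConfig() {
            // hypothetical: replace with a real HTTP client call to the config service
            return "{}";
        }
    }

To make each polled config visible on all parallel instances of a downstream operator, the resulting stream can be broadcast, e.g. env.addSource(new ConfigPollingSource()).broadcast(), or connected as broadcast state via stream.broadcast(descriptor) in newer Flink versions.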

Change source function in Flink without interrupting the execution

余生长醉 submitted on 2019-12-11 08:42:32
Question: I am looking for a solution for how I can change a source function in Flink while the execution is in progress:

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    SourceFunction<String> mySource = ...; // a function that I want to change during runtime
    DataStream<String> stream = env.addSource(mySource);
    stream.map(...).print(); // creating my stream
    env.execute("sample");

I am thinking about creating a wrapper around a real implementation of SourceFunction
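Building on the wrapper idea from the question, here is a hedged sketch (all names hypothetical): the wrapper owns the run() loop and pulls records from whichever delegate supplier is currently installed, so the producer can be swapped without cancelling the job:

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.Supplier;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    public class SwitchableSource extends RichSourceFunction<String> {
        private volatile boolean running = true;
        // swapped at runtime through some external control path (assumption)
        private final AtomicReference<Supplier<String>> producer =
                new AtomicReference<>(() -> null);

        public void setProducer(Supplier<String> p) {
            producer.set(p);
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String value = producer.get().get();
                if (value == null) {
                    Thread.sleep(50); // nothing to emit yet
                    continue;
                }
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(value);
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

One caveat: calling a setter on the driver-side object will not reach the deserialized copies running on the TaskManagers, so in practice the "switch" signal has to arrive through something the running instances can observe, such as the polled config stream from the previous question.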

Flink keyed stream key is null

自闭症网瘾萝莉.ら submitted on 2019-12-11 07:39:04
Question: I am trying to perform a map operation on a KeyedStream in Flink:

    stream.map(new JsonToMessageObjectMapper())
          .keyBy("keyfield")
          .map(new MessageProcessorStateful())

The output of the JsonToMessageObjectMapper operator is a POJO of class MessageObject, which has a String field 'keyfield'. The stream is then keyed on this field. MessageProcessorStateful is a RichMapFunction like this:

    public class MessageAdProcessorStateful extends RichMapFunction<MessageObject, Tuple2<String, String>> {
        private
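A detail worth checking here (an assumption, since the full question is cut off): keyBy("keyfield") with a field expression requires MessageObject to be a valid Flink POJO (public no-argument constructor plus a public field or getter/setter for keyfield), and the field must never be null when the record is keyed. An explicit KeySelector avoids the reflective lookup entirely; a sketch with a hypothetical getter:

    stream.map(new JsonToMessageObjectMapper())
          .keyBy(new KeySelector<MessageObject, String>() {
              @Override
              public String getKey(MessageObject msg) {
                  return msg.getKeyfield(); // hypothetical getter; must not return null
              }
          })
          .map(new MessageProcessorStateful());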

Find count in WindowedStream - Flink

柔情痞子 submitted on 2019-12-11 05:00:05
Question: I am pretty new to the world of streams and I am facing some issues in my first try. More specifically, I am trying to implement count and groupBy functionality in a sliding window using Flink. I've done it on a normal DataStream but I am not able to make it work on a WindowedStream. Do you have any suggestions on how I can do it?

    val parsedStream: DataStream[(String, Response)] = stream
      .mapWith(_.decodeOption[Response])
      .filter(_.isDefined)
      .map { record =>
        (
          s"${record.get.group.group
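As a sketch of the count-per-group-in-a-sliding-window pattern, shown in Java to match the rest of the snippets in this digest (window sizes are placeholders, and the stream is assumed to carry Tuple2<String, Response> pairs with timestamps/watermarks already assigned):

    parsedStream
        .keyBy(value -> value.f0) // the group string
        .window(SlidingEventTimeWindows.of(Time.minutes(10), Time.minutes(1)))
        .aggregate(new AggregateFunction<Tuple2<String, Response>, Long, Long>() {
            @Override public Long createAccumulator() { return 0L; }
            @Override public Long add(Tuple2<String, Response> value, Long acc) { return acc + 1; }
            @Override public Long getResult(Long acc) { return acc; }
            @Override public Long merge(Long a, Long b) { return a + b; }
        });

The AggregateFunction keeps only a running count per key and window, which avoids buffering all elements the way an apply() on the WindowedStream would.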

Flink Job suddenly crashed with error: Encountered error while consuming partitions

落花浮王杯 submitted on 2019-12-10 22:50:08
Question: I have a streaming job that failed after running for 1 day and 10 hours. One of the subtasks suddenly failed and crashed the whole job. Since I had set up a restart_strategy, the job automatically restarted but crashed again with the same error. I found the log of the Task Manager the failed task was running on, but it is not very helpful for debugging this. Can anyone suggest a better way? Thank you. JobManager log around the failure:

    2019-05-09 19:50:59,230 INFO org.apache.flink.runtime.checkpoint
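"Encountered error while consuming partitions" typically surfaces in Flink's network stack when a downstream task loses the connection to an upstream task's result partition, so besides the failing task's log it is worth pulling the log of the upstream TaskManager it was consuming from. As a side note, the restart strategy mentioned above can also be set in code; a minimal sketch (attempt count and delay are placeholders):

    // retry up to 3 times, waiting 10 seconds between attempts
    env.setRestartStrategy(
        RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));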

Flink dynamic scaling

℡╲_俬逩灬. submitted on 2019-12-10 15:51:31
Question: I am currently studying scalability in Flink. Starting with version 1.2.0, dynamic rescaling was introduced. I am looking at scaling a long-running job which reads data from a Kafka source. Questions regarding dynamic rescaling: To scale out my Flink application, for example by adding new task managers, must I restart the job / YARN session to use the newly added resources? I think it's possible to write a YARN client that deploys new task managers and makes them talk to the job manager; is that already
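In the Flink versions current when this was asked, the standard way to change a job's parallelism was a savepoint-and-restart cycle rather than in-place rescaling. A hedged sketch of that CLI flow (paths, job ID and parallelism are placeholders):

    # take a savepoint and cancel the job
    bin/flink cancel -s hdfs:///savepoints <jobId>

    # resubmit with a new parallelism, restoring from the savepoint
    bin/flink run -s hdfs:///savepoints/savepoint-xxxx -p 8 my-job.jar

Newly started TaskManagers register with the JobManager on their own; the restart is only needed for the running job to actually use the extra slots.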

Apache Flink: NullPointerException caused by TupleSerializer

本秂侑毒 submitted on 2019-12-10 13:31:43
Question: When I execute my Flink application it gives me this NullPointerException:

    2017-08-08 13:21:57,690 INFO  com.datastax.driver.core.Cluster - New Cassandra host /127.0.0.1:9042 added
    2017-08-08 13:22:02,427 INFO  org.apache.flink.runtime.taskmanager.Task - TriggerWindow(TumblingEventTimeWindows(30000), ListStateDescriptor{serializer=org.apache.flink.api.common.typeutils.base.ListSerializer@15d1c80b}, EventTimeTrigger(), WindowedStream.apply(CoGroupedStreams.java:302)) -> Filter -> Flat Map ->
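A frequent cause of a NullPointerException in TupleSerializer (an assumption here, since the stack trace is cut off) is a null field inside a Flink Tuple: unlike POJOs, Flink's Tuple types do not support null fields, so the serializer fails when such a record is shipped or checkpointed. A guard like this avoids it, e.g. inside a FlatMapFunction:

    // Flink tuples must not carry null fields; map nulls to a sentinel before emitting
    String safeValue = (value != null) ? value : "";
    out.collect(Tuple2.of(key, safeValue));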