flink-streaming

Using Apache Flink for data streaming

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-23 18:24:21
Question: I am working on building an application with the requirements below, and I am just getting started with Flink.

1. Ingest data into Kafka with, say, 50 partitions (incoming rate: 100,000 msgs/sec).
2. Read the data from Kafka and process each record in real time (do some computation, compare with old data, etc.).
3. Store the output in Cassandra.

I was looking for a real-time streaming platform and found Flink to be a great fit for both real-time and batch processing. Do you think Flink is the best fit for my use case or should I
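
A minimal sketch of how the three requirements map onto Flink's DataStream API, assuming the Kafka and Cassandra connectors are on the classpath; the topic name, hosts, insert query, and the pass-through "computation" are placeholders, not details from the question:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.cassandra.CassandraSink;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaToCassandraJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
            props.setProperty("group.id", "pipeline");            // placeholder

            // With 50 Kafka partitions, source parallelism 50 gives each
            // source subtask exactly one partition to read.
            DataStream<String> raw = env
                    .addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
                    .setParallelism(50);

            // Stand-in for the real per-record computation; "compare with old
            // data" would normally use keyed state after a keyBy.
            DataStream<Tuple2<String, Integer>> processed = raw
                    .map(value -> Tuple2.of(value, value.length()))
                    .returns(Types.TUPLE(Types.STRING, Types.INT));

            CassandraSink.addSink(processed)
                    .setQuery("INSERT INTO ks.events (k, v) VALUES (?, ?);") // placeholder keyspace/table
                    .setHost("cassandra-host")                               // placeholder
                    .build();

            env.execute("kafka-to-cassandra");
        }
    }

The point of the sketch is that the three requirements map directly onto a source, a transformation, and a sink, which is why Flink is a natural fit for this shape of pipeline.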

Apache Flink: Skewed data distribution on KeyedStream

Submitted by 微笑、不失礼 on 2019-12-23 16:28:11
Question: I have this Java code in Flink:

    env.setParallelism(6);

    // Read from a Kafka topic with 12 partitions
    DataStream<String> line = env.addSource(myConsumer);

    // Filter half of the records
    DataStream<Tuple2<String, Integer>> line_Num_Odd = line_Num.filter(new FilterOdd());
    DataStream<Tuple3<String, String, Integer>> line_Num_Odd_2 = line_Num_Odd.map(new OddAdder());

    // Filter the other half
    DataStream<Tuple2<String, Integer>> line_Num_Even = line_Num.filter(new FilterEven());
    DataStream<Tuple3<String,
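
The excerpt cuts off before FilterOdd, FilterEven, and OddAdder are shown (and before line_Num, presumably derived from line, is defined). A plausible reconstruction of the three helpers, an assumption rather than code from the post, which also shows where skew on a later KeyedStream would come from: if the mappers tag records with the constant keys "odd" and "even", a downstream keyBy on that field can occupy at most two of the six parallel subtasks, because two distinct key values hash to at most two subtasks.

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;

    public class SkewHelpers {
        // Keeps records whose counter is odd.
        public static class FilterOdd implements FilterFunction<Tuple2<String, Integer>> {
            @Override
            public boolean filter(Tuple2<String, Integer> t) {
                return t.f1 % 2 != 0;
            }
        }

        // Keeps records whose counter is even.
        public static class FilterEven implements FilterFunction<Tuple2<String, Integer>> {
            @Override
            public boolean filter(Tuple2<String, Integer> t) {
                return t.f1 % 2 == 0;
            }
        }

        // Tags odd records with a constant key; with only two distinct key
        // values ("odd"/"even"), any keyBy on that field is inherently skewed.
        public static class OddAdder
                implements MapFunction<Tuple2<String, Integer>, Tuple3<String, String, Integer>> {
            @Override
            public Tuple3<String, String, Integer> map(Tuple2<String, Integer> t) {
                return Tuple3.of("odd", t.f0, t.f1);
            }
        }
    }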

Flink webui when running from IDE

Submitted by 自闭症网瘾萝莉.ら on 2019-12-23 07:47:45
Question: I am trying to see my job in the web UI. I use createLocalEnvironmentWithWebUI; the code runs fine in the IDE, but the job never appears at http://localhost:8081/#/overview

    val conf: Configuration = new Configuration()
    import org.apache.flink.configuration.ConfigConstants
    conf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true)
    val env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    val rides = env.addSource
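
For comparison, a minimal self-contained Java variant, assuming Flink 1.x with the flink-runtime-web dependency on the classpath (createLocalEnvironmentWithWebUI needs it to actually start the UI). A job is only visible in the UI while it runs, so this one is effectively endless:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.RestOptions;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LocalWebUiJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInteger(RestOptions.PORT, 8081); // pin the UI to port 8081

            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

            // An effectively endless job, so there is time to open the UI.
            env.generateSequence(0, Long.MAX_VALUE)
                    .filter(x -> x % 2 == 0)
                    .print();

            env.execute("local-webui-demo");
        }
    }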

TaskManager was lost/killed

Submitted by 耗尽温柔 on 2019-12-22 10:58:43
Question: When I try to run a Flink job on a standalone cluster, I get this error:

    java.lang.Exception: TaskManager was lost/killed: ResourceID{resourceId='2961948b9ac490c11c6e41b0ec197e9f'} @ localhost (dataPort=55795)
        at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
        at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
        at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
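
"TaskManager was lost/killed" usually means the TaskManager process died (often an OutOfMemoryError, visible in the TaskManager log) or stopped heartbeating, often because of a long GC pause. A hedged flink-conf.yaml sketch of the knobs commonly raised first; key names are from Flink 1.5+ (older versions used akka.watch.heartbeat.* instead), and the values are illustrative:

    # More heap if the TaskManager died from OOM (check its log first).
    taskmanager.heap.size: 4096m

    # Tolerate longer GC pauses before the master declares the TM lost.
    heartbeat.interval: 10000    # ms
    heartbeat.timeout: 180000    # ms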

Flink: Sharing state in CoFlatMapFunction

Submitted by 狂风中的少年 on 2019-12-22 03:44:35
Question: I got stuck a bit with CoFlatMapFunction. It seems to work fine if I place it on the DataStream before the window, but it fails if placed after the window's "apply" function. I was testing two streams: the main "Features" stream on flatMap1, constantly ingesting data, and a control stream "Model" on flatMap2, changing the model on request. I am able to set and see b0/b1 properly set in flatMap2, but flatMap1 always sees b0 and b1 as they were set to 0 at initialization. Am I missing something obvious here?

    public static
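
The symptom (updates made in flatMap2 being invisible to flatMap1) is typical when the two inputs are routed to different parallel instances of the operator: each instance has its own copy of the fields. A hedged Java sketch of the classic pattern of broadcasting the control stream so every parallel instance receives each model update; the names b0/b1 follow the question, everything else is an assumption:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
    import org.apache.flink.util.Collector;

    public class ModelApplier
            implements CoFlatMapFunction<Double, Tuple2<Double, Double>, Double> {
        // Plain fields: each parallel instance keeps its own copy, which is
        // exactly why a non-broadcast control stream updates only one of them.
        private double b0 = 0.0;
        private double b1 = 0.0;

        @Override
        public void flatMap1(Double feature, Collector<Double> out) {
            out.collect(b0 + b1 * feature); // score with the model this instance has seen
        }

        @Override
        public void flatMap2(Tuple2<Double, Double> model, Collector<Double> out) {
            b0 = model.f0;
            b1 = model.f1;
        }
    }

    // Usage: broadcast the model stream so all parallel instances get every update.
    // features.connect(models.broadcast()).flatMap(new ModelApplier());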

Apache Flink: simple windowing job fails with java.lang.RuntimeException: segment has been freed (MiniCluster problem)

Submitted by 我是研究僧i on 2019-12-20 07:14:47
Question: Apache Flink job, simple windowing problem: java.lang.RuntimeException: segment has been freed.

Hi, I am a Flink newbie. In my job, I am trying to use windowing simply to aggregate elements, to enable delayed processing:

    src = src.timeWindowAll(Time.milliseconds(1000)).process(new BaseDelayingProcessAllWindowFunctionImpl());

The process-window function simply collects the input elements:

    public class BaseDelayingProcessAllWindowFunction<IN> extends ProcessAllWindowFunction<IN, IN, TimeWindow> {
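
The excerpt ends before the class body; a minimal completion consistent with "simply collects the input elements" (an assumption, not the poster's actual code):

    import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class BaseDelayingProcessAllWindowFunction<IN>
            extends ProcessAllWindowFunction<IN, IN, TimeWindow> {
        @Override
        public void process(Context context, Iterable<IN> elements, Collector<IN> out)
                throws Exception {
            // Re-emit everything the 1-second window collected once it fires.
            for (IN element : elements) {
                out.collect(element);
            }
        }
    }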

Ordering of Records in Stream

Submitted by 梦想的初衷 on 2019-12-19 11:50:31
Question: Here are some of the queries I have. I have two different streams, stream1 and stream2, in which the elements are in order.

1) When I do keyBy on each of these streams, will the order be maintained? (Since every group here will be sent to one task manager only.) My understanding is that the records within a group will be in order; correct me here.

2) After the keyBy on both of the streams, I am doing a co-group to get the matching and non-matching records. Will the order be maintained here also?
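
A hedged sketch of the keyBy/co-group shape the question describes; the tuple schema, key field, and window are placeholders. The comments state the usual Flink guarantee: order is preserved per key between directly connected tasks, but there is no ordering across different keys or across the two inputs.

    import org.apache.flink.api.common.functions.CoGroupFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;

    public class CoGroupSketch {
        public static DataStream<String> matchUp(
                DataStream<Tuple2<String, Integer>> stream1,
                DataStream<Tuple2<String, Integer>> stream2) {

            KeySelector<Tuple2<String, Integer>, String> byKey =
                    new KeySelector<Tuple2<String, Integer>, String>() {
                        @Override
                        public String getKey(Tuple2<String, Integer> t) {
                            return t.f0;
                        }
                    };

            return stream1
                    .coGroup(stream2)
                    .where(byKey)
                    .equalTo(byKey)
                    .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                    .apply(new CoGroupFunction<Tuple2<String, Integer>,
                                               Tuple2<String, Integer>, String>() {
                        @Override
                        public void coGroup(Iterable<Tuple2<String, Integer>> first,
                                            Iterable<Tuple2<String, Integer>> second,
                                            Collector<String> out) {
                            // Per key and window: an empty side means a
                            // non-matching record; keys themselves are unordered.
                            boolean hasLeft = first.iterator().hasNext();
                            boolean hasRight = second.iterator().hasNext();
                            out.collect(hasLeft && hasRight ? "match" : "non-match");
                        }
                    });
        }
    }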

Apache Flink: Count window with timeout

Submitted by 雨燕双飞 on 2019-12-19 08:26:13
Question: Here is a simple code example to illustrate my question:

    case class Record(key: String, value: Int)

    object Job extends App {
      val env = StreamExecutionEnvironment.getExecutionEnvironment
      val data = env.fromElements(
        Record("01", 1), Record("02", 2), Record("03", 3), Record("04", 4), Record("05", 5)
      )
      val step1 = data.filter(record => record.value % 3 != 0) // introduces some data loss
      val step2 = data.map(r => Record(r.key, r.value * 2))
      val step3 = data.map(r => Record(r.key, r.value * 3
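
The title asks for a count window with a timeout, and the excerpt cuts off before that part. The standard approach is a custom Trigger that fires when either the element count or a processing-time timer is reached, whichever comes first. A hedged Java sketch; the class name and the state handling are mine, not the poster's:

    import org.apache.flink.api.common.state.ReducingState;
    import org.apache.flink.api.common.state.ReducingStateDescriptor;
    import org.apache.flink.api.common.typeutils.base.LongSerializer;
    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.Window;

    public class CountWithTimeoutTrigger<T, W extends Window> extends Trigger<T, W> {
        private final long maxCount;
        private final long timeoutMs;

        // Per-window element counter kept in partitioned trigger state.
        private final ReducingStateDescriptor<Long> countDesc =
                new ReducingStateDescriptor<>("count", (a, b) -> a + b, LongSerializer.INSTANCE);

        public CountWithTimeoutTrigger(long maxCount, long timeoutMs) {
            this.maxCount = maxCount;
            this.timeoutMs = timeoutMs;
        }

        @Override
        public TriggerResult onElement(T element, long timestamp, W window, TriggerContext ctx)
                throws Exception {
            ReducingState<Long> count = ctx.getPartitionedState(countDesc);
            if (count.get() == null) {
                // First element of this window: arm the timeout timer.
                ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + timeoutMs);
            }
            count.add(1L);
            if (count.get() >= maxCount) {
                count.clear();
                return TriggerResult.FIRE_AND_PURGE;
            }
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) {
            return TriggerResult.FIRE_AND_PURGE; // timeout reached before maxCount
        }

        @Override
        public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public void clear(W window, TriggerContext ctx) throws Exception {
            ctx.getPartitionedState(countDesc).clear();
        }
    }

Global windows never fire on their own, so a trigger like this would drive them, e.g. data.keyBy(...).window(GlobalWindows.create()).trigger(new CountWithTimeoutTrigger<>(100, 60000)).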

Apache Flink 0.10: how to get the first occurrence of a composite key from an unbounded input DataStream?

Submitted by 折月煮酒 on 2019-12-18 13:27:24
Question: I am a newbie with Apache Flink. I have an unbounded data stream as my input (fed into Flink 0.10 via Kafka). I want to get the first occurrence of each primary key (the primary key is contract_num plus event_dt). These "duplicates" occur nearly immediately after each other. The source system cannot filter this for me, so Flink has to do it. Here is my input data:

    contract_num, event_dt, attr
    A1, 2016-02-24 10:25:08, X
    A1, 2016-02-24 10:25:08, Y
    A1, 2016-02-24 10:25:09, Z
    A2, 2016-02-24 10
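
A hedged sketch of the usual first-occurrence filter: key the stream by the composite key and keep a "seen" flag in keyed state. This uses the Flink 1.x ValueState API rather than Flink 0.10's older state interface, and the Event POJO in the comments is an assumption modeled on the sample data:

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class FirstOccurrence extends RichFlatMapFunction<Event, Event> {
        private transient ValueState<Boolean> seen;

        @Override
        public void open(Configuration parameters) {
            seen = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("seen", Boolean.class));
        }

        @Override
        public void flatMap(Event e, Collector<Event> out) throws Exception {
            if (seen.value() == null) { // first record for this (contract_num, event_dt)
                seen.update(true);
                out.collect(e);
            } // later duplicates of the same composite key are dropped
        }
    }

    // Hypothetical POJO and usage:
    // public class Event { public String contractNum; public String eventDt; public String attr; }
    // events.keyBy(e -> e.contractNum + "|" + e.eventDt).flatMap(new FirstOccurrence());

Since the duplicates arrive nearly back-to-back, production code would also expire the flag (state TTL or a timer) so the keyed state does not grow without bound.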

Flink error on using RichAggregateFunction

Submitted by 筅森魡賤 on 2019-12-14 03:48:06
Question: I am trying to use an implementation of the abstract RichAggregateFunction in Flink. I want it to be "rich" because I need to store some state as part of the aggregator, which I can do since I have access to the runtime context. My code is something like this:

    stream.keyBy(...)
          .window(GlobalWindows.create())
          .trigger(...)
          .aggregate(new MyRichAggregateFunction());

However, I get an UnsupportedOperationException saying "This aggregation function cannot be a RichFunction." I'm clearly not
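
Flink rejects RichFunction implementations in window aggregate() on purpose: the aggregator's "state" is supposed to live in the accumulator, which Flink itself stores as window state. A hedged sketch of a plain AggregateFunction computing an average; MyRichAggregateFunction's real logic is not shown in the excerpt:

    import org.apache.flink.api.common.functions.AggregateFunction;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class AverageAggregate
            implements AggregateFunction<Integer, Tuple2<Long, Long>, Double> {
        @Override
        public Tuple2<Long, Long> createAccumulator() {
            return Tuple2.of(0L, 0L); // (sum, count): the per-window "state"
        }

        @Override
        public Tuple2<Long, Long> add(Integer value, Tuple2<Long, Long> acc) {
            return Tuple2.of(acc.f0 + value, acc.f1 + 1);
        }

        @Override
        public Double getResult(Tuple2<Long, Long> acc) {
            return acc.f1 == 0 ? 0.0 : ((double) acc.f0) / acc.f1;
        }

        @Override
        public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
            return Tuple2.of(a.f0 + b.f0, a.f1 + b.f1);
        }
    }

If access to the runtime context is genuinely needed, the aggregate(AggregateFunction, ProcessWindowFunction) overload lets the ProcessWindowFunction, which may be rich, handle that part.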