flink-streaming

Apache Flink 0.10: how to get the first occurrence of a composite key from an unbounded input DataStream?

爷,独闯天下 posted on 2019-11-30 09:43:05
I am a newbie with Apache Flink. I have an unbounded data stream as my input (fed into Flink 0.10 via Kafka). I want to get the first occurrence of each primary key (the primary key is the contract_num and the event_dt). These "duplicates" occur nearly immediately after each other. The source system cannot filter this for me, so Flink has to do it. Here is my input data:
contract_num, event_dt, attr
A1, 2016-02-24 10:25:08, X
A1, 2016-02-24 10:25:08, Y
A1, 2016-02-24 10:25:09, Z
A2, 2016-02-24 10:25:10, C
Here is the output data I want:
A1, 2016-02-24 10:25:08, X
A1, 2016-02-24 10:25:09, Z
A2, 2016
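
The "keep only the first occurrence per composite key" logic described in the question can be sketched in plain Java, independent of any Flink API. This is only an illustration of the idea, not the asker's or Flink's implementation: the `FirstOccurrenceSketch` class, the `Event` type and the `firstPerKey` helper are hypothetical names, and the in-memory `HashSet` stands in for whatever per-key state a real streaming job would use.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FirstOccurrenceSketch {

    /** Hypothetical record type mirroring the three columns in the question. */
    public static final class Event {
        public final String contractNum;
        public final String eventDt;
        public final String attr;

        public Event(String contractNum, String eventDt, String attr) {
            this.contractNum = contractNum;
            this.eventDt = eventDt;
            this.attr = attr;
        }

        /** Composite key: contract_num plus event_dt. */
        public String compositeKey() {
            return contractNum + "|" + eventDt;
        }

        @Override
        public String toString() {
            return contractNum + ", " + eventDt + ", " + attr;
        }
    }

    /** Keeps the first event seen for each composite key, in arrival order. */
    public static List<Event> firstPerKey(List<Event> input) {
        Set<String> seen = new HashSet<>();
        List<Event> out = new ArrayList<>();
        for (Event e : input) {
            // Set.add returns false when the key was already present.
            if (seen.add(e.compositeKey())) {
                out.add(e);
            }
        }
        return out;
    }
}
```

On an unbounded stream the `seen` set would grow without limit, so a real job would presumably need to expire old keys; since the question says the duplicates arrive almost immediately after each other, a short retention period may be enough, but that is an assumption.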

Flink: How to handle external app configuration changes in Flink

梦想与她 posted on 2019-11-29 20:57:35
Question: My requirement is to stream millions of records a day, and the job depends heavily on external configuration parameters. For example, a user can change a setting in the web application at any time, and after the change is made, streaming has to continue with the new application config parameters. These are app-level configurations, and we also have some dynamic exclude parameters that every record has to be passed through and filtered by. I see that Flink doesn’t have global state

Flink keyBy adding delay; how can I reduce this latency?

醉酒当歌 posted on 2019-11-28 12:46:00
Question: When I ran a simple Flink application with a KeyedStream, I observed that the latency of an event varies from 0 to 100 ms. Below is the program:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Long> source = env.addSource(new SourceFunction<Long>() {
    public void run(SourceContext<Long> sourceContext) throws Exception {
        while (true) {
            synchronized (sourceContext.getCheckpointLock()) {
                sourceContext.collect(System.currentTimeMillis());
                Thread.sleep

How to sort a stream by event time using Flink SQL

不想你离开。 posted on 2019-11-28 11:26:33
Question: I have an out-of-order DataStream<Event> that I want to sort so that the events are ordered by their event-time timestamps. I've simplified my use case down to where my Event class has just a single field, the timestamp field:
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
    env.setStreamTimeCharacteristic
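
The general idea behind sorting an out-of-order stream by event time can be sketched in plain Java: buffer incoming timestamps in a priority queue and release them in order once a watermark says nothing earlier should still arrive. This is not Flink SQL and not the asker's code; the `EventTimeSortSketch` class and its `onEvent` and `onWatermark` methods are hypothetical names used only to illustrate the buffering technique.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class EventTimeSortSketch {

    // Min-heap of buffered event timestamps (the question's Event has only a timestamp).
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();

    /** Buffers an event; nothing is emitted until a watermark arrives. */
    public void onEvent(long timestamp) {
        buffer.add(timestamp);
    }

    /** Emits, in ascending order, every buffered timestamp at or below the watermark. */
    public List<Long> onWatermark(long watermark) {
        List<Long> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }
}
```

The sketch assumes that no event older than the current watermark shows up afterwards; how late events should be handled is a separate decision that it does not cover.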

What is/are the main difference(s) between Flink and Storm?

久未见 posted on 2019-11-28 02:38:31
Flink has been compared to Spark, which, as I see it, is the wrong comparison because it compares a windowed event processing system against micro-batching; similarly, it does not make that much sense to me to compare Flink to Samza. In both cases it compares a real-time vs. a batched event processing strategy, even if at a smaller "scale" in the case of Samza. But I would like to know how Flink compares to Storm, which seems conceptually much more similar to it. I have found this (Slide #4) documenting the main difference as "adjustable latency" for Flink. Another hint seems to be an article

What is/are the main difference(s) between Flink and Storm?

我只是一个虾纸丫 posted on 2019-11-27 04:56:56
Question: Flink has been compared to Spark, which, as I see it, is the wrong comparison because it compares a windowed event processing system against micro-batching; similarly, it does not make that much sense to me to compare Flink to Samza. In both cases it compares a real-time vs. a batched event processing strategy, even if at a smaller "scale" in the case of Samza. But I would like to know how Flink compares to Storm, which seems conceptually much more similar to it. I have found this (Slide #4)