flink-streaming

Having an equivalent to HOP_START inside an aggregation primitive in Flink

Submitted by 让人想犯罪 __ on 2019-12-14 03:09:00
Question: I'm trying to compute an exponentially decaying moving average over a hopping window in Flink SQL. I need access to one of the borders of the window, the HOP_START, in the following: SELECT lb_index one_key, -- I have access to this one: HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND) start_time, -- Aggregation primitive: SUM( Y * EXP(TIMESTAMPDIFF( SECOND, proctime, -- This one throws: HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND) ))) FROM write
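The aggregation the question is after can be sketched outside SQL. A minimal plain-Java sketch of the decayed sum (class and method names are hypothetical; in Flink SQL the window start would come from HOP_START, which is exactly the value that is not available inside the aggregate):

```java
import java.util.Arrays;

public class DecayingSum {
    // Mirrors SUM(y * EXP(TIMESTAMPDIFF(SECOND, proctime, windowStart))):
    // each value is weighted by exp(windowStart - eventTime), in seconds,
    // so values further from the window start contribute less.
    static double decayingSum(long windowStartSec, long[] eventTsSec, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < eventTsSec.length; i++) {
            sum += y[i] * Math.exp((double) (windowStartSec - eventTsSec[i]));
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] ts = {0, 1, 2};
        double[] y = {1.0, 1.0, 1.0};
        System.out.println(decayingSum(0, ts, y));
    }
}
```

One common workaround in this situation is to compute the weight in a scalar UDF (or in the DataStream API with a ProcessWindowFunction, where the window start is available via the window context) rather than inside the SQL aggregate.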

Flink Complex Event Processing

Submitted by 走远了吗. on 2019-12-14 02:34:44
Question: I have Flink CEP code that reads from a socket and detects a pattern. Let's say the pattern (word) is 'alert'. If the word 'alert' occurs five times or more, an alert should be created. But I am getting an input mismatch error. The Flink version is 1.3.0. Thanks in advance! package pattern; import org.apache.flink.cep.CEP; import org.apache.flink.cep.PatternStream; import org.apache.flink.cep.pattern.Pattern; import org.apache.flink.cep.pattern.conditions.IterativeCondition; import org.apache
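In Flink CEP this kind of rule is normally expressed with a quantifier such as `Pattern.begin(...).where(...).timesOrMore(5)`. The plain-Java sketch below (names hypothetical) only models the intended counting semantics, independent of the CEP API:

```java
import java.util.ArrayList;
import java.util.List;

public class AlertCounter {
    // Models the intended rule: emit one alert as soon as the word
    // "alert" has been seen five times in the input stream.
    static List<String> process(List<String> words) {
        List<String> alerts = new ArrayList<>();
        int count = 0;
        for (String w : words) {
            if ("alert".equals(w)) {
                count++;
                if (count == 5) {
                    alerts.add("ALERT: 'alert' seen 5 times");
                }
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("alert", "x", "alert", "alert", "alert", "alert")));
    }
}
```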

Does Flink SQL support Java Map types?

Submitted by 懵懂的女人 on 2019-12-14 02:32:39
Question: I'm trying to access a key from a map using Flink's SQL API. It fails with the error Exception in thread "main" org.apache.flink.table.api.TableException: Type is not supported: ANY Please advise how I can fix it. Here is my event class: public class EventHolder { private Map<String,String> event; public Map<String, String> getEvent() { return event; } public void setEvent(Map<String, String> event) { this.event = event; } } Here is the main class which submits the Flink job: public class
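When the table planner falls back to the ANY type for a field, one common workaround is to extract the needed keys from the map into plainly typed columns before the data reaches the Table API. A minimal sketch of that flattening step (names hypothetical; in a real job this would run inside a MapFunction before `fromDataStream`):

```java
import java.util.Map;

public class EventFlattener {
    // Extracts selected keys from the event map into a typed row, so
    // downstream SQL sees plain String columns instead of a Map field.
    static String[] flatten(Map<String, String> event, String... keys) {
        String[] row = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            row[i] = event.getOrDefault(keys[i], null);
        }
        return row;
    }

    public static void main(String[] args) {
        String[] row = flatten(Map.of("user", "alice", "action", "click"), "user", "action");
        System.out.println(row[0] + " / " + row[1]);
    }
}
```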

Flink Trigger when State expires

Submitted by 核能气质少年 on 2019-12-13 20:31:24
Question: I have an interesting use case which I want to test with Flink. I have an incoming stream of Message which is either PASS or FAIL. Now if the message is of type FAIL, I have a downstream ProcessFunction which saves the Message state and then sends pause commands to everything that depends on it. When I receive a PASS message associated with the FAIL I received earlier (keying by message id), I send resume commands to everything I paused earlier. Now I plan on using State
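The described pause/resume logic can be modeled as a small keyed state machine. A plain-Java sketch (class and method names hypothetical; in Flink the `paused` map would instead be per-key `ValueState<Boolean>` inside a `KeyedProcessFunction`, with state TTL or timers handling expiry):

```java
import java.util.HashMap;
import java.util.Map;

public class PauseTracker {
    // Keyed by message id: a FAIL pauses the id, a matching PASS resumes it.
    private final Map<String, Boolean> paused = new HashMap<>();

    String onMessage(String id, String type) {
        if ("FAIL".equals(type)) {
            paused.put(id, true);
            return "PAUSE " + id;
        }
        if ("PASS".equals(type) && Boolean.TRUE.equals(paused.remove(id))) {
            return "RESUME " + id;
        }
        // PASS without a prior FAIL (or an unknown type) does nothing.
        return "NOOP";
    }

    public static void main(String[] args) {
        PauseTracker t = new PauseTracker();
        System.out.println(t.onMessage("m1", "FAIL"));
        System.out.println(t.onMessage("m1", "PASS"));
    }
}
```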

Apache Flink 1.4.2 akka.actor.ActorNotFound

Submitted by ℡╲_俬逩灬. on 2019-12-13 20:25:22
Question: After upgrading to Apache Flink 1.4.2, we get the following error every few seconds on one TaskManager out of three. 2018-06-27 17:33:46.632 [jobmanager-future-thread-2] DEBUG o.a.flink.runtime.rest.handler.legacy.metrics.MetricFetcher - Could not retrieve QueryServiceGateway. java.util.concurrent.CompletionException: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://flink@tm03-dev:6124/), Path(/user/MetricQueryService_64bde0e9e6f3f0a906a30e88c261c9d7)] at java.util

Deploy stream processing topology on runtime?

Submitted by 亡梦爱人 on 2019-12-13 17:33:09
Question: Hi all, I have a requirement wherein I need to re-ingest some of my older data. We have a multi-staged pipeline, the source of which is a Kafka topic. Once a record is fed into that, it runs through a series of steps (about 10). Each step massages the original JSON object pushed to the source topic and pushes it to a destination topic. Now, sometimes, we need to re-ingest the older data and apply a subset of the steps described above. We intend to push these re-ingested records to a different
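One way to apply only a subset of the pipeline steps to re-ingested records is to carry a per-record flag set (e.g. in a header field) saying which steps apply, and have each step skip records that don't select it. A minimal plain-Java sketch of that idea (all names hypothetical):

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class Pipeline {
    // Runs a record through only the steps enabled for it; in the real
    // pipeline each step would check the record's header before processing.
    static String run(String record, List<UnaryOperator<String>> steps, boolean[] enabled) {
        for (int i = 0; i < steps.size(); i++) {
            if (enabled[i]) {
                record = steps.get(i).apply(record);
            }
        }
        return record;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> steps = List.of(s -> s.toUpperCase(), s -> s + "!");
        // Re-ingested record: only step 1 applies.
        System.out.println(run("payload", steps, new boolean[]{true, false}));
    }
}
```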

Why doesn't the Flink SocketTextStreamWordCount work?

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-13 15:48:12
Question: I've set up the example project and built it. I'm able to run the WordCount program as expected. But when I run the SocketTextStreamWordCount, I'm not getting any results printed out. I send data in through nc (localhost:9999 on both sides). In the web console for the running job, I can see that there are messages being sent/received, but I never see the counts.print() output printed anywhere, even after killing the nc session. EDIT: when I change it around to print results to a text file, no

Effect of increasing parallelism on throughput

Submitted by 百般思念 on 2019-12-13 03:53:57
Question: I ran a job first with parallelism 1 and then with parallelism 3. With parallelism=1, the Kafka source was reading records at a rate of ~500 records per second. With parallelism=3, the throughput got divided among the three parallel subtasks, each reading ~150 records per second. Note that the source is publishing records at a much higher rate (~1000 records per second). Is this expected? I would imagine the throughput to increase with parallelism, but it remains the same. I checked
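One thing worth checking in this situation is the number of Kafka partitions: a Kafka consumer's parallelism is capped by the partition count, because partitions are distributed across source subtasks and a subtask with no partition sits idle. A sketch of a typical round-robin-style assignment (hypothetical helper, not the actual Flink connector code):

```java
public class PartitionAssignment {
    // Counts how many partitions a given source subtask would read under a
    // simple modulo assignment. With fewer partitions than subtasks, some
    // subtasks get zero partitions and total throughput cannot scale up.
    static int partitionsFor(int subtask, int parallelism, int partitions) {
        int count = 0;
        for (int p = 0; p < partitions; p++) {
            if (p % parallelism == subtask) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 1 partition, 3 subtasks: only subtask 0 does any reading.
        for (int s = 0; s < 3; s++) {
            System.out.println("subtask " + s + ": " + partitionsFor(s, 3, 1) + " partition(s)");
        }
    }
}
```

If the partition count is not the limit, a downstream operator applying backpressure is the other usual suspect for throughput that stays flat as parallelism grows.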

How to make Flink flush the last line to the sink when the producer (Kafka) does not produce a new line

Submitted by ≯℡__Kan透↙ on 2019-12-13 03:49:51
Question: When my Flink program is in event-time mode, the sink will not get the last line (say, line A). If I feed a new line (line B) to Flink, I will get line A, but I still can't get line B. val env = StreamExecutionEnvironment.getExecutionEnvironment env.setParallelism(1) env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE) env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) val properties = new Properties() properties.setProperty("bootstrap.servers", "localhost:9092") properties
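This "one line behind" behavior is characteristic of event-time watermarks: a bounded-out-of-orderness watermark is derived from the maximum timestamp seen so far, so it only advances when a newer event arrives. Line A's window can't fire until line B pushes the watermark past it. A plain-Java sketch of that watermark rule (simplified model, not Flink's actual generator class):

```java
public class BoundedOutOfOrdernessWatermark {
    // Watermark = maxSeenTimestamp - allowed lateness. With no new events,
    // the watermark stalls and the last window never fires.
    private long maxTs = Long.MIN_VALUE;
    private final long latenessMs;

    BoundedOutOfOrdernessWatermark(long latenessMs) {
        this.latenessMs = latenessMs;
    }

    long onEvent(long eventTsMs) {
        maxTs = Math.max(maxTs, eventTsMs);
        return maxTs - latenessMs;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessWatermark w = new BoundedOutOfOrdernessWatermark(1000);
        System.out.println(w.onEvent(5000)); // watermark trails the max timestamp
        System.out.println(w.onEvent(3000)); // late event does not move it back
    }
}
```

Typical remedies are an idle-source watermark strategy or a processing-time fallback that advances the watermark when no data arrives for a while.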

Flink session window with onEventTime trigger?

Submitted by 只谈情不闲聊 on 2019-12-13 02:57:55
Question: I want to create an event-time based session window in Flink, such that it triggers when the event time of a new message is more than 180 seconds greater than the event time of the message that created the window. For example: t1 (0 seconds): msg1 <-- This is the first message, which causes the session window to be created t2 (13 seconds): msg2 t3 (39 seconds): msg3 . . . . t7 (190 seconds): msg7 <-- The event time (t7) is more than 180 seconds greater than t1 (t7 - t1 = 190), so the window should be
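The firing condition described above can be modeled as a tiny state machine keyed on the first event's timestamp. In Flink this would be a custom `Trigger` registering an event-time timer at firstTimestamp + gap; the plain-Java sketch below (names hypothetical) only models the decision logic from the example:

```java
public class SessionGapTrigger {
    // Fires when a new event's time exceeds the window-creating (first)
    // event's time by more than GAP_SEC; the firing event starts a new window.
    private static final long GAP_SEC = 180;
    private Long firstTs = null;

    boolean onEvent(long eventTsSec) {
        if (firstTs == null) {
            firstTs = eventTsSec; // first message creates the window
            return false;
        }
        if (eventTsSec - firstTs > GAP_SEC) {
            firstTs = eventTsSec; // fire, and open the next window
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SessionGapTrigger trigger = new SessionGapTrigger();
        long[] times = {0, 13, 39, 190}; // t1, t2, t3, t7 from the example
        for (long t : times) {
            System.out.println("t=" + t + "s fires=" + trigger.onEvent(t));
        }
    }
}
```

Note this differs from Flink's built-in `EventTimeSessionWindows`, whose gap is measured from the most recent event, not from the first one; that difference is exactly why a custom trigger is needed here.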