flink-streaming

How to increase Flink taskmanager.numberOfTaskSlots to run it without a Flink server (in IDE or as a fat jar)

Submitted by 耗尽温柔 on 2019-12-10 09:28:37
Question: I have a question about running a Flink streaming job in the IDE or as a fat jar without deploying it to a Flink server. The problem is that I cannot run it in the IDE when my job uses more than one task slot.

    public class StreamingJob {
        public static void main(String[] args) throws Exception {
            // set up the streaming execution environment
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            Properties kafkaProperties = new Properties();
            kafkaProperties.setProperty(
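
One way to get more task slots when running in the IDE is to configure the embedded mini-cluster through a local environment. This is a minimal sketch against the Flink 1.x API; the slot count and parallelism of 4 are arbitrary examples, not values from the question:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.TaskManagerOptions;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    // Give the embedded mini-cluster 4 task slots instead of the default.
    Configuration conf = new Configuration();
    conf.setInteger(TaskManagerOptions.NUM_TASK_SLOTS, 4);

    // createLocalEnvironment runs the job in-process (IDE or fat jar), no cluster needed.
    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironment(4, conf);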

Flink: how to store state and use it in another stream?

Submitted by 不羁岁月 on 2019-12-10 07:06:14
Question: I have a use case for Flink where I need to read information from a file, store each line, and then use this state to filter another stream. I have all of this working right now with the connect operator and a RichCoFlatMapFunction, but it feels overly complicated. Also, I'm concerned that flatMap2 could begin executing before all of the state is loaded from the file:

    fileStream
        .connect(partRecordStream.keyBy(
            (KeySelector<PartRecord, String>) partRecord -> partRecord.getPartId()))
        .keyBy(
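
For reference, a minimal sketch of the connect + RichCoFlatMapFunction pattern described here, assuming both inputs are keyed by part id and reusing the question's PartRecord type: flatMap1 loads the file side into keyed state and flatMap2 filters against it. Flink gives no ordering guarantee between the two inputs, which is exactly the race the question worries about:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
    import org.apache.flink.util.Collector;

    public class FilterByFileState extends RichCoFlatMapFunction<String, PartRecord, PartRecord> {
        private transient ValueState<Boolean> seenInFile;

        @Override
        public void open(Configuration parameters) {
            seenInFile = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("seenInFile", Boolean.class));
        }

        @Override
        public void flatMap1(String fileLine, Collector<PartRecord> out) throws Exception {
            // File side: remember that this key appeared in the file.
            seenInFile.update(true);
        }

        @Override
        public void flatMap2(PartRecord record, Collector<PartRecord> out) throws Exception {
            // Stream side: forward only records whose key was seen in the file.
            if (Boolean.TRUE.equals(seenInFile.value())) {
                out.collect(record);
            }
        }
    }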

Apache Flink: Where do State Backends keep the state?

Submitted by 馋奶兔 on 2019-12-09 14:16:20
Question: I came across the statement below:

"Depending on your state backend, Flink can also manage the state for the application, meaning Flink deals with the memory management (possibly spilling to disk if necessary) to allow applications to hold very large state."

https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/state_backends.html

Does it mean that only when the state backend is configured as RocksDBStateBackend, the state would be kept in memory and possibly spill to disk if
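
For context, a state backend is selected in code as in the sketch below (a minimal Flink 1.x example; the checkpoint URI is a placeholder). The heap-based backends keep working state as objects on the TaskManager heap, while RocksDBStateBackend keeps it in an embedded RocksDB instance backed by local disk, which is what lets state grow beyond available memory:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Heap-based: working state lives on the TaskManager heap;
    // snapshots are written to the given filesystem path.
    env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

    // RocksDB-based: working state lives in embedded RocksDB (memory plus local disk),
    // so it can exceed the heap; snapshots also go to the filesystem.
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));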

Why can “broadcast state” store dynamic rules when the broadcast() operator cannot?

Submitted by 久未见 on 2019-12-08 12:28:38
Question: I was confused about the difference between "broadcast state" and the broadcast() operator, and I finally got help from a Flink expert in the following thread: What does it mean that "broadcast state" unblocks the implementation of the “dynamic patterns” feature for Flink’s CEP library? The conclusion there seems to be that "broadcast state" can store dynamic rules for use against a keyed stream (via RichCoFlatMap), whereas the broadcast() operator cannot, so may I know how "broadcast state" stores the
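
For reference, a minimal sketch of the broadcast state pattern from the Flink 1.5+ API (Rule, Event, and Alert are hypothetical types): the rule stream is broadcast together with a MapStateDescriptor, and a KeyedBroadcastProcessFunction writes the rules on the broadcast side and reads them on the keyed side. Plain broadcast() without a descriptor only replicates elements; it gives the receiving operator no managed state to keep them in:

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    MapStateDescriptor<String, Rule> ruleDescriptor =
            new MapStateDescriptor<>("rules", Types.STRING, Types.POJO(Rule.class));

    BroadcastStream<Rule> ruleBroadcast = ruleStream.broadcast(ruleDescriptor);

    events.keyBy(Event::getKey)
        .connect(ruleBroadcast)
        .process(new KeyedBroadcastProcessFunction<String, Event, Rule, Alert>() {
            @Override
            public void processBroadcastElement(Rule rule, Context ctx, Collector<Alert> out) throws Exception {
                // Broadcast side: the state is writable here, and the same update
                // is applied on every parallel instance.
                ctx.getBroadcastState(ruleDescriptor).put(rule.getName(), rule);
            }

            @Override
            public void processElement(Event event, ReadOnlyContext ctx, Collector<Alert> out) throws Exception {
                // Keyed side: read-only view of the broadcast rules.
                Rule rule = ctx.getBroadcastState(ruleDescriptor).get(event.getRuleName());
                if (rule != null && rule.matches(event)) {
                    out.collect(new Alert(event));
                }
            }
        });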

Calculate totals and emit periodically in Flink

Submitted by 血红的双手。 on 2019-12-08 09:52:08
Question: I have a stream of events about resources that looks like this:

    id, type, count
    1, view, 1
    1, download, 3
    2, view, 1
    3, view, 1
    1, download, 2
    3, view, 1

I am trying to produce stats (totals) per resource, so for the stream above the result should be:

    id, views, downloads
    1, 1, 5
    2, 1, 0
    3, 2, 0

Now I wrote a ProcessFunction that calculates the totals like this:

    public class CountTotals extends ProcessFunction<Event, ResourceTotals> {
        private ValueState<ResourceTotals> totalsState;
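
To emit the totals periodically rather than once per event, one option is a processing-time timer. This is a minimal sketch, not the asker's code; Event and ResourceTotals stand in for the question's own POJOs, and the one-minute interval is an arbitrary choice:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class CountTotals extends KeyedProcessFunction<Long, Event, ResourceTotals> {
        private static final long EMIT_INTERVAL_MS = 60_000L;
        private transient ValueState<ResourceTotals> totalsState;

        @Override
        public void open(Configuration parameters) {
            totalsState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("totals", ResourceTotals.class));
        }

        @Override
        public void processElement(Event e, Context ctx, Collector<ResourceTotals> out) throws Exception {
            ResourceTotals totals = totalsState.value();
            if (totals == null) {
                totals = new ResourceTotals(ctx.getCurrentKey());
                // First event for this key: schedule the first periodic emission.
                ctx.timerService().registerProcessingTimeTimer(
                        ctx.timerService().currentProcessingTime() + EMIT_INTERVAL_MS);
            }
            if ("view".equals(e.type)) {
                totals.views += e.count;
            } else {
                totals.downloads += e.count;
            }
            totalsState.update(totals);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<ResourceTotals> out) throws Exception {
            out.collect(totalsState.value());
            // Re-arm so totals keep being emitted every interval.
            ctx.timerService().registerProcessingTimeTimer(timestamp + EMIT_INTERVAL_MS);
        }
    }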

Simple Scala API for CEP example doesn't show any output

Submitted by 旧街凉风 on 2019-12-08 05:25:05
Question: I'm programming a simple example to test the new Scala API for CEP in Flink, using the latest GitHub version for 1.1-SNAPSHOT. The pattern is only a check for a value, and it outputs a single String as a result for each matched pattern. The code is as follows:

    val pattern : Pattern[(String, Long, Int), _] = Pattern.begin("start").where(_._3 < 4)
    val cepEventAlert = CEP.pattern(streamingAlert, pattern)

    def selectFn(pattern : mutable.Map[String, (String, Long, Int)]): String = {
        val startEvent =
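
For comparison, here is a minimal sketch of the same check in Java against the later Flink CEP API (1.3+, where a match is a Map of event lists); streamingAlert and the tuple type mirror the question, the rest is illustrative. Two classic reasons such an example prints nothing are a missing env.execute() call and running in event time without assigned timestamps and watermarks:

    import java.util.List;
    import java.util.Map;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternSelectFunction;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;

    Pattern<Tuple3<String, Long, Integer>, ?> pattern =
            Pattern.<Tuple3<String, Long, Integer>>begin("start")
                    .where(new SimpleCondition<Tuple3<String, Long, Integer>>() {
                        @Override
                        public boolean filter(Tuple3<String, Long, Integer> event) {
                            return event.f2 < 4; // same check as the Scala example
                        }
                    });

    PatternStream<Tuple3<String, Long, Integer>> cepEventAlert =
            CEP.pattern(streamingAlert, pattern);

    cepEventAlert.select(new PatternSelectFunction<Tuple3<String, Long, Integer>, String>() {
        @Override
        public String select(Map<String, List<Tuple3<String, Long, Integer>>> match) {
            return "matched: " + match.get("start").get(0);
        }
    }).print();

    env.execute("cep-example"); // without this, nothing runs and nothing is printed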

Apache Flink: using windows to induce a delay before writing to a sink

Submitted by 廉价感情. on 2019-12-08 05:13:17
Question: I am wondering whether it is possible with Flink windowing to induce a 10 minute delay from when the data enters the pipeline until it is written to a table in Cassandra. My initial intention was to write each transaction to a table in Cassandra and query the table using a range key at the web layer, but due to the volume of data I am looking at options to delay the write by N seconds. This means that my table will only ever hold data that is at least 10 minutes old. The small diagram below shows 10
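
Windows are an awkward fit for a fixed per-record delay, because an element arriving near the end of a window is held back far less than 10 minutes. One alternative, sketched below under the assumption of a keyed stream of a hypothetical Transaction type, is to buffer elements in a KeyedProcessFunction and release each one with a processing-time timer 10 minutes after it arrived; the released stream then goes to the Cassandra sink as usual:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class DelayedRelease extends KeyedProcessFunction<String, Transaction, Transaction> {
        private static final long DELAY_MS = 10 * 60 * 1000L;
        private transient ListState<Tuple2<Long, Transaction>> buffer; // (releaseTime, element)

        @Override
        public void open(Configuration parameters) {
            buffer = getRuntimeContext().getListState(new ListStateDescriptor<>(
                    "buffer", TypeInformation.of(new TypeHint<Tuple2<Long, Transaction>>() {})));
        }

        @Override
        public void processElement(Transaction t, Context ctx, Collector<Transaction> out) throws Exception {
            long releaseAt = ctx.timerService().currentProcessingTime() + DELAY_MS;
            buffer.add(Tuple2.of(releaseAt, t));
            ctx.timerService().registerProcessingTimeTimer(releaseAt);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Transaction> out) throws Exception {
            // Emit everything that is now at least 10 minutes old, keep the rest buffered.
            List<Tuple2<Long, Transaction>> keep = new ArrayList<>();
            for (Tuple2<Long, Transaction> entry : buffer.get()) {
                if (entry.f0 <= timestamp) {
                    out.collect(entry.f1);
                } else {
                    keep.add(entry);
                }
            }
            buffer.update(keep);
        }
    }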

Apache Flink Streaming window WordCount

Submitted by 扶醉桌前 on 2019-12-08 03:30:54
Question: I have the following code to count words from socketTextStream. Both cumulative word counts and time-windowed word counts are needed. The program has an issue: cumulateCounts is always the same as the windowed counts. Why does this happen, and what is the correct way to calculate cumulative counts based on the windowed counts?

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    final HashMap<String, Integer> cumulateCounts = new HashMap<String, Integer>();
    final DataStream
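
A HashMap captured in a closure is not reliable state in Flink: each parallel task gets its own copy, updates are not shared across the cluster, and nothing is checkpointed. A minimal sketch of one alternative (illustrative names; Tokenizer stands for a hypothetical FlatMapFunction emitting (word, 1) pairs): compute the windowed counts first, then fold them into a running total kept in keyed state:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    DataStream<Tuple2<String, Integer>> windowedCounts = env
            .socketTextStream("localhost", 9999)
            .flatMap(new Tokenizer())              // emits (word, 1) pairs
            .keyBy(0)
            .timeWindow(Time.seconds(5))
            .sum(1);

    DataStream<Tuple2<String, Integer>> cumulativeCounts = windowedCounts
            .keyBy(0)
            .map(new RichMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
                private transient ValueState<Integer> total;

                @Override
                public void open(Configuration parameters) {
                    total = getRuntimeContext().getState(
                            new ValueStateDescriptor<>("total", Integer.class));
                }

                @Override
                public Tuple2<String, Integer> map(Tuple2<String, Integer> windowCount) throws Exception {
                    int t = (total.value() == null ? 0 : total.value()) + windowCount.f1;
                    total.update(t);               // running total across all windows
                    return Tuple2.of(windowCount.f0, t);
                }
            });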

Apache Flink - how to send and consume POJOs using AWS Kinesis

Submitted by 那年仲夏 on 2019-12-08 02:51:51
Question: I want to consume POJOs arriving from Kinesis with Flink. Is there any standard for how to correctly send and deserialize the messages? Thanks

Answer 1: I resolved it with:

    DataStream<SamplePojo> kinesis = see.addSource(new FlinkKinesisConsumer<>(
        "my-stream",
        new POJODeserializationSchema(),
        kinesisConsumerConfig));

and

    public class POJODeserializationSchema extends AbstractDeserializationSchema<SamplePojo> {
        private ObjectMapper mapper;

        @Override
        public SamplePojo deserialize(byte[] message)
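
The answer is cut off above; a plausible completion of the deserializer (an assumption, not the original answer's code) lazily creates a Jackson ObjectMapper, which keeps the schema serializable when the job graph ships to the workers. On the producer side, the symmetric convention would be to write the POJO as JSON bytes with the same mapper:

    import java.io.IOException;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;

    public class POJODeserializationSchema extends AbstractDeserializationSchema<SamplePojo> {
        private transient ObjectMapper mapper;

        @Override
        public SamplePojo deserialize(byte[] message) throws IOException {
            if (mapper == null) {
                mapper = new ObjectMapper(); // created lazily on the worker, not serialized
            }
            return mapper.readValue(message, SamplePojo.class);
        }
    }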

Flink: How to read from multiple Kafka clusters using the same StreamExecutionEnvironment

Submitted by 半世苍凉 on 2019-12-08 02:46:20
Question: I want to read data from multiple Kafka clusters in Flink, but the result is that kafkaMessageStream only reads from the first Kafka cluster. I am able to read from both Kafka clusters only if I have two separate streams, one per cluster, which is not what I want. Is it possible to have multiple sources attached to a single reader? Sample code:

    public class KafkaReader<T> implements Reader<T> {
        private StreamExecutionEnvironment executionEnvironment;
        public StreamExecutionEnvironment
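
A single Kafka consumer instance is bound to one cluster by its properties, so the usual approach is one source per cluster plus union(), which still yields a single DataStream for the rest of the pipeline. A minimal sketch with hypothetical broker addresses and topic names:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    Properties clusterA = new Properties();
    clusterA.setProperty("bootstrap.servers", "broker-a:9092");
    clusterA.setProperty("group.id", "reader");

    Properties clusterB = new Properties();
    clusterB.setProperty("bootstrap.servers", "broker-b:9092");
    clusterB.setProperty("group.id", "reader");

    DataStream<String> streamA = env.addSource(
            new FlinkKafkaConsumer<>("topic-a", new SimpleStringSchema(), clusterA));
    DataStream<String> streamB = env.addSource(
            new FlinkKafkaConsumer<>("topic-b", new SimpleStringSchema(), clusterB));

    // union() merges the two sources into one logical stream for downstream operators.
    DataStream<String> merged = streamA.union(streamB);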