flink-streaming

How to increase Flink taskmanager.numberOfTaskSlots to run it without a Flink server (in IDE or fat jar)

Submitted by 感情迁移 on 2019-12-05 16:17:43
I have a question about running a Flink streaming job in the IDE or as a fat jar without deploying it to a Flink server. The problem is that I cannot run it in the IDE when my job uses more than one task slot.

    public class StreamingJob {
        public static void main(String[] args) throws Exception {
            // set up the streaming execution environment
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            Properties kafkaProperties = new Properties();
            kafkaProperties.setProperty("bootstrap.servers", "localhost:9092");
            kafkaProperties.setProperty("group.id", "test");
            env
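The snippet is cut off, but one way to get more task slots when running inside the IDE is to build the local environment from an explicit Configuration rather than relying on the defaults. A minimal sketch, assuming a slot count of 4 (the number is an arbitrary example):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.TaskManagerOptions;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    // start the embedded mini-cluster with a TaskManager that offers 4 slots
    Configuration conf = new Configuration();
    conf.setInteger(TaskManagerOptions.NUM_TASK_SLOTS, 4);
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironment(4, conf);

Jobs built on this environment then run in the IDE or from a fat jar with 4 slots available, without any standalone Flink cluster.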

Flink CsvTableSource Streaming

Submitted by 孤人 on 2019-12-05 13:35:45
I want to stream a CSV file and perform SQL operations on it using Flink, but the code I have written reads the file only once and stops; it does not stream. Thanks in advance.

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tableEnv = StreamTableEnvironment.getTableEnvironment(env);
    CsvTableSource csvtable = CsvTableSource.builder()
        .path("D:/employee.csv")
        .ignoreFirstLine()
        .fieldDelimiter(",")
        .field("id", Types.INT())
        .field("name", Types.STRING())
        .field("designation", Types.STRING())
        .field("age", Types.INT())
        .field("location", Types
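The excerpt ends mid-schema, but the behaviour it describes is expected: CsvTableSource is backed by a file input format that reads the file exactly once. If the goal is to keep watching the file for new rows, one option is env.readFile with PROCESSING_CONTINUOUSLY; a hedged sketch, with the parsing of lines into typed columns left out:

    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

    // re-scan D:/employee.csv every second and emit its contents when it changes
    TextInputFormat format = new TextInputFormat(new Path("D:/employee.csv"));
    DataStream<String> lines = env.readFile(
        format, "D:/employee.csv",
        FileProcessingMode.PROCESSING_CONTINUOUSLY, 1000L);

Note that in this mode Flink re-reads the whole file whenever it is modified, so appending a row re-emits every line.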

Apache Flink: What's the difference between side outputs and split() in the DataStream API?

Submitted by ε祈祈猫儿з on 2019-12-05 05:33:39
Apache Flink has a split API that lets you branch data streams:

    val splited = datastream.split { i => i match {
      case i if ... => Seq("red", "blue")
      case _ => Seq("green")
    }}
    splited.select("green").flatMap { .... }

It also provides another approach called side outputs (https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/side_output.html) that lets you do the same thing. What is the difference between these two ways? Do they use the same lower-level construct? Do they cost the same? When and how should we choose between them?

The split operator is part of the DataStream
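Where the answer cuts off, it is contrasting split with side outputs. For reference, the side-output version of the routing above would tag elements inside a ProcessFunction; a minimal Java sketch, with the even/odd predicate standing in for the real routing logic:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    // the anonymous subclass ({}) keeps the tag's type information at runtime
    final OutputTag<Integer> greenTag = new OutputTag<Integer>("green") {};

    SingleOutputStreamOperator<Integer> mainStream = datastream
        .process(new ProcessFunction<Integer, Integer>() {
            @Override
            public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                if (value % 2 == 0) {
                    out.collect(value);          // main output ("red"/"blue" path)
                } else {
                    ctx.output(greenTag, value); // side output ("green" path)
                }
            }
        });

    DataStream<Integer> green = mainStream.getSideOutput(greenTag);

Unlike split, side outputs can emit a different type than the main stream, and they are the approach the DataStream API has kept (split was later deprecated).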

Flink BucketingSink with Custom AvroParquetWriter create empty file

Submitted by 我们两清 on 2019-12-05 03:36:41
Question: I have created a writer for BucketingSink. The sink and writer work without error, but when the writer writes Avro GenericRecord to Parquet, the files move from in-progress through pending to complete, yet they are empty, with 0 bytes. Can anyone tell me what is wrong with the code? I have tried placing the initialization of AvroParquetWriter in the open() method, but the result is still the same. When debugging the code, I confirmed that writer.write(element) does execute and
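The code itself is not shown, but the classic cause of 0-byte Parquet files in this setup is that ParquetWriter buffers entire row groups in memory and only writes them out on close(), so BucketingSink's flush/pending cycle sees nothing on disk. A hedged sketch of a Writer that closes the Parquet writer properly (the interface shape follows later Flink 1.x BucketingSink releases; the schema handling is an assumption):

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.streaming.connectors.fs.Writer;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class ParquetSinkWriter implements Writer<GenericRecord> {
        private final String schemaString; // Schema is not Serializable, so keep the JSON
        private transient ParquetWriter<GenericRecord> writer;

        public ParquetSinkWriter(String schemaString) {
            this.schemaString = schemaString;
        }

        @Override
        public void open(FileSystem fs, Path path) throws IOException {
            Schema schema = new Schema.Parser().parse(schemaString);
            writer = AvroParquetWriter.<GenericRecord>builder(path)
                    .withSchema(schema)
                    .build();
        }

        @Override
        public void write(GenericRecord element) throws IOException {
            writer.write(element);       // buffered; only hits disk on close()
        }

        @Override
        public long flush() throws IOException {
            return writer.getDataSize(); // Parquet cannot flush partial row groups
        }

        @Override
        public long getPos() throws IOException {
            return writer.getDataSize();
        }

        @Override
        public void close() throws IOException {
            if (writer != null) {
                writer.close();          // this is where the data is actually written
                writer = null;
            }
        }

        @Override
        public Writer<GenericRecord> duplicate() {
            return new ParquetSinkWriter(schemaString);
        }
    }

If close() is never reached, or the writer is recreated without being closed, the completed files stay empty, which matches the symptom described.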

Apache Flink - custom java options are not recognized inside job

Submitted by 我是研究僧i on 2019-12-05 03:09:47
I've added the following line to flink-conf.yaml:

    env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE"

When starting the jobmanager (jobmanager.sh start cluster), I see in the logs that the JVM option is indeed recognized:

    2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
    2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager -   -Xms256m
    2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager -   -Xmx256m
    2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager -   -XX:MaxPermSize=256m
    2017-02-20 12:19:23,536
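The log excerpt is truncated, but the problem described usually comes down to where the property is read: env.java.opts is applied to the JobManager and TaskManager JVMs, so a System.getProperty() call that runs in a different JVM (for example in the CLI client while the job graph is being built) will not see it. One commonly suggested workaround is to ship the value through Flink's ParameterTool instead of a JVM option; a minimal sketch reusing the property name from the question:

    import org.apache.flink.api.java.utils.ParameterTool;

    // in main(): pass the value as a program argument, e.g. --dy.props.path /PATH/TO/PROPS/FILE
    ParameterTool params = ParameterTool.fromArgs(args);
    env.getConfig().setGlobalJobParameters(params);

    // in any RichFunction running on the TaskManagers:
    ParameterTool p = (ParameterTool)
        getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
    String propsPath = p.get("dy.props.path");

This makes the setting part of the job itself, so it is visible regardless of which JVM executes the code.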

Differences between working with states and windows (time) in Flink streaming

Submitted by 这一生的挚爱 on 2019-12-04 23:06:19
Let's say we want to compute the sum and average of some items, and can work either with state or with windows (time).

Example working with windows: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program
Example working with state: https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java

May I ask what the reasons behind the decision would be? Can I infer that if the data arrives very irregularly (50% comes within the defined window length and
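The question is truncated, but the core trade-off can be made concrete with a short sketch. A hedged Java example of both styles on a stream of (key, value) pairs, assuming an existing StreamExecutionEnvironment env (the tuple layout is an assumption): the window variant emits one sum per key per window, while the stateful variant emits an updated running sum for every arriving element, however irregularly the data comes in.

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;

    DataStream<Tuple2<String, Long>> input =
        env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L), Tuple2.of("a", 3L));

    // windows: one result per key per minute, whether or not data arrived evenly
    input.keyBy(0)
         .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
         .sum(1);

    // state: a fresh running sum as soon as each element arrives
    input.keyBy(0)
         .flatMap(new RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
             private transient ValueState<Long> sum;

             @Override
             public void open(Configuration parameters) {
                 sum = getRuntimeContext().getState(
                     new ValueStateDescriptor<>("sum", Long.class));
             }

             @Override
             public void flatMap(Tuple2<String, Long> in, Collector<Tuple2<String, Long>> out)
                     throws Exception {
                 Long current = sum.value();
                 long updated = (current == null ? 0L : current) + in.f1;
                 sum.update(updated);
                 out.collect(Tuple2.of(in.f0, updated));
             }
         });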

Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4)

Submitted by 空扰寡人 on 2019-12-04 20:30:35
I am using Apache Flink v1.6.0 and trying to write to Elasticsearch v6.4.0, which is hosted in Elastic Cloud. I am having an issue authenticating to the Elastic Cloud cluster. I have been able to get Flink to write to a local, unencrypted Elasticsearch v6.4.0 node using the following code:

    /* Elasticsearch Configuration */
    List<HttpHost> httpHosts = new ArrayList<>();
    httpHosts.add(new HttpHost("127.0.0.1", 9200, "http"));

    // use an ElasticsearchSink.Builder to create an ElasticsearchSink
    ElasticsearchSink.Builder<ObjectNode> esSinkBuilder = new ElasticsearchSink
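The snippet is cut off, but for an Elastic Cloud cluster the usual missing pieces are HTTPS plus basic auth on the underlying REST client. With the flink-connector-elasticsearch6 builder this can be supplied via setRestClientFactory; a hedged sketch continuing the code above (the host, port 9243, and the credentials are placeholders):

    import org.apache.http.HttpHost;
    import org.apache.http.auth.AuthScope;
    import org.apache.http.auth.UsernamePasswordCredentials;
    import org.apache.http.impl.client.BasicCredentialsProvider;

    // Elastic Cloud endpoints are HTTPS, typically on port 9243
    httpHosts.add(new HttpHost("my-cluster.es.io", 9243, "https"));

    esSinkBuilder.setRestClientFactory(restClientBuilder ->
        restClientBuilder.setHttpClientConfigCallback(httpClientBuilder -> {
            BasicCredentialsProvider credentials = new BasicCredentialsProvider();
            credentials.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("elastic", "password"));
            return httpClientBuilder.setDefaultCredentialsProvider(credentials);
        }));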

How to filter an Apache Flink stream on the basis of another?

Submitted by 大憨熊 on 2019-12-04 16:56:20
I have two streams: one of Int and the other of JSON. The JSON schema contains a key that is an int, so I need to filter the JSON stream by comparing that key with the other, integer stream. Is this possible in Flink?

Yes, you can do this kind of stream processing with Flink. The basic building blocks you need from Flink are connected streams and stateful functions -- here's an example using a RichCoFlatMap:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org
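The answer's code is cut off at the imports; a minimal sketch of how such a RichCoFlatMapFunction could look, assuming Jackson's ObjectNode for the JSON records and a field literally named "key" (both assumptions):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
    import org.apache.flink.util.Collector;
    import com.fasterxml.jackson.databind.node.ObjectNode;

    public class FilterByControlStream extends RichCoFlatMapFunction<Integer, ObjectNode, ObjectNode> {
        private transient ValueState<Integer> allowedKey;

        @Override
        public void open(Configuration config) {
            allowedKey = getRuntimeContext().getState(
                new ValueStateDescriptor<>("allowedKey", Integer.class));
        }

        // flatMap1 sees the Int control stream: remember the latest allowed key
        @Override
        public void flatMap1(Integer control, Collector<ObjectNode> out) throws Exception {
            allowedKey.update(control);
        }

        // flatMap2 sees the JSON stream: forward only records whose key matches
        @Override
        public void flatMap2(ObjectNode json, Collector<ObjectNode> out) throws Exception {
            Integer allowed = allowedKey.value();
            if (allowed != null && json.get("key").asInt() == allowed.intValue()) {
                out.collect(json);
            }
        }
    }

The two streams would be connected and keyed on the same value before applying it, e.g. intStream.connect(jsonStream).keyBy(i -> i, json -> json.get("key").asInt()).flatMap(new FilterByControlStream()), so that the control value and the matching JSON records share keyed state.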

Kafka & Flink duplicate messages on restart

Submitted by 馋奶兔 on 2019-12-04 07:20:13
First of all, this is very similar to "Kafka consuming the latest message again when I rerun the Flink consumer", but it's not the same. The answer to that question does NOT appear to solve my problem. If I missed something in that answer, then please rephrase the answer, as I clearly missed something. The problem is exactly the same, though: Flink (the Kafka connector) re-runs the last 3-9 messages it saw before it was shut down.

My versions:

    Flink 1.1.2
    Kafka 0.9.0.1
    Scala 2.11.7
    Java 1.8.0_91

My code:

    import java.util.Properties
    import org.apache.flink.streaming.api.windowing.time.Time
    import
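The code is truncated at the imports. For Flink 1.1.2 with the Kafka 0.9 connector, a replay of the last few messages after a restart is expected at-least-once behaviour: the consumer resumes from the last offset that was committed via a completed checkpoint (or from a savepoint), so everything after that point is consumed again. A hedged Java sketch of the relevant setup (the original code is Scala; the topic and group id are placeholders):

    import java.util.Properties;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // offsets are persisted as part of checkpoints; without a completed checkpoint
    // (or a savepoint on shutdown) the job rewinds to the last committed offset
    env.enableCheckpointing(5000L);

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "my-group");

    FlinkKafkaConsumer09<String> consumer =
        new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props);
    env.addSource(consumer);

Stopping the job with a savepoint and resuming from it removes the replay entirely; without checkpointing, offsets are only committed to Kafka periodically, which would match a 3-9 message replay window.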

How to count unique words in a stream?

Submitted by 做~自己de王妃 on 2019-12-04 00:18:21
Question: Is there a way to count the number of unique words in a stream with Flink Streaming? The result would be a stream of numbers that keeps increasing.

Answer 1: You can solve the problem by storing all words you have already seen. With this knowledge you can filter out all duplicate words; the rest can then be counted by a map operator with parallelism 1. The following code snippet does exactly that.

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val inputStream = env
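The Scala snippet is truncated after the environment setup; a minimal Java sketch of the same approach, assuming an existing StreamExecutionEnvironment env and a small hard-coded word stream for illustration:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.util.Collector;

    DataStream<String> words = env.fromElements("to", "be", "or", "not", "to", "be");

    words
        .keyBy(word -> word)
        // keyed state remembers whether this word was seen; only first occurrences pass
        .flatMap(new RichFlatMapFunction<String, String>() {
            private transient ValueState<Boolean> seen;

            @Override
            public void open(Configuration conf) {
                seen = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("seen", Boolean.class));
            }

            @Override
            public void flatMap(String word, Collector<String> out) throws Exception {
                if (seen.value() == null) {
                    seen.update(true);
                    out.collect(word);
                }
            }
        })
        // a single counter instance so the running count is global
        .map(new MapFunction<String, Long>() {
            private long count = 0L;

            @Override
            public Long map(String word) {
                return ++count;
            }
        })
        .setParallelism(1);

Note that the plain long counter in the final map is not checkpointed state, so this is a sketch of the idea rather than a fault-tolerant implementation; the keyed "seen" state, by contrast, is managed by Flink.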