flink-streaming

How to increase Flink taskmanager.numberOfTaskSlots to run it without a Flink server (in IDE or fat jar)

Submitted by 感情迁移 on 2019-12-05 16:17:43
I have a question about running a Flink streaming job in the IDE or as a fat jar without deploying it to a Flink server. The problem is that I cannot run it in the IDE when my job uses more than one task slot.

    public class StreamingJob {
        public static void main(String[] args) throws Exception {
            // set up the streaming execution environment
            final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            Properties kafkaProperties = new Properties();
            kafkaProperties.setProperty("bootstrap.servers", "localhost:9092");
            kafkaProperties.setProperty("group.id", "test");
            env
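The snippet is cut off, but one way to get more task slots when running inside the IDE is to build the local environment from an explicit Configuration rather than relying on the defaults. A minimal sketch, assuming a slot count of 4 (the number is an arbitrary example):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.TaskManagerOptions;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    // start the embedded mini-cluster with a TaskManager that offers 4 slots
    Configuration conf = new Configuration();
    conf.setInteger(TaskManagerOptions.NUM_TASK_SLOTS, 4);
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironment(4, conf);

Jobs built on this environment then run in the IDE or from a fat jar with 4 slots available, without any standalone Flink cluster.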

Flink CsvTableSource Streaming

Submitted by 孤人 on 2019-12-05 13:35:45
I want to stream a CSV file and perform SQL operations on it using Flink, but the code I have written reads the file only once and stops; it does not stream. Thanks in advance.

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tableEnv = StreamTableEnvironment.getTableEnvironment(env);
    CsvTableSource csvtable = CsvTableSource.builder()
        .path("D:/employee.csv")
        .ignoreFirstLine()
        .fieldDelimiter(",")
        .field("id", Types.INT())
        .field("name", Types.STRING())
        .field("designation", Types.STRING())
        .field("age", Types.INT())
        .field("location", Types
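The excerpt ends mid-schema, but the behaviour it describes is expected: CsvTableSource is backed by a file input format that reads the file exactly once. If the goal is to keep watching the file for new rows, one option is env.readFile with PROCESSING_CONTINUOUSLY; a hedged sketch, with the parsing of lines into typed columns left out:

    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

    // re-scan D:/employee.csv every second and emit its contents when it changes
    TextInputFormat format = new TextInputFormat(new Path("D:/employee.csv"));
    DataStream<String> lines = env.readFile(
        format, "D:/employee.csv",
        FileProcessingMode.PROCESSING_CONTINUOUSLY, 1000L);

Note that in this mode Flink re-reads the whole file whenever it is modified, so appending a row re-emits every line.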

Apache Flink: What's the difference between side outputs and split() in the DataStream API?

Submitted by ε祈祈猫儿з on 2019-12-05 05:33:39
Apache Flink has a split API that lets you branch data streams:

    val splited = datastream.split { i => i match {
      case i if ... => Seq("red", "blue")
      case _ => Seq("green")
    }}
    splited.select("green").flatMap { .... }

It also provides another approach called side outputs (https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/side_output.html) that lets you do the same thing. What is the difference between these two ways? Do they use the same lower-level construct? Do they cost the same? When and how should we choose between them?

The split operator is part of the DataStream
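Where the answer cuts off, it is contrasting split with side outputs. For reference, the side-output version of the routing above would tag elements inside a ProcessFunction; a minimal Java sketch, with the even/odd predicate standing in for the real routing logic:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    // the anonymous subclass ({}) keeps the tag's type information at runtime
    final OutputTag<Integer> greenTag = new OutputTag<Integer>("green") {};

    SingleOutputStreamOperator<Integer> mainStream = datastream
        .process(new ProcessFunction<Integer, Integer>() {
            @Override
            public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                if (value % 2 == 0) {
                    out.collect(value);          // main output ("red"/"blue" path)
                } else {
                    ctx.output(greenTag, value); // side output ("green" path)
                }
            }
        });

    DataStream<Integer> green = mainStream.getSideOutput(greenTag);

Unlike split, side outputs can emit a different type than the main stream, and they are the approach the DataStream API has kept (split was later deprecated).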

Flink BucketingSink with Custom AvroParquetWriter create empty file

Submitted by 我们两清 on 2019-12-05 03:36:41
Question: I have created a writer for BucketingSink. The sink and writer work without error, but when the writer writes Avro GenericRecord to Parquet, the files move from in-progress through pending to complete, yet they are empty, with 0 bytes. Can anyone tell me what is wrong with the code? I have tried placing the initialization of AvroParquetWriter in the open() method, but the result is still the same. When debugging the code, I confirmed that writer.write(element) does execute and
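The code itself is not shown, but the classic cause of 0-byte Parquet files in this setup is that ParquetWriter buffers entire row groups in memory and only writes them out on close(), so BucketingSink's flush/pending cycle sees nothing on disk. A hedged sketch of a Writer that closes the Parquet writer properly (the interface shape follows later Flink 1.x BucketingSink releases; the schema handling is an assumption):

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.streaming.connectors.fs.Writer;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class ParquetSinkWriter implements Writer<GenericRecord> {
        private final String schemaString; // Schema is not Serializable, so keep the JSON
        private transient ParquetWriter<GenericRecord> writer;

        public ParquetSinkWriter(String schemaString) {
            this.schemaString = schemaString;
        }

        @Override
        public void open(FileSystem fs, Path path) throws IOException {
            Schema schema = new Schema.Parser().parse(schemaString);
            writer = AvroParquetWriter.<GenericRecord>builder(path)
                    .withSchema(schema)
                    .build();
        }

        @Override
        public void write(GenericRecord element) throws IOException {
            writer.write(element);       // buffered; only hits disk on close()
        }

        @Override
        public long flush() throws IOException {
            return writer.getDataSize(); // Parquet cannot flush partial row groups
        }

        @Override
        public long getPos() throws IOException {
            return writer.getDataSize();
        }

        @Override
        public void close() throws IOException {
            if (writer != null) {
                writer.close();          // this is where the data is actually written
                writer = null;
            }
        }

        @Override
        public Writer<GenericRecord> duplicate() {
            return new ParquetSinkWriter(schemaString);
        }
    }

If close() is never reached, or the writer is recreated without being closed, the completed files stay empty, which matches the symptom described.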

Apache Flink - custom java options are not recognized inside job

Submitted by 我是研究僧i on 2019-12-05 03:09:47
I've added the following line to flink-conf.yaml:

    env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE"

When starting the jobmanager (jobmanager.sh start cluster), I see in the logs that the JVM option is indeed recognized:

    2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
    2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager -   -Xms256m
    2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager -   -Xmx256m
    2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager -   -XX:MaxPermSize=256m
    2017-02-20 12:19:23,536
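The log excerpt is truncated, but the problem described usually comes down to where the property is read: env.java.opts is applied to the JobManager and TaskManager JVMs, so a System.getProperty() call that runs in a different JVM (for example in the CLI client while the job graph is being built) will not see it. One commonly suggested workaround is to ship the value through Flink's ParameterTool instead of a JVM option; a minimal sketch reusing the property name from the question:

    import org.apache.flink.api.java.utils.ParameterTool;

    // in main(): pass the value as a program argument, e.g. --dy.props.path /PATH/TO/PROPS/FILE
    ParameterTool params = ParameterTool.fromArgs(args);
    env.getConfig().setGlobalJobParameters(params);

    // in any RichFunction running on the TaskManagers:
    ParameterTool p = (ParameterTool)
        getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
    String propsPath = p.get("dy.props.path");

This makes the setting part of the job itself, so it is visible regardless of which JVM executes the code.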

Differences between working with states and windows (time) in Flink streaming

Submitted by 这一生的挚爱 on 2019-12-04 23:06:19
Let's say we want to compute the sum and average of some items, and can work either with state or with windows (time).

Example working with windows: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program
Example working with state: https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java

May I ask what the reasons behind the decision would be? Can I infer that if the data arrives very irregularly (50% comes within the defined window length and
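The question is truncated, but the core trade-off can be made concrete with a short sketch. A hedged Java example of both styles on a stream of (key, value) pairs, assuming an existing StreamExecutionEnvironment env (the tuple layout is an assumption): the window variant emits one sum per key per window, while the stateful variant emits an updated running sum for every arriving element, however irregularly the data comes in.

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;

    DataStream<Tuple2<String, Long>> input =
        env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L), Tuple2.of("a", 3L));

    // windows: one result per key per minute, whether or not data arrived evenly
    input.keyBy(0)
         .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
         .sum(1);

    // state: a fresh running sum as soon as each element arrives
    input.keyBy(0)
         .flatMap(new RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
             private transient ValueState<Long> sum;

             @Override
             public void open(Configuration parameters) {
                 sum = getRuntimeContext().getState(
                     new ValueStateDescriptor<>("sum", Long.class));
             }

             @Override
             public void flatMap(Tuple2<String, Long> in, Collector<Tuple2<String, Long>> out)
                     throws Exception {
                 Long current = sum.value();
                 long updated = (current == null ? 0L : current) + in.f1;
                 sum.update(updated);
                 out.collect(Tuple2.of(in.f0, updated));
             }
         });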

Apache Flink (v1.6.0) authenticate Elasticsearch Sink (v6.4)

Submitted by 空扰寡人 on 2019-12-04 20:30:35
I am using Apache Flink v1.6.0 and trying to write to Elasticsearch v6.4.0, which is hosted in Elastic Cloud. I am having an issue authenticating to the Elastic Cloud cluster. I have been able to get Flink to write to a local, unencrypted Elasticsearch v6.4.0 node using the following code:

    /* Elasticsearch Configuration */
    List<HttpHost> httpHosts = new ArrayList<>();
    httpHosts.add(new HttpHost("127.0.0.1", 9200, "http"));

    // use an ElasticsearchSink.Builder to create an ElasticsearchSink
    ElasticsearchSink.Builder<ObjectNode> esSinkBuilder = new ElasticsearchSink
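The snippet is cut off, but for an Elastic Cloud cluster the usual missing pieces are HTTPS plus basic auth on the underlying REST client. With the flink-connector-elasticsearch6 builder this can be supplied via setRestClientFactory; a hedged sketch continuing the code above (the host, port 9243, and the credentials are placeholders):

    import org.apache.http.HttpHost;
    import org.apache.http.auth.AuthScope;
    import org.apache.http.auth.UsernamePasswordCredentials;
    import org.apache.http.impl.client.BasicCredentialsProvider;

    // Elastic Cloud endpoints are HTTPS, typically on port 9243
    httpHosts.add(new HttpHost("my-cluster.es.io", 9243, "https"));

    esSinkBuilder.setRestClientFactory(restClientBuilder ->
        restClientBuilder.setHttpClientConfigCallback(httpClientBuilder -> {
            BasicCredentialsProvider credentials = new BasicCredentialsProvider();
            credentials.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("elastic", "password"));
            return httpClientBuilder.setDefaultCredentialsProvider(credentials);
        }));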

How to filter an Apache Flink stream on the basis of another?

Submitted by 大憨熊 on 2019-12-04 16:56:20
I have two streams: one of Int and the other of JSON. The JSON schema contains a key that is an int, so I need to filter the JSON stream by comparing that key with the other, integer stream. Is this possible in Flink?

Yes, you can do this kind of stream processing with Flink. The basic building blocks you need from Flink are connected streams and stateful functions -- here's an example using a RichCoFlatMap:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org
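The answer's code is cut off at the imports; a minimal sketch of how such a RichCoFlatMapFunction could look, assuming Jackson's ObjectNode for the JSON records and a field literally named "key" (both assumptions):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
    import org.apache.flink.util.Collector;
    import com.fasterxml.jackson.databind.node.ObjectNode;

    public class FilterByControlStream extends RichCoFlatMapFunction<Integer, ObjectNode, ObjectNode> {
        private transient ValueState<Integer> allowedKey;

        @Override
        public void open(Configuration config) {
            allowedKey = getRuntimeContext().getState(
                new ValueStateDescriptor<>("allowedKey", Integer.class));
        }

        // flatMap1 sees the Int control stream: remember the latest allowed key
        @Override
        public void flatMap1(Integer control, Collector<ObjectNode> out) throws Exception {
            allowedKey.update(control);
        }

        // flatMap2 sees the JSON stream: forward only records whose key matches
        @Override
        public void flatMap2(ObjectNode json, Collector<ObjectNode> out) throws Exception {
            Integer allowed = allowedKey.value();
            if (allowed != null && json.get("key").asInt() == allowed.intValue()) {
                out.collect(json);
            }
        }
    }

The two streams would be connected and keyed on the same value before applying it, e.g. intStream.connect(jsonStream).keyBy(i -> i, json -> json.get("key").asInt()).flatMap(new FilterByControlStream()), so that the control value and the matching JSON records share keyed state.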

Kafka & Flink duplicate messages on restart

Submitted by 馋奶兔 on 2019-12-04 07:20:13
First of all, this is very similar to "Kafka consuming the latest message again when I rerun the Flink consumer", but it's not the same. The answer to that question does NOT appear to solve my problem. If I missed something in that answer, then please rephrase the answer, as I clearly missed something. The problem is exactly the same, though: Flink (the Kafka connector) re-runs the last 3-9 messages it saw before it was shut down.

My versions:

    Flink 1.1.2
    Kafka 0.9.0.1
    Scala 2.11.7
    Java 1.8.0_91

My code:

    import java.util.Properties
    import org.apache.flink.streaming.api.windowing.time.Time
    import
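The code is truncated at the imports. For Flink 1.1.2 with the Kafka 0.9 connector, a replay of the last few messages after a restart is expected at-least-once behaviour: the consumer resumes from the last offset that was committed via a completed checkpoint (or from a savepoint), so everything after that point is consumed again. A hedged Java sketch of the relevant setup (the original code is Scala; the topic and group id are placeholders):

    import java.util.Properties;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // offsets are persisted as part of checkpoints; without a completed checkpoint
    // (or a savepoint on shutdown) the job rewinds to the last committed offset
    env.enableCheckpointing(5000L);

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "my-group");

    FlinkKafkaConsumer09<String> consumer =
        new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props);
    env.addSource(consumer);

Stopping the job with a savepoint and resuming from it removes the replay entirely; without checkpointing, offsets are only committed to Kafka periodically, which would match a 3-9 message replay window.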

How to count unique words in a stream?

Submitted by 做~自己de王妃 on 2019-12-04 00:18:21
Question: Is there a way to count the number of unique words in a stream with Flink Streaming? The result would be a stream of numbers that keeps increasing.

Answer 1: You can solve the problem by storing all words you have already seen. With this knowledge you can filter out all duplicate words; the rest can then be counted by a map operator with parallelism 1. The following code snippet does exactly that.

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val inputStream = env
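The Scala snippet is truncated after the environment setup; a minimal Java sketch of the same approach, assuming an existing StreamExecutionEnvironment env and a small hard-coded word stream for illustration:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.util.Collector;

    DataStream<String> words = env.fromElements("to", "be", "or", "not", "to", "be");

    words
        .keyBy(word -> word)
        // keyed state remembers whether this word was seen; only first occurrences pass
        .flatMap(new RichFlatMapFunction<String, String>() {
            private transient ValueState<Boolean> seen;

            @Override
            public void open(Configuration conf) {
                seen = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("seen", Boolean.class));
            }

            @Override
            public void flatMap(String word, Collector<String> out) throws Exception {
                if (seen.value() == null) {
                    seen.update(true);
                    out.collect(word);
                }
            }
        })
        // a single counter instance so the running count is global
        .map(new MapFunction<String, Long>() {
            private long count = 0L;

            @Override
            public Long map(String word) {
                return ++count;
            }
        })
        .setParallelism(1);

Note that the plain long counter in the final map is not checkpointed state, so this is a sketch of the idea rather than a fault-tolerant implementation; the keyed "seen" state, by contrast, is managed by Flink.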