flink-streaming

Real-time streaming prediction in Flink using Scala

允我心安 submitted on 2019-12-07 09:22:41
Question: Flink version: 1.2.0, Scala version: 2.11.8. I want to make predictions on a DataStream using a model in Flink with Scala. I have a DataStream[String] in Flink (Scala) which contains JSON-formatted data from a Kafka source. I want to use this DataStream to predict on a Flink-ML model which is already trained. The problem is that all the Flink-ML examples use the DataSet API to predict. I am relatively new to Flink and Scala, so any help in the form of a code solution would be appreciated. Input: {
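
One workaround often suggested (Flink-ML's predict() targets the DataSet API only) is to train on a DataSet, pull the learned weights out of the model, and apply them by hand inside a streaming map function. A minimal Java sketch under that assumption; the weight vector, the JSON feature fields f0/f1, and the logistic scoring are illustrative placeholders, not the asker's actual model:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Scores each incoming JSON record against fixed, pre-trained weights.
public class PredictFunction extends RichMapFunction<String, Double> {

    private final double[] weights;          // extracted from the trained model
    private transient ObjectMapper mapper;   // not serializable, so created in open()

    public PredictFunction(double[] weights) {
        this.weights = weights;
    }

    @Override
    public void open(Configuration parameters) {
        mapper = new ObjectMapper();
    }

    @Override
    public Double map(String json) throws Exception {
        JsonNode node = mapper.readTree(json);
        double dot = weights[0] * node.get("f0").asDouble()
                   + weights[1] * node.get("f1").asDouble();
        return 1.0 / (1.0 + Math.exp(-dot));   // logistic score, as an example
    }
}

// usage: DataStream<Double> predictions = jsonStream.map(new PredictFunction(trainedWeights));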

Flink CsvTableSource Streaming

允我心安 submitted on 2019-12-07 09:21:18
Question: I want to stream a CSV file and perform SQL operations on it using Flink, but the code I have written only reads the file once and stops; it does not stream. Thanks in advance.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.getTableEnvironment(env);
CsvTableSource csvtable = CsvTableSource.builder()
    .path("D:/employee.csv")
    .ignoreFirstLine()
    .fieldDelimiter(",")
    .field("id", Types.INT())
    .field("name", Types
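
A CsvTableSource treats the file as bounded input, so the job naturally finishes after one pass. If the intent is to keep watching the file, one option (a sketch of the idea, not necessarily the only fix) is readFile with PROCESS_CONTINUOUSLY, which re-scans the path at a fixed interval; note that it re-emits the whole file whenever it changes, and the CSV parsing into columns is left out here:

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousCsvRead {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String path = "D:/employee.csv";
        TextInputFormat format = new TextInputFormat(new Path(path));
        // Re-scan the file every second and emit records again when it changes.
        DataStream<String> lines = env.readFile(
                format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 1000L);
        lines.print();
        env.execute("continuous csv read");
    }
}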

Apache Flink: What's the difference between side outputs and split() in the DataStream API?

时间秒杀一切 submitted on 2019-12-07 00:08:14
Question: Apache Flink has a split API that lets you branch data streams:

val splited = datastream.split { i => i match {
  case i if ... => Seq("red", "blue")
  case _ => Seq("green")
}}
splited.select("green").flatMap { .... }

It also provides another approach called side outputs (https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/side_output.html) that lets you do the same thing. What's the difference between these two approaches? Do they use the same lower-level construct? Do they
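
For comparison, the same branching written with a side output; a Java sketch (the question's snippet is Scala), with the tag name, element type, and condition invented for illustration. The practical differences: each side-output tag can carry its own element type, side outputs can be emitted from process functions and several other operators, and split() does not support consecutive splits, which is one reason it was eventually deprecated in favor of side outputs.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Anonymous subclass so Flink can capture the element type of the tag.
final OutputTag<Integer> redBlueTag = new OutputTag<Integer>("red-blue") {};

SingleOutputStreamOperator<Integer> green = datastream
        .process(new ProcessFunction<Integer, Integer>() {
            @Override
            public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                if (value % 2 == 0) {              // illustrative condition
                    ctx.output(redBlueTag, value); // goes to the side output
                } else {
                    out.collect(value);            // main output (the "green" branch)
                }
            }
        });

DataStream<Integer> redBlue = green.getSideOutput(redBlueTag);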

Apache Flink - custom java options are not recognized inside job

可紊 submitted on 2019-12-06 22:22:36
Question: I've added the following line to flink-conf.yaml:

env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE"

When starting the jobmanager (jobmanager.sh start cluster) I can see in the logs that the JVM option is indeed recognized:

2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms256m
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx256m
2017-02-20 12:19:23
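
A plausible cause worth checking: env.java.opts applies to the jobmanager/taskmanager JVMs, but a job's main() runs in the client JVM, where that flag is never set, so System.getProperty("dy.props.path") returns null there. A sketch of a workaround using Flink's ParameterTool to ship the value with the job instead (the property name is the one from the question):

import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Pick the property up wherever it IS set (e.g. passed with -D to the client)
// and register it as a global job parameter visible to every operator.
ParameterTool params = ParameterTool.fromSystemProperties();
env.getConfig().setGlobalJobParameters(params);

// Inside any RichFunction the value can then be read back:
// ParameterTool p = (ParameterTool) getRuntimeContext()
//         .getExecutionConfig().getGlobalJobParameters();
// String propsPath = p.get("dy.props.path");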

Differences between working with states and windows (time) in Flink streaming

为君一笑 submitted on 2019-12-06 16:09:04
Question: Let's say we want to compute the sum and average of the items, and can work with either states or windows (time). Example working with windows: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program Example working with states: https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java Can I ask what would be the reasons to make
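
For concreteness: the windowed variant is a one-liner over keyed, time-bucketed data, and the window machinery also decides when to emit results and when to discard old data; with state, the running aggregate lives in something like a ValueState and the function itself must decide when to emit (and the state never expires on its own). A minimal sketch of the windowed sum, with the tuple layout and the one-minute size chosen for illustration:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Tuple2<String, Double>> sums = input
        .keyBy(0)                        // key by the first tuple field
        .timeWindow(Time.minutes(1))     // tumbling one-minute windows
        .sum(1);                         // per-key, per-window sum of the second field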

How to filter an Apache Flink stream on the basis of another?

天涯浪子 submitted on 2019-12-06 12:00:24
Question: I have two streams: one of Int and the other of JSON. The JSON schema contains one key which is some int, so I need to filter the JSON stream by comparing that key against the other, integer stream. Is this possible in Flink?

Answer 1: Yes, you can do this kind of stream processing with Flink. The basic building blocks you need from Flink are connected streams and stateful functions -- here's an example using a RichCoFlatMap:

import org.apache.flink.api.common.state.ValueState;
import org.apache
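
The answer's code is cut off above; what follows is a fuller sketch of the same pattern, with the class name and the JSON field "key" invented for illustration. Both inputs are keyed on the value being compared, so each parallel instance only ever sees matching keys:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

public class ControlledFilter extends RichCoFlatMapFunction<Integer, String, String> {

    private transient ValueState<Boolean> keySeen;

    @Override
    public void open(Configuration conf) {
        keySeen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("keySeen", Boolean.class));
    }

    @Override
    public void flatMap1(Integer control, Collector<String> out) throws Exception {
        keySeen.update(true);      // this key has appeared on the integer stream
    }

    @Override
    public void flatMap2(String json, Collector<String> out) throws Exception {
        if (Boolean.TRUE.equals(keySeen.value())) {
            out.collect(json);     // the key was approved by the integer stream
        }
    }
}

// usage (extraction of the JSON "key" field is illustrative):
// intStream.keyBy(i -> i)
//     .connect(jsonStream.keyBy(j -> new ObjectMapper().readTree(j).get("key").asInt()))
//     .flatMap(new ControlledFilter());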

Flink: How to read from multiple Kafka clusters using the same StreamExecutionEnvironment

故事扮演 submitted on 2019-12-06 08:19:29
I want to read data from multiple Kafka clusters in Flink, but the kafkaMessageStream only reads from the first Kafka cluster. I am able to read from both clusters only if I create two separate streams, one per cluster, which is not what I want. Is it possible to have multiple sources attached to a single reader? Sample code:

public class KafkaReader<T> implements Reader<T> {
    private StreamExecutionEnvironment executionEnvironment;
    public StreamExecutionEnvironment getExecutionEnvironment(Properties properties) {
        executionEnvironment = StreamExecutionEnvironment
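
The usual fix suggested for this: build one consumer per cluster, each with its own Properties, add both as sources, and union() them into a single logical stream. A sketch, with broker addresses, topic names, and the 0.9 connector version as illustrative stand-ins:

import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties clusterA = new Properties();
clusterA.setProperty("bootstrap.servers", "hostA:9092");
clusterA.setProperty("group.id", "reader-a");

Properties clusterB = new Properties();
clusterB.setProperty("bootstrap.servers", "hostB:9092");
clusterB.setProperty("group.id", "reader-b");

DataStream<String> streamA = env.addSource(
        new FlinkKafkaConsumer09<>("topicA", new SimpleStringSchema(), clusterA));
DataStream<String> streamB = env.addSource(
        new FlinkKafkaConsumer09<>("topicB", new SimpleStringSchema(), clusterB));

// One logical stream fed by both clusters.
DataStream<String> all = streamA.union(streamB);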

Apache Flink - how to send and consume POJOs using AWS Kinesis

十年热恋 submitted on 2019-12-06 07:34:30
I want to consume POJOs arriving from Kinesis with Flink. Is there any standard for how to correctly send and deserialize the messages? Thanks. I resolved it with:

DataStream<SamplePojo> kinesis = see.addSource(new FlinkKinesisConsumer<>(
    "my-stream",
    new POJODeserializationSchema(),
    kinesisConsumerConfig));

and

public class POJODeserializationSchema extends AbstractDeserializationSchema<SamplePojo> {
    private ObjectMapper mapper;

    @Override
    public SamplePojo deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        SamplePojo retVal = mapper.readValue
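
The snippet above is cut off; for reference, a complete version of the same schema (SamplePojo stands in for the user's own class). The sending side can mirror it with a SerializationSchema whose serialize() calls the mapper's writeValueAsBytes():

import java.io.IOException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.streaming.util.serialization.AbstractDeserializationSchema;

public class POJODeserializationSchema extends AbstractDeserializationSchema<SamplePojo> {

    private transient ObjectMapper mapper;

    @Override
    public SamplePojo deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();   // created lazily: the schema object is shipped to workers
        }
        return mapper.readValue(message, SamplePojo.class);
    }
}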

Kafka & Flink duplicate messages on restart

淺唱寂寞╮ submitted on 2019-12-06 03:44:21
Question: First of all, this is very similar to "Kafka consuming the latest message again when I rerun the Flink consumer", but it's not the same. The answer to that question does NOT appear to solve my problem; if I missed something in that answer, then please rephrase it, as I clearly missed something. The problem is exactly the same, though -- Flink (the Kafka connector) re-runs the last 3-9 messages it saw before it was shut down. My versions: Flink 1.1.2, Kafka 0.9.0.1, Scala 2.11.7, Java 1.8.0_91
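
One detail that often explains this behavior (offered as a likely cause, not a confirmed diagnosis for this report): with the 0.9 connector, Flink commits offsets back to Kafka only as part of completed checkpoints, and on a plain restart the job resumes from the last committed offsets, replaying the handful of messages processed since then. A sketch of enabling checkpointing, with an illustrative interval:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(5000);   // checkpoint, and commit Kafka offsets, every 5 s

Even then, replay after a failure gives at-least-once delivery to external systems unless the sink cooperates, so a few duplicates on restart are expected unless the job is resumed from a savepoint.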

TaskManager was lost/killed

回眸只為那壹抹淺笑 submitted on 2019-12-05 19:08:26
When I try to run a Flink job on a standalone cluster, I get this error:

java.lang.Exception: TaskManager was lost/killed: ResourceID{resourceId='2961948b9ac490c11c6e41b0ec197e9f'} @ localhost (dataPort=55795)
    at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
    at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
    at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
    at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
    at org.apache.flink.runtime