flink-streaming

Real-time streaming prediction in Flink using Scala

允我心安 submitted on 2019-12-07 09:22:41
Question: Flink version: 1.2.0, Scala version: 2.11.8. I want to make predictions on a DataStream using a model in Flink with Scala. I have a DataStream[String] in Flink (Scala) which contains JSON-formatted data from a Kafka source. I want to use this DataStream to predict on a Flink-ML model which is already trained. The problem is that all the Flink-ML examples use the DataSet API to predict. I am relatively new to Flink and Scala, so any help in the form of a code solution would be appreciated. Input: {
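
One workaround often suggested (Flink-ML's predict() targets the DataSet API only) is to train on a DataSet, pull the learned weights out of the model, and apply them by hand inside a streaming map function. A minimal Java sketch under that assumption; the weight vector, the JSON feature fields f0/f1, and the logistic scoring are illustrative placeholders, not the asker's actual model:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Scores each incoming JSON record against fixed, pre-trained weights.
public class PredictFunction extends RichMapFunction<String, Double> {

    private final double[] weights;          // extracted from the trained model
    private transient ObjectMapper mapper;   // not serializable, so created in open()

    public PredictFunction(double[] weights) {
        this.weights = weights;
    }

    @Override
    public void open(Configuration parameters) {
        mapper = new ObjectMapper();
    }

    @Override
    public Double map(String json) throws Exception {
        JsonNode node = mapper.readTree(json);
        double dot = weights[0] * node.get("f0").asDouble()
                   + weights[1] * node.get("f1").asDouble();
        return 1.0 / (1.0 + Math.exp(-dot));   // logistic score, as an example
    }
}

// usage: DataStream<Double> predictions = jsonStream.map(new PredictFunction(trainedWeights));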

Flink CsvTableSource Streaming

允我心安 submitted on 2019-12-07 09:21:18
Question: I want to stream a CSV file and perform SQL operations on it using Flink, but the code I have written only reads the file once and stops; it does not stream. Thanks in advance.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.getTableEnvironment(env);
CsvTableSource csvtable = CsvTableSource.builder()
    .path("D:/employee.csv")
    .ignoreFirstLine()
    .fieldDelimiter(",")
    .field("id", Types.INT())
    .field("name", Types
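
A CsvTableSource treats the file as bounded input, so the job naturally finishes after one pass. If the intent is to keep watching the file, one option (a sketch of the idea, not necessarily the only fix) is readFile with PROCESS_CONTINUOUSLY, which re-scans the path at a fixed interval; note that it re-emits the whole file whenever it changes, and the CSV parsing into columns is left out here:

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class ContinuousCsvRead {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String path = "D:/employee.csv";
        TextInputFormat format = new TextInputFormat(new Path(path));
        // Re-scan the file every second and emit records again when it changes.
        DataStream<String> lines = env.readFile(
                format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 1000L);
        lines.print();
        env.execute("continuous csv read");
    }
}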

Apache Flink: What's the difference between side outputs and split() in the DataStream API?

时间秒杀一切 submitted on 2019-12-07 00:08:14
Question: Apache Flink has a split API that lets you branch data streams:

val splited = datastream.split { i => i match {
  case i if ... => Seq("red", "blue")
  case _ => Seq("green")
}}
splited.select("green").flatMap { .... }

It also provides another approach called side outputs (https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/side_output.html) that lets you do the same thing. What's the difference between these two approaches? Do they use the same lower-level construct? Do they
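
For comparison, the same branching written with a side output; a Java sketch (the question's snippet is Scala), with the tag name, element type, and condition invented for illustration. The practical differences: each side-output tag can carry its own element type, side outputs can be emitted from process functions and several other operators, and split() does not support consecutive splits, which is one reason it was eventually deprecated in favor of side outputs.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Anonymous subclass so Flink can capture the element type of the tag.
final OutputTag<Integer> redBlueTag = new OutputTag<Integer>("red-blue") {};

SingleOutputStreamOperator<Integer> green = datastream
        .process(new ProcessFunction<Integer, Integer>() {
            @Override
            public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                if (value % 2 == 0) {              // illustrative condition
                    ctx.output(redBlueTag, value); // goes to the side output
                } else {
                    out.collect(value);            // main output (the "green" branch)
                }
            }
        });

DataStream<Integer> redBlue = green.getSideOutput(redBlueTag);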

Apache Flink - custom java options are not recognized inside job

可紊 submitted on 2019-12-06 22:22:36
Question: I've added the following line to flink-conf.yaml:

env.java.opts: "-Ddy.props.path=/PATH/TO/PROPS/FILE"

When starting the jobmanager (jobmanager.sh start cluster) I can see in the logs that the JVM option is indeed recognized:

2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - JVM Options:
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xms256m
2017-02-20 12:19:23,536 INFO org.apache.flink.runtime.jobmanager.JobManager - -Xmx256m
2017-02-20 12:19:23
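
A plausible cause worth checking: env.java.opts applies to the jobmanager/taskmanager JVMs, but a job's main() runs in the client JVM, where that flag is never set, so System.getProperty("dy.props.path") returns null there. A sketch of a workaround using Flink's ParameterTool to ship the value with the job instead (the property name is the one from the question):

import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Pick the property up wherever it IS set (e.g. passed with -D to the client)
// and register it as a global job parameter visible to every operator.
ParameterTool params = ParameterTool.fromSystemProperties();
env.getConfig().setGlobalJobParameters(params);

// Inside any RichFunction the value can then be read back:
// ParameterTool p = (ParameterTool) getRuntimeContext()
//         .getExecutionConfig().getGlobalJobParameters();
// String propsPath = p.get("dy.props.path");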

Differences between working with states and windows (time) in Flink streaming

为君一笑 submitted on 2019-12-06 16:09:04
Question: Let's say we want to compute the sum and average of the items, and can work with either states or windows (time). Example working with windows: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program Example working with states: https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java Can I ask what would be the reasons to make
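
For concreteness: the windowed variant is a one-liner over keyed, time-bucketed data, and the window machinery also decides when to emit results and when to discard old data; with state, the running aggregate lives in something like a ValueState and the function itself must decide when to emit (and the state never expires on its own). A minimal sketch of the windowed sum, with the tuple layout and the one-minute size chosen for illustration:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Tuple2<String, Double>> sums = input
        .keyBy(0)                        // key by the first tuple field
        .timeWindow(Time.minutes(1))     // tumbling one-minute windows
        .sum(1);                         // per-key, per-window sum of the second field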

How to filter an Apache Flink stream on the basis of another?

天涯浪子 submitted on 2019-12-06 12:00:24
Question: I have two streams: one of Int and the other of JSON. The JSON schema contains one key which is some int, so I need to filter the JSON stream by comparing that key against the other, integer stream. Is this possible in Flink?

Answer 1: Yes, you can do this kind of stream processing with Flink. The basic building blocks you need from Flink are connected streams and stateful functions -- here's an example using a RichCoFlatMap:

import org.apache.flink.api.common.state.ValueState;
import org.apache
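
The answer's code is cut off above; what follows is a fuller sketch of the same pattern, with the class name and the JSON field "key" invented for illustration. Both inputs are keyed on the value being compared, so each parallel instance only ever sees matching keys:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

public class ControlledFilter extends RichCoFlatMapFunction<Integer, String, String> {

    private transient ValueState<Boolean> keySeen;

    @Override
    public void open(Configuration conf) {
        keySeen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("keySeen", Boolean.class));
    }

    @Override
    public void flatMap1(Integer control, Collector<String> out) throws Exception {
        keySeen.update(true);      // this key has appeared on the integer stream
    }

    @Override
    public void flatMap2(String json, Collector<String> out) throws Exception {
        if (Boolean.TRUE.equals(keySeen.value())) {
            out.collect(json);     // the key was approved by the integer stream
        }
    }
}

// usage (extraction of the JSON "key" field is illustrative):
// intStream.keyBy(i -> i)
//     .connect(jsonStream.keyBy(j -> new ObjectMapper().readTree(j).get("key").asInt()))
//     .flatMap(new ControlledFilter());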

Flink: How to read from multiple Kafka clusters using the same StreamExecutionEnvironment

故事扮演 submitted on 2019-12-06 08:19:29
I want to read data from multiple Kafka clusters in Flink, but the kafkaMessageStream only reads from the first Kafka cluster. I am able to read from both clusters only if I create two separate streams, one per cluster, which is not what I want. Is it possible to have multiple sources attached to a single reader? Sample code:

public class KafkaReader<T> implements Reader<T> {
    private StreamExecutionEnvironment executionEnvironment;
    public StreamExecutionEnvironment getExecutionEnvironment(Properties properties) {
        executionEnvironment = StreamExecutionEnvironment
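
The usual fix suggested for this: build one consumer per cluster, each with its own Properties, add both as sources, and union() them into a single logical stream. A sketch, with broker addresses, topic names, and the 0.9 connector version as illustrative stand-ins:

import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties clusterA = new Properties();
clusterA.setProperty("bootstrap.servers", "hostA:9092");
clusterA.setProperty("group.id", "reader-a");

Properties clusterB = new Properties();
clusterB.setProperty("bootstrap.servers", "hostB:9092");
clusterB.setProperty("group.id", "reader-b");

DataStream<String> streamA = env.addSource(
        new FlinkKafkaConsumer09<>("topicA", new SimpleStringSchema(), clusterA));
DataStream<String> streamB = env.addSource(
        new FlinkKafkaConsumer09<>("topicB", new SimpleStringSchema(), clusterB));

// One logical stream fed by both clusters.
DataStream<String> all = streamA.union(streamB);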

Apache Flink - how to send and consume POJOs using AWS Kinesis

十年热恋 submitted on 2019-12-06 07:34:30
I want to consume POJOs arriving from Kinesis with Flink. Is there any standard for how to correctly send and deserialize the messages? Thanks. I resolved it with:

DataStream<SamplePojo> kinesis = see.addSource(new FlinkKinesisConsumer<>(
    "my-stream",
    new POJODeserializationSchema(),
    kinesisConsumerConfig));

and

public class POJODeserializationSchema extends AbstractDeserializationSchema<SamplePojo> {
    private ObjectMapper mapper;

    @Override
    public SamplePojo deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        SamplePojo retVal = mapper.readValue
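
The snippet above is cut off; for reference, a complete version of the same schema (SamplePojo stands in for the user's own class). The sending side can mirror it with a SerializationSchema whose serialize() calls the mapper's writeValueAsBytes():

import java.io.IOException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.streaming.util.serialization.AbstractDeserializationSchema;

public class POJODeserializationSchema extends AbstractDeserializationSchema<SamplePojo> {

    private transient ObjectMapper mapper;

    @Override
    public SamplePojo deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();   // created lazily: the schema object is shipped to workers
        }
        return mapper.readValue(message, SamplePojo.class);
    }
}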

Kafka & Flink duplicate messages on restart

淺唱寂寞╮ submitted on 2019-12-06 03:44:21
Question: First of all, this is very similar to "Kafka consuming the latest message again when I rerun the Flink consumer", but it's not the same. The answer to that question does NOT appear to solve my problem; if I missed something in that answer, then please rephrase it, as I clearly missed something. The problem is exactly the same, though -- Flink (the Kafka connector) re-runs the last 3-9 messages it saw before it was shut down. My versions: Flink 1.1.2, Kafka 0.9.0.1, Scala 2.11.7, Java 1.8.0_91
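
One detail that often explains this behavior (offered as a likely cause, not a confirmed diagnosis for this report): with the 0.9 connector, Flink commits offsets back to Kafka only as part of completed checkpoints, and on a plain restart the job resumes from the last committed offsets, replaying the handful of messages processed since then. A sketch of enabling checkpointing, with an illustrative interval:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(5000);   // checkpoint, and commit Kafka offsets, every 5 s

Even then, replay after a failure gives at-least-once delivery to external systems unless the sink cooperates, so a few duplicates on restart are expected unless the job is resumed from a savepoint.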

TaskManager was lost/killed

回眸只為那壹抹淺笑 submitted on 2019-12-05 19:08:26
When I try to run a Flink job on a standalone cluster, I get this error:

java.lang.Exception: TaskManager was lost/killed: ResourceID{resourceId='2961948b9ac490c11c6e41b0ec197e9f'} @ localhost (dataPort=55795)
    at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:217)
    at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
    at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:192)
    at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167)
    at org.apache.flink.runtime