apache-flink

I want to write ORC files using Flink's Streaming File Sink, but it doesn't write files correctly

Submitted by 谁说我不能喝 on 2020-07-20 03:48:09
Question: I am reading data from Kafka and trying to write it to the HDFS file system in ORC format. I used the reference below from the official website, but Flink writes the exact same content for all data and produces many files, all of them 103 KB: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html#orc-format Please find my code below.

    object BeaconBatchIngest extends StreamingBase {
      val env: StreamExecutionEnvironment = …
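
For comparison, here is a minimal sketch of the ORC bulk sink from the linked docs page, transcribed to Scala. Person, PersonVectorizer, and the schema string follow the docs' example and the output path is a placeholder, not the asker's real types; flink-orc must be on the classpath. Note that bulk-encoded formats roll part files on every checkpoint, so checkpointing must be enabled for part files to be finalized.

    import java.nio.charset.StandardCharsets

    import org.apache.flink.core.fs.Path
    import org.apache.flink.orc.vector.Vectorizer
    import org.apache.flink.orc.writer.OrcBulkWriterFactory
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
    import org.apache.hadoop.hive.ql.exec.vector.{BytesColumnVector, LongColumnVector, VectorizedRowBatch}

    case class Person(name: String, age: Int)

    // Copies one Person into the current ORC row batch, one column per field.
    class PersonVectorizer(schema: String) extends Vectorizer[Person](schema) {
      override def vectorize(element: Person, batch: VectorizedRowBatch): Unit = {
        val row = batch.size
        batch.size += 1
        batch.cols(0).asInstanceOf[BytesColumnVector]
          .setVal(row, element.name.getBytes(StandardCharsets.UTF_8))
        batch.cols(1).asInstanceOf[LongColumnVector].vector(row) = element.age
      }
    }

    val writerFactory =
      new OrcBulkWriterFactory(new PersonVectorizer("struct<_col0:string,_col1:int>"))
    val sink: StreamingFileSink[Person] = StreamingFileSink
      .forBulkFormat(new Path("hdfs:///tmp/orc-out"), writerFactory) // placeholder path
      .build()
    // personStream.addSink(sink) // attach to the Kafka-sourced stream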

Flink stream not finishing

Submitted by 江枫思渺然 on 2020-07-09 07:42:40
Question: I am setting up a Flink stream processor using Kafka and Elasticsearch. I want to replay my data, but when I set the parallelism to more than 1, the program does not finish. I believe this is because only one of the parallel Kafka consumers sees the message that marks the end of the stream.

    public CustomSchema(Date _endTime) {
        endTime = _endTime;
    }

    @Override
    public boolean isEndOfStream(CustomTopicWrapper nextElement) {
        if (this.endTime != null && nextElement.messageTime.getTime() >= …
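
The diagnosis above matches how Flink's Kafka consumer behaves: each parallel subtask evaluates isEndOfStream on its own records only, so a single end-marker message lands in one partition and stops just the subtask reading it. A sketch of the same idea in Scala, comparing every record's own timestamp against the cutoff so each subtask can stop independently; the wrapper type and the "<epochMillis>|<payload>" wire format are made-up stand-ins for the question's real encoding:

    import java.util.Date

    import org.apache.flink.api.common.serialization.DeserializationSchema
    import org.apache.flink.api.common.typeinfo.TypeInformation

    case class CustomTopicWrapper(messageTime: Date, body: String)

    // Every parallel consumer applies the same cutoff, so each subtask can
    // terminate on its own once it sees a record at or past endTime.
    class CustomSchema(endTime: Date) extends DeserializationSchema[CustomTopicWrapper] {

      override def deserialize(message: Array[Byte]): CustomTopicWrapper = {
        val Array(ts, body) = new String(message, "UTF-8").split("\\|", 2)
        CustomTopicWrapper(new Date(ts.toLong), body)
      }

      override def isEndOfStream(nextElement: CustomTopicWrapper): Boolean =
        endTime != null && nextElement.messageTime.getTime >= endTime.getTime

      override def getProducedType: TypeInformation[CustomTopicWrapper] =
        TypeInformation.of(classOf[CustomTopicWrapper])
    }

A subtask whose partitions never receive a record at or past endTime will still run forever, which is exactly the symptom described in the question.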

Consume from two Flink DataStreams based on priority or in a round-robin way

Submitted by 怎甘沉沦 on 2020-06-28 08:39:49
Question: I have two Flink DataStreams, say dataStream1 and dataStream2. I want to union both streams into one so that I can process them with the same process functions, since the DAG of both streams is the same. As of now, I need equal priority of consumption for either stream. The producer of dataStream2 produces 10 messages per minute, while the producer of dataStream1 produces 1000 messages per second. The data types are the same for both streams. DataStream2 is more of a …
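
Since both streams share a type and a DAG, union is the direct way to merge them. A minimal sketch with in-memory stand-ins in place of the question's Kafka-backed sources:

    import org.apache.flink.streaming.api.scala._

    object UnionExample {
      case class Event(source: String, payload: String)

      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // Stand-ins for the question's two streams of the same type.
        val dataStream1: DataStream[Event] = env.fromElements(Event("s1", "a"), Event("s1", "b"))
        val dataStream2: DataStream[Event] = env.fromElements(Event("s2", "x"))

        // union merges the streams with no priority or ordering guarantee
        // between inputs; both feed the same downstream operators.
        dataStream1.union(dataStream2)
          .map(e => s"${e.source}: ${e.payload}")
          .print()

        env.execute("union example")
      }
    }

union offers no priority between its inputs; actual prioritized consumption would need something like connect with a custom CoProcessFunction that buffers the low-priority side.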

How do I join two streams in Apache Flink?

Submitted by 自闭症网瘾萝莉.ら on 2020-06-28 02:11:01
Question: I am getting started with Flink and having a look at one of the official tutorials. To my understanding, the goal of this exercise is to join the two streams on the time attribute. Task: The result of this exercise is a data stream of Tuple2 records, one for each distinct rideId. You should ignore the END events, and only join the event for the START of each ride with its corresponding fare data. The resulting stream should be printed to standard out. Question: How is the EnrichmentFunction …
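
For reference, the pattern this training exercise drives at is a keyed connect with a RichCoFlatMapFunction that parks whichever event arrives first in keyed state until its partner shows up. A sketch with simplified stand-in types (the real exercise uses TaxiRide and TaxiFare):

    import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
    import org.apache.flink.util.Collector

    case class Ride(rideId: Long, info: String)
    case class Fare(rideId: Long, amount: Double)

    // Holds the first event to arrive in keyed state until its partner
    // arrives, then emits the joined pair and clears the state.
    class EnrichmentFunction extends RichCoFlatMapFunction[Ride, Fare, (Ride, Fare)] {
      private var rideState: ValueState[Ride] = _
      private var fareState: ValueState[Fare] = _

      override def open(parameters: Configuration): Unit = {
        rideState = getRuntimeContext.getState(new ValueStateDescriptor("saved ride", classOf[Ride]))
        fareState = getRuntimeContext.getState(new ValueStateDescriptor("saved fare", classOf[Fare]))
      }

      override def flatMap1(ride: Ride, out: Collector[(Ride, Fare)]): Unit = {
        val fare = fareState.value()
        if (fare != null) { fareState.clear(); out.collect((ride, fare)) }
        else rideState.update(ride)
      }

      override def flatMap2(fare: Fare, out: Collector[(Ride, Fare)]): Unit = {
        val ride = rideState.value()
        if (ride != null) { rideState.clear(); out.collect((ride, fare)) }
        else fareState.update(fare)
      }
    }

    // Wiring: key both streams by rideId, connect them, apply the function:
    //   rides.keyBy(_.rideId).connect(fares.keyBy(_.rideId)).flatMap(new EnrichmentFunction).print()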

Apache Flink: Cannot find compatible factory for specified execution.target (=local)

Submitted by 喜你入骨 on 2020-06-27 20:47:51
Question: I've decided to experiment with Apache Flink a bit. I decided to use the Scala console (or more precisely http://ammonite.io/) to read some stuff from a CSV file and print it locally... just to debug and experiment.

    import $ivy.`org.apache.flink:flink-csv:1.10.0`
    import $ivy.`org.apache.flink::flink-scala:1.10.0`

    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.extensions._

    val env = ExecutionEnvironment.createLocalEnvironment()
    val lines = env.readCsvFile[(String, String, …
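
A likely cause, based on Flink 1.10's module layout, is that the executor factories which resolve execution.target live in flink-clients, and that artifact is simply not on the Ammonite session's classpath. A sketch of the same session with it added (the fix is an assumption, not confirmed by the excerpt):

    import $ivy.`org.apache.flink:flink-csv:1.10.0`
    import $ivy.`org.apache.flink::flink-scala:1.10.0`
    // Executor factories (including the local one) are discovered from
    // flink-clients; without it there is no factory for execution.target=local.
    import $ivy.`org.apache.flink::flink-clients:1.10.0`

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.createLocalEnvironment()
    env.fromElements((1, "a"), (2, "b")).print()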

Simple hello world example for Flink

Submitted by 非 Y 不嫁゛ on 2020-06-27 06:06:13
Question: I am looking for the simplest possible example of a hello-world experience with Apache Flink. Assume I have just installed Flink on a clean box; what is the bare minimum I would need to do to 'make it do something'? I realize this is quite vague, so here are some examples. Three Python examples from the terminal:

    python -c "print('hello world')"
    python hello_world.py
    python -c "print(1+1)"

Of course a streaming application is a bit more complicated, but here is something similar that I …
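
For comparison, a minimal sketch of the Flink counterpart: a self-contained job running in a local, in-JVM environment, roughly the analogue of python -c "print('hello world')":

    import org.apache.flink.streaming.api.scala._

    // Smallest useful pipeline: source -> transformation -> sink.
    object HelloFlink {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.fromElements("hello", "world")
          .map(_.toUpperCase)
          .print()
        env.execute("hello flink")
      }
    }

Packaged into a jar, the same program can be submitted unchanged to a real cluster with flink run.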

Apache Flink: Could not extract key from ObjectNode::get

Submitted by 给你一囗甜甜゛ on 2020-06-17 15:50:26
Question: I'm using Flink to process data coming from some data source (such as Kafka, Pravega, etc.). In my case, the data source is Pravega, which provides a Flink connector. My data source is sending me some JSON data, as below:

    {"device":"rand-numeric","id":"b4728895-741f-466a-b87b-79c7590893b4","origin":"1591095418904441036","readings":[{"origin":"1591095418904328442","valueType":"Int64","name":"int","device":"rand-numeric","value":"0"}]}

Here is my piece of code:

    import org.apache.flink …
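
One way around this class of error is to key by a concrete value extracted from the JSON rather than by the JsonNode that ObjectNode::get returns: get yields null for a missing field, and Flink wraps any exception thrown inside the key selector in the "Could not extract key" error. A sketch assuming plain Jackson on the classpath and the sample payload above:

    import com.fasterxml.jackson.databind.ObjectMapper
    import org.apache.flink.streaming.api.scala._

    object KeyByJsonField {
      case class Reading(device: String, id: String)

      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // Stand-in for the Pravega source: one record shaped like the sample.
        val sample =
          """{"device":"rand-numeric","id":"b4728895-741f-466a-b87b-79c7590893b4"}"""

        // Parse once into a plain case class and key by a String field; a
        // selector that dereferences a possibly-null JsonNode is a classic
        // source of the "Could not extract key" failure.
        val readings = env.fromElements(sample).map { s =>
          val node = new ObjectMapper().readTree(s)
          Reading(node.get("device").asText(), node.get("id").asText())
        }

        readings.keyBy(_.id).print()
        env.execute("key by json field")
      }
    }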