flink-batch

Create an Elasticsearch InputFormat using Flink's RichInputFormat

感情迁移 submitted on 2021-01-29 07:06:09
Question: We are using Elasticsearch 6.8.4 and Flink 1.0.18. We have an index with 1 shard and 1 replica in Elasticsearch, and I want to create a custom input format to read and write data in Elasticsearch using the Apache Flink DataSet API with more than one input split, in order to achieve better performance. So is there any way I can achieve this requirement? Note: each document is large (almost 8 MB) and I can read only 10 documents at a time because of a size constraint, and per read request, we
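One way to get several input splits out of an index is a custom RichInputFormat whose createInputSplits() returns one GenericInputSplit per Elasticsearch scroll slice, and whose open()/reachedEnd()/nextRecord() page through that slice ten documents at a time. Below is a minimal Java sketch, not a tested connector; the Elasticsearch 6.x RestHighLevelClient with sliced scrolls, the host, the index name and the slice count are all assumptions.

import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.GenericInputSplit;
import org.apache.flink.core.io.InputSplitAssigner;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.slice.SliceBuilder;

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

public class EsSlicedScrollInputFormat extends RichInputFormat<String, GenericInputSplit> {

    private static final int SLICES = 4;       // assumed number of parallel splits
    private static final int BATCH_SIZE = 10;  // 10 documents per request, per the question

    private transient RestHighLevelClient client;
    private transient Deque<String> buffer;
    private transient String scrollId;
    private transient boolean exhausted;

    @Override public void configure(Configuration parameters) { }

    @Override public BaseStatistics getStatistics(BaseStatistics cached) { return cached; }

    @Override public GenericInputSplit[] createInputSplits(int minNumSplits) {
        // One split per scroll slice, so several parallel readers can share one index.
        GenericInputSplit[] splits = new GenericInputSplit[SLICES];
        for (int i = 0; i < SLICES; i++) {
            splits[i] = new GenericInputSplit(i, SLICES);
        }
        return splits;
    }

    @Override public InputSplitAssigner getInputSplitAssigner(GenericInputSplit[] splits) {
        return new DefaultInputSplitAssigner(splits);
    }

    @Override public void open(GenericInputSplit split) throws IOException {
        client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));  // assumed host
        buffer = new ArrayDeque<>();
        exhausted = false;
        SearchSourceBuilder source = new SearchSourceBuilder()
                .size(BATCH_SIZE)
                .slice(new SliceBuilder(split.getSplitNumber(), split.getTotalNumberOfSplits()));
        SearchRequest request = new SearchRequest("my-index")  // assumed index name
                .source(source)
                .scroll(TimeValue.timeValueMinutes(1L));
        fillBuffer(client.search(request, RequestOptions.DEFAULT));
    }

    private void fillBuffer(SearchResponse response) {
        scrollId = response.getScrollId();
        SearchHit[] hits = response.getHits().getHits();
        exhausted = hits.length == 0;
        for (SearchHit hit : hits) {
            buffer.add(hit.getSourceAsString());
        }
    }

    @Override public boolean reachedEnd() throws IOException {
        if (!buffer.isEmpty()) { return false; }
        if (exhausted) { return true; }
        // Buffer drained: fetch the next page of this slice's scroll.
        fillBuffer(client.scroll(
                new SearchScrollRequest(scrollId).scroll(TimeValue.timeValueMinutes(1L)),
                RequestOptions.DEFAULT));
        return buffer.isEmpty();
    }

    @Override public String nextRecord(String reuse) { return buffer.poll(); }

    @Override public void close() throws IOException {
        if (client != null) { client.close(); }
    }
}

The format can then be plugged into the DataSet API with ExecutionEnvironment#createInput, e.g. env.createInput(new EsSlicedScrollInputFormat()); Flink assigns one split to each parallel reader.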

I want to write an ORC file using Flink's Streaming File Sink, but it doesn't write the files correctly

谁说我不能喝 submitted on 2020-07-20 03:48:09
Question: I am reading data from Kafka and trying to write it to the HDFS file system in ORC format. I used the reference below from the official website, but I can see that Flink writes the exact same content for all data and creates so many files, all of which are 103 KB: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html#orc-format Please find my code below. object BeaconBatchIngest extends StreamingBase { val env: StreamExecutionEnvironment =
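For comparison, here is a minimal Java sketch of the ORC bulk-writer pattern described in the linked Flink 1.11 docs (this is not the poster's Scala BeaconBatchIngest job; the Person POJO, the schema string and the output path are assumptions). The detail that most often leads to every file holding the same content is the row index: each vectorize() call has to write to a fresh row via int row = batch.size++, because writing every element to a fixed row such as 0 just overwrites the previous one.

import org.apache.flink.core.fs.Path;
import org.apache.flink.orc.vector.Vectorizer;
import org.apache.flink.orc.writer.OrcBulkWriterFactory;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

import java.io.IOException;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class OrcSinkSketch {

    /** Assumed record type; replace with the real beacon event class. */
    public static class Person implements Serializable {
        public String name;
        public int age;
        public Person() { }
        public Person(String name, int age) { this.name = name; this.age = age; }
    }

    /** Converts one element into one row of the current ORC batch. */
    public static class PersonVectorizer extends Vectorizer<Person> implements Serializable {
        public PersonVectorizer(String schema) {
            super(schema);
        }
        @Override
        public void vectorize(Person element, VectorizedRowBatch batch) throws IOException {
            BytesColumnVector nameCol = (BytesColumnVector) batch.cols[0];
            LongColumnVector ageCol = (LongColumnVector) batch.cols[1];
            int row = batch.size++;  // advance to a fresh row; a fixed index would overwrite it
            nameCol.setVal(row, element.name.getBytes(StandardCharsets.UTF_8));
            ageCol.vector[row] = element.age;
        }
    }

    public static StreamingFileSink<Person> buildSink() {
        OrcBulkWriterFactory<Person> factory =
                new OrcBulkWriterFactory<>(new PersonVectorizer("struct<name:string,age:int>"));
        return StreamingFileSink
                .forBulkFormat(new Path("hdfs:///tmp/orc-out"), factory)  // assumed output path
                .build();
    }
}

Note also that with bulk formats the sink rolls part files on every checkpoint, so checkpointing has to be enabled for files to be finalized, and one part file per subtask per checkpoint interval is the expected behaviour; that alone can explain the large number of files.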

Apache Flink - DataSet API - Side outputs

被刻印的时光 ゝ submitted on 2020-03-25 03:16:50
Question: Does Flink support the Side Outputs feature in the DataSet (Batch) API? If not, how do you handle valid and invalid records when loading from a file?

Answer 1: You can always do something like this:

DataSet<EventOrInvalidRecord> goodAndBadTogether = input.map(new CreateObjectIfPossible());
goodAndBadTogether.filter(new KeepOnlyGood())...
goodAndBadTogether.filter(new KeepOnlyBad())...

Another reasonable option in some cases is to go ahead and use the DataStream API, even if you don't have streaming sources.
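Expanding the answer into a runnable shape, here is a minimal sketch under the assumption of a line-based input file where a record is "valid" if it has exactly three comma-separated fields (the EventOrInvalidRecord wrapper, the parsing rule and the file paths are illustrative, not from the answer):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

import java.io.Serializable;

public class DataSetValidInvalidSplit {

    /** Wrapper that carries either a parsed event or the raw invalid line. */
    public static class EventOrInvalidRecord implements Serializable {
        public String event;      // non-null when parsing succeeded
        public String rawInvalid; // non-null when parsing failed

        public EventOrInvalidRecord() { }

        public boolean isValid() {
            return event != null;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> input = env.readTextFile("file:///tmp/input.txt"); // assumed path

        DataSet<EventOrInvalidRecord> goodAndBadTogether = input
                .map(new MapFunction<String, EventOrInvalidRecord>() {
                    @Override
                    public EventOrInvalidRecord map(String line) {
                        EventOrInvalidRecord out = new EventOrInvalidRecord();
                        // assumption: a "valid" record has exactly three comma-separated fields
                        if (line.split(",").length == 3) {
                            out.event = line;
                        } else {
                            out.rawInvalid = line;
                        }
                        return out;
                    }
                });

        goodAndBadTogether
                .filter(new FilterFunction<EventOrInvalidRecord>() {
                    @Override
                    public boolean filter(EventOrInvalidRecord r) {
                        return r.isValid();
                    }
                })
                .writeAsText("file:///tmp/good");

        goodAndBadTogether
                .filter(new FilterFunction<EventOrInvalidRecord>() {
                    @Override
                    public boolean filter(EventOrInvalidRecord r) {
                        return !r.isValid();
                    }
                })
                .writeAsText("file:///tmp/bad");

        env.execute("split valid and invalid records");
    }
}

The trade-off of this pattern is that the whole dataset flows through both filters, which is usually acceptable for batch jobs; true side outputs remain a DataStream API feature, as the answer notes.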