Create Input Format of Elasticsearch using Flink Rich InputFormat

感情迁移 submitted on 2021-01-29 07:06:09

Question


We are using Elasticsearch 6.8.4 and Flink 1.0.18.

We have an index with 1 shard and 1 replica in Elasticsearch, and I want to create a custom input format to read and write data in Elasticsearch using the Apache Flink DataSet API with more than one input split, in order to achieve better performance. Is there any way I can achieve this?

Note: Each document is large (almost 8 MB), so I can read only 10 documents at a time because of the size constraint, and each read request needs to retrieve 500k records.

As I understand it, the parallelism should equal the number of shards/partitions of the data source. However, since we store only a small amount of data, we have kept the number of shards at 1; the data is static and grows only slightly each month.

Any help or example of source code will be much appreciated.


Answer 1:


You need to be able to generate queries to ES that effectively partition your source data into relatively equal chunks. Then you can run your input source with a parallelism > 1, and have each sub-task read only part of the index data.
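One possible way to generate such queries is Elasticsearch's sliced scroll, where each request carries a `slice` clause with an `id` and a `max`. The sketch below is only an illustration under that assumption, not something the answer prescribes: it builds one search request body per input split, and the class name `SlicedScrollQueries`, the method `buildSliceQueries`, and the `match_all` query are hypothetical choices. It uses plain Java with no Flink or Elasticsearch client dependency, so it shows only the partitioning step.

```java
import java.util.ArrayList;
import java.util.List;

public class SlicedScrollQueries {

    /**
     * Builds one search request body per input split. Split i reads slice i of
     * numSplits, so the slices are disjoint and together cover the whole index.
     */
    public static List<String> buildSliceQueries(int numSplits, int pageSize) {
        if (numSplits < 1 || pageSize < 1) {
            throw new IllegalArgumentException("numSplits and pageSize must be >= 1");
        }
        List<String> bodies = new ArrayList<>();
        for (int id = 0; id < numSplits; id++) {
            // Elasticsearch requires slice.max > 1, so a single split sends no slice clause.
            String slice = numSplits > 1
                    ? "\"slice\":{\"id\":" + id + ",\"max\":" + numSplits + "},"
                    : "";
            // match_all is a placeholder; a real job would put its own query here.
            bodies.add("{\"size\":" + pageSize + "," + slice + "\"query\":{\"match_all\":{}}}");
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Hypothetical setup: 4 parallel sub-tasks, 10 documents per scroll page.
        for (String body : buildSliceQueries(4, 10)) {
            System.out.println(body);
        }
    }
}
```

In a custom Flink input format, `createInputSplits` could return one `GenericInputSplit` per slice, and `open` could start a scroll (for example `POST /<index>/_search?scroll=1m`) with the body that matches `getSplitNumber()`. Slicing is not limited by the shard count, so it should also apply to a single-shard index, though it is worth measuring whether the extra slices actually improve throughput with documents this large.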



Source: https://stackoverflow.com/questions/63747019/create-input-format-of-elasticsearch-using-flink-rich-inputformat
