Input data to AWS Elastic Search using Glue

Submitted by 被刻印的时光 ゝ on 2021-01-28 09:03:01

Question


I'm looking for a way to insert data into AWS Elasticsearch using AWS Glue (Python or PySpark). I have looked at the Boto3 SDK for Elasticsearch but could not find any function to insert data. Can anyone help me find a solution? Any useful links or code?


Answer 1:


For AWS Glue you need to add an additional JAR to the job.

  1. Download the JAR from https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-hadoop/7.8.0/elasticsearch-hadoop-7.8.0.jar
  2. Save the JAR on S3 and pass it to the Glue job.
  3. Now, while saving the DataFrame, use the following:

df.write.format("org.elasticsearch.spark.sql") \
    .option("es.resource", "index/document") \
    .option("es.nodes", host) \
    .option("es.port", port) \
    .save()
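As a sketch of step 2, the JAR can be attached to the job through Glue's `--extra-jars` job parameter (set under "Job parameters" in the console, or as a default argument via the API). The bucket and key below are placeholders, not from the original answer:

```python
# Hypothetical default arguments for a Glue job; the S3 path is a
# placeholder -- point it at wherever you uploaded the connector JAR.
default_arguments = {
    "--extra-jars": "s3://my-bucket/jars/elasticsearch-hadoop-7.8.0.jar",
}
```

Glue reads this parameter at job start and puts the JAR on the Spark classpath, which is what makes the `org.elasticsearch.spark.sql` format resolvable.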

If you are using AWS managed Elasticsearch, try setting this option to true:

option("es.nodes.wan.only", "true")

For more properties, check https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html
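Putting the pieces above together, here is a minimal sketch of a helper that assembles the connector options and performs the write. The function names, the host, and the index are illustrative assumptions, not part of the original answer:

```python
# Sketch of writing a Spark DataFrame to Elasticsearch from Glue,
# assuming the elasticsearch-hadoop JAR is already on the classpath.

def es_write_options(host, port, wan_only=False):
    """Assemble the es.* connector options as string key/value pairs."""
    opts = {
        "es.nodes": host,
        "es.port": str(port),
    }
    if wan_only:
        # Needed for AWS managed Elasticsearch, which is reached through
        # a single endpoint rather than by addressing data nodes directly.
        opts["es.nodes.wan.only"] = "true"
    return opts

def write_to_es(df, host, port, index, wan_only=False):
    """Write a DataFrame to the given index via the connector."""
    writer = df.write.format("org.elasticsearch.spark.sql")
    for key, value in es_write_options(host, port, wan_only).items():
        writer = writer.option(key, value)
    writer.option("es.resource", index).save()

# Options produced for a hypothetical AWS managed domain endpoint:
opts = es_write_options("vpc-mydomain.us-east-1.es.amazonaws.com", 443,
                        wan_only=True)
```

Inside a Glue job you would call `write_to_es(df, host, port, "index/document")` after building `df` from your source; the options dict is just the programmatic form of the chained `.option(...)` calls shown above.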

NOTE: This elasticsearch-hadoop build is compatible with Spark 2.3 only, as it is prebuilt against Scala 2.11, while Spark 2.4 and Spark 3.0 are prebuilt against Scala 2.12.



Source: https://stackoverflow.com/questions/62829791/input-data-to-aws-elastic-search-using-glue
