Python Spark DataFrame to Elasticsearch

zero323

For starters, saveAsNewAPIHadoopFile expects an RDD of (key, value) pairs, and in your case that may hold only accidentally. The same applies to the value format you declare.

I am not familiar with Elastic, but based on the arguments alone you should probably try something like this:

kpi1.rdd.map(lambda row: (None, row.asDict())).saveAsNewAPIHadoopFile(...)
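
For reference, here is a fuller sketch of that approach using the elasticsearch-hadoop Map/Reduce output format; the index name, host, and port below are placeholders, and kpi1 is assumed to be your DataFrame as in the question:

# Sketch, assuming the elasticsearch-hadoop jar is on the Spark classpath.
# "my_index/my_type", host, and port are placeholders for your cluster.
es_conf = {
    "es.nodes": "localhost",            # Elasticsearch host(s)
    "es.port": "9200",                  # REST port
    "es.resource": "my_index/my_type",  # target index/type
}

(kpi1.rdd
    .map(lambda row: (None, row.asDict()))  # (key, value) pairs; the key is ignored
    .saveAsNewAPIHadoopFile(
        path="-",  # required by the API but unused by EsOutputFormat
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_conf))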

Since Elasticsearch-Hadoop provides a SQL data source, you should also be able to skip that and save the data directly:

df.write.format("org.elasticsearch.spark.sql").save(...)
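
Spelled out with explicit connection settings (the es.* option names are standard elasticsearch-hadoop configuration keys; the host, port, and index name are placeholders):

# Sketch with placeholder connection values.
(df.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "localhost")  # Elasticsearch host(s)
    .option("es.port", "9200")        # REST port
    .mode("append")                   # or "overwrite" to replace existing data
    .save("my_index/my_type"))        # target index/type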

As zero323 said, the easiest way to save a DataFrame from PySpark to Elasticsearch is with the method:

df.write.format("org.elasticsearch.spark.sql").save("index/type")
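
Note that in either variant the elasticsearch-hadoop connector has to be on the Spark classpath, for example by passing the elasticsearch-hadoop jar via --jars (or the matching org.elasticsearch Spark artifact via --packages) when launching pyspark or spark-submit; pick the connector version that matches your Spark and Scala build.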