Trouble reading avro files in Jupyter notebook using pyspark

Submitted by 邮差的信 on 2021-01-29 08:51:02

Question


I am trying to read an Avro file in a Jupyter notebook using PySpark. When I read the file, I get an error.

I have downloaded spark-avro_2.11:4.0.0.jar, but I am not sure where in my code I should reference the Avro package. Any suggestions would be great.

This is an example of the code I am using to read the Avro file:

df_avro_example = sqlContext.read.format("com.databricks.spark.avro").load("example_file.avro")

This is the error I get:

AnalysisException: 'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;'


Answer 1:


Download the jar to a local path and use the following code snippet in your PySpark app:

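import os
# Set this before the SparkContext/SparkSession is created; it is read when PySpark starts the JVM.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /path/tojar/spark-avro_2.11:4.0.0.jar pyspark-shell'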
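
For a notebook, the answer's approach can be wrapped in two small functions. This is only a sketch: `set_avro_submit_args` and `read_avro` are hypothetical helper names (not part of PySpark), the app name is arbitrary, and it assumes PySpark is installed and no Spark session is running yet in the kernel.

```python
import os


def set_avro_submit_args(jar_path):
    """Hypothetical helper: tell PySpark to load a local spark-avro jar."""
    # The trailing "pyspark-shell" token is required when launching from Python.
    args = "--jars {} pyspark-shell".format(jar_path)
    os.environ["PYSPARK_SUBMIT_ARGS"] = args
    return args


def read_avro(path):
    """Hypothetical helper: read an Avro file with the Databricks data source."""
    # Imported lazily so set_avro_submit_args() always runs before Spark starts.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("avro-example").getOrCreate()
    return spark.read.format("com.databricks.spark.avro").load(path)
```

In the first notebook cell, call `set_avro_submit_args(...)` with the path where the jar was saved, then `df_avro_example = read_avro("example_file.avro")`. If a session already exists, restart the kernel first, since the variable has no effect on a JVM that is already running. As an alternative to a downloaded jar, the spark-avro project documents passing the Maven coordinate instead, i.e. `--packages com.databricks:spark-avro_2.11:4.0.0 pyspark-shell`, which requires network access at startup.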


Source: https://stackoverflow.com/questions/56618866/trouble-reading-avro-files-in-jupyter-notebook-using-pyspark
