Read AVRO file using Python

Posted by 邮差的信 on 2021-01-27 07:38:41

Question


I have an Avro file (created by Java) that seems to be some kind of compressed file for Hadoop/MapReduce. I want to 'unzip' (deserialize) it into a flat file, one record per row.

I learned that there is an Avro package for Python, and I installed it correctly. I then ran the example to read the Avro file, but it failed with the error below. What is going wrong with even the simplest example? Can anyone help me interpret the error below?

>>> reader = DataFileReader(open("/tmp/Stock_20130812104524.avro", "r"), DatumReader())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../python2.7/site-packages/avro/datafile.py", line 240, in __init__
    raise DataFileException('Unknown codec: %s.' % self.codec)
avro.datafile.DataFileException: Unknown codec: snappy.

By the way, if I run 'head' on the file, or open the first few lines of the Avro file in vi, I can see the schema definition together with some unreadable characters, probably the compressed content. The start of the raw Avro file looks like this:

bj^A^D^Tavro.codec^Lsnappy^Vavro.schemaØ${"type":"record","name":"Stoc...

I don't know whether the schema is needed to read the Avro file, as in something like this:

schema = avro.schema.parse(open("schema").read())
# include schema to do sth...
reader = DataFileReader(open("Stock_20130812104524.avro", "r"), DatumReader())

Thanks in advance.


Answer 1:


The problem is that snappy cannot be built unless the Xcode command line tools are installed. You can check by typing gcc at the command prompt to see whether it is installed. If it is not, type xcode-select --install to install it. After that, installing python-snappy should work. Thanks Bin!




Answer 2:


Try pip install python-snappy, but make sure you have installed the snappy library itself first.
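Once python-snappy is importable, the reader from the question should get past the codec check. A minimal sketch of the remaining step, writing one record per line, might look like the following. It assumes the avro package's DataFileReader and DatumReader as used in the question; record_to_line and avro_to_flat_file are hypothetical helper names, and JSON is only one possible flat format. No separate schema file is passed, because the writer's schema is stored in the file header.

```python
import json


def record_to_line(record):
    """Serialize one decoded record (a dict) as a single line of JSON."""
    # default=str is a fallback for values json cannot encode directly, such as bytes.
    return json.dumps(record, sort_keys=True, default=str)


def avro_to_flat_file(avro_path, out_path):
    """Write every record of an Avro container file to out_path, one per line."""
    # Imported here so record_to_line stays usable without the avro package.
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    # Open with "rb", not "r": the container is binary data.
    reader = DataFileReader(open(avro_path, "rb"), DatumReader())
    try:
        with open(out_path, "w") as out:
            for record in reader:
                out.write(record_to_line(record) + "\n")
    finally:
        reader.close()
```

For example, avro_to_flat_file("/tmp/Stock_20130812104524.avro", "/tmp/Stock.jsonl") would produce one JSON object per record; the output path here is just an illustration.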




Answer 3:


wget http://www.us.apache.org/dist/avro/avro-1.7.5/java/avro-tools-1.7.5.jar

java -jar avro-tools-1.7.5.jar tojson input.avro > input

More information is available here



Source: https://stackoverflow.com/questions/18453026/read-avro-file-using-python
