Generating an Avro schema from a JSON document

不知归路 2020-12-30 01:25

Is there any tool that can create an Avro schema from a 'typical' JSON document?

For example:

{
"records":[{"name":"X1","age":2},{"name":


        
1 answer
  • 2020-12-30 01:47

    You can do this easily with Apache Spark and Python. First download the Spark distribution from http://spark.apache.org/downloads.html, then install the avro package for Python using pip. Then start pyspark with the spark-avro package:

    ./bin/pyspark --packages com.databricks:spark-avro_2.11:3.1.0
    

    Then run the following code (assuming the file input.json contains one or more JSON documents, one per line):

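    import os
    import avro.datafile
    import avro.io

    # Let Spark infer the schema and write the data as a single Avro part file.
    spark.read.json('input.json').coalesce(1).write.format("com.databricks.spark.avro").save("output.avro")
    # filter() is not subscriptable in Python 3, so take the first match with next().
    # The exact part-file prefix varies between Spark versions, hence the loose match.
    part_name = next(f for f in os.listdir('output.avro') if f.startswith('part-') and f.endswith('.avro'))

    # Avro data files are binary, so open the file in 'rb' mode.
    with open(os.path.join('output.avro', part_name), 'rb') as avro_file:
        reader = avro.datafile.DataFileReader(avro_file, avro.io.DatumReader())
        print(reader.datum_reader.writers_schema)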
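
For readers wondering where that schema comes from: an Avro data file stores the writer's schema as JSON in its header, under the `avro.schema` metadata key, and that is what the reader above exposes. The sketch below pulls it out with only the standard library, following the header layout of the Avro object container file format as I understand it. `read_long` and `read_avro_header_schema` are illustrative names, not part of the avro package.

```python
import io
import json


def read_long(stream):
    """Decode one zigzag, variable-length Avro long from a binary stream."""
    shift, accum = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def read_avro_header_schema(stream):
    """Return the writer schema (as a dict) stored in an Avro file header."""
    if stream.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the metadata map
            break
        if count < 0:  # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = stream.read(read_long(stream)).decode("utf-8")
            metadata[key] = stream.read(read_long(stream))
    return json.loads(metadata["avro.schema"].decode("utf-8"))
```

It would be called on the part file opened in binary mode, for example `read_avro_header_schema(open(path, 'rb'))`, where `path` is a placeholder for the part file written by Spark.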
    

    For example, for an input file with the following content:

    {'string': 'somestring', 'number': 3.14, 'structure': {'integer': 13}}
    {'string': 'somestring2', 'structure': {'integer': 14}}
    

    the script prints:

    {"fields": [{"type": ["double", "null"], "name": "number"}, {"type": ["string", "null"], "name": "string"}, {"type": [{"type": "record", "namespace": "", "name": "structure", "fields": [{"type": ["long", "null"], "name": "integer"}]}, "null"], "name": "structure"}], "type": "record", "name": "topLevelRecord"}
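
To make the mapping behind this result explicit, here is a rough pure-Python sketch of the same inference idea for a single document. It is not Spark's implementation: `infer_avro_type` and `infer_record` are hypothetical helpers, the sketch does not merge several documents the way Spark does, and it assumes arrays are homogeneous.

```python
import json


def infer_avro_type(value, name):
    """Map one parsed JSON value to an Avro type (illustrative helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: all items share the first item's type; empty means string.
        items = infer_avro_type(value[0], name) if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        return infer_record(value, name)
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_record(document, name="topLevelRecord"):
    """Build an Avro record schema in which every field is nullable."""
    fields = []
    for key, value in sorted(document.items()):
        avro_type = infer_avro_type(value, key)
        # Avoid the invalid union ["null", "null"] for JSON null values.
        field_type = avro_type if avro_type == "null" else [avro_type, "null"]
        fields.append({"name": key, "type": field_type})
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    doc = json.loads('{"string": "somestring", "number": 3.14, "structure": {"integer": 13}}')
    print(json.dumps(infer_record(doc)))
```

Making every field a union with "null" mirrors the output above, where a field missing from one document (such as "number" in the second line) must still be representable.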
    