Get schema of parquet file in Python

误落风尘 2021-02-10 07:23

Is there a Python library that can be used to get just the schema of a parquet file?

Currently we are loading the parquet file into dataframe in Spark and getting schem

5 answers
  •  无人及你
    2021-02-10 07:26

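    There is now an easier way: the read_schema function in pyarrow.parquet. Note that it returns a pyarrow Schema object rather than a dict. For files written by Spark, the Spark schema is stored in schema.metadata as a JSON bytes literal under the key b'org.apache.spark.sql.parquet.row.metadata', so you need an extra step to convert it into a proper Python dict.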

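    import json
    from pyarrow.parquet import read_schema

    source = "path/to/file.parquet"  # placeholder: replace with your parquet file

    # read_schema reads only the file footer, not the row data
    schema = read_schema(source)
    # Files written by Spark keep the Spark schema as JSON bytes under this key
    schema_dict = json.loads(schema.metadata[b'org.apache.spark.sql.parquet.row.metadata'])['fields']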
    
