Get schema of parquet file in Python

误落风尘 2021-02-10 07:23

Is there any python library that can be used to just get the schema of a parquet file?

Currently we are loading the parquet file into a dataframe in Spark and getting the schema

5 Answers
  •  别那么骄傲
    2021-02-10 07:35

    This is supported by using pyarrow (https://github.com/apache/arrow/).

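    from pyarrow.parquet import ParquetFile
    # Source is either the filename or an Arrow file handle (which could be on HDFS)
    parquet_file = ParquetFile(source)
    parquet_file.metadata  # file-level metadata (number of rows, row groups, columns)
    parquet_file.schema    # the schema of the file, which is what the question asks for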
    

    Note: We merged the code for this only yesterday, so you need to build it from source; see https://github.com/apache/arrow/commit/f44b6a3b91a15461804dd7877840a557caa52e4e
