Get schema of parquet file in Python

误落风尘 2021-02-10 07:23

Is there a Python library that can be used to get just the schema of a Parquet file?

Currently we load the Parquet file into a DataFrame in Spark and get the schema…

5 answers
  •  青春惊慌失措
    2021-02-10 07:48

    In addition to the answer by @mehdio: if your Parquet dataset is a directory (e.g. one written by Spark), you can read the schema and column names like this:

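    import pyarrow.parquet as pq

    # read_table accepts a single file or a directory of Parquet files.
    # Note: it loads the data into memory, not just the metadata.
    pfile = pq.read_table("file.parquet")
    print("Column names: {}".format(pfile.column_names))
    print("Schema: {}".format(pfile.schema))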
    
