
Get schema of parquet file in Python


误落风尘 2021-02-10 07:23

Is there any python library that can be used to just get the schema of a parquet file?

Currently we are loading the parquet file into a DataFrame in Spark and getting the schema from there.

5 answers

别那么骄傲 (OP)

2021-02-10 07:35
This is supported by pyarrow (https://github.com/apache/arrow/).
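```
from pyarrow.parquet import ParquetFile

# Source is either the filename or an Arrow file handle (which could be on HDFS)
source = "example.parquet"  # hypothetical path; replace with your own file
metadata = ParquetFile(source).metadata  # file-level metadata read from the footer
print(metadata.schema)  # the Parquet schema stored in that metadata
```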
Note: We merged the code for this only yesterday, so you need to build it from source, see https://github.com/apache/arrow/commit/f44b6a3b91a15461804dd7877840a557caa52e4e
