Is there a Python library that can be used to get just the schema of a Parquet file?
Currently we are loading the Parquet file into a dataframe in Spark and getting the schema from the dataframe.
This is supported by pyarrow (https://github.com/apache/arrow/):
from pyarrow.parquet import ParquetFile
# Source is either the filename or an Arrow file handle (which could be on HDFS)
ParquetFile(source).metadata
Note: We merged the code for this only yesterday, so you need to build it from source. See https://github.com/apache/arrow/commit/f44b6a3b91a15461804dd7877840a557caa52e4e