Pandas cannot read parquet files created in PySpark

自闭症患者 2021-01-12 16:43

I am writing a parquet file from a Spark DataFrame the following way:

df.write.parquet("path/myfile.parquet", mode="overwrite", compression="gzip")
3 answers
  •  囚心锁ツ
    2021-01-12 16:59

    If the Parquet file was created with Spark (so it is actually a directory of part files rather than a single file), you can import it into pandas with:

    from pyarrow.parquet import ParquetDataset
    
    dataset = ParquetDataset("file.parquet")
    table = dataset.read()
    df = table.to_pandas()
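
To illustrate the "it's a directory" point, here is a minimal sketch that reads the part files one by one with plain pandas. The helper names `list_part_files` and `read_spark_parquet` are hypothetical, and the sketch assumes a flat output directory (no partition subdirectories) and a pandas installation with a Parquet engine such as pyarrow available.

```python
from pathlib import Path


def list_part_files(dataset_dir):
    """Return the .parquet data files directly inside a Spark output directory.

    Spark writes marker and checksum files (such as _SUCCESS and .crc files)
    next to the part files; names starting with "_" or "." are skipped.
    """
    root = Path(dataset_dir)
    return sorted(
        p for p in root.glob("*.parquet")
        if p.is_file() and not p.name.startswith(("_", "."))
    )


def read_spark_parquet(dataset_dir):
    """Read every part file and concatenate them into one pandas DataFrame."""
    # Imported lazily; assumes pandas plus a Parquet engine (e.g. pyarrow).
    import pandas as pd

    parts = list_part_files(dataset_dir)
    if not parts:
        raise FileNotFoundError(f"no .parquet part files under {dataset_dir}")
    return pd.concat((pd.read_parquet(p) for p in parts), ignore_index=True)
```

With the path from the question, usage would look like `df = read_spark_parquet("path/myfile.parquet")`. For partitioned output (nested `key=value` directories) this flat listing is not enough, and the `ParquetDataset` approach above is the better fit.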
    
