Methods for writing Parquet files using Python?

再見小時候 2021-02-02 09:30

I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction wi

6 answers
  •  醉酒成梦
    2021-02-02 10:03

    A simple way to write a pandas DataFrame to Parquet.

    Assuming df is the pandas DataFrame, we need to import the following libraries.

    import pyarrow as pa
    import pyarrow.parquet as pq
    

    First, convert the DataFrame df into a pyarrow Table.

    # Convert the DataFrame to an Apache Arrow Table
    table = pa.Table.from_pandas(df)
    

    Second, write the table to a Parquet file, say file_name.parquet.

    # Write the table to a Parquet file (Snappy compression is the default)
    pq.write_table(table, 'file_name.parquet')
    

    NOTE: Parquet files can be compressed further while writing. The following are popular compression formats.

    • Snappy (the default, requires no argument)
    • gzip
    • brotli

    Parquet with Snappy compression

    pq.write_table(table, 'file_name.parquet')
    

    Parquet with GZIP compression

    pq.write_table(table, 'file_name.parquet', compression='GZIP')
    

    Parquet with Brotli compression

    pq.write_table(table, 'file_name.parquet', compression='BROTLI')
    

    A comparison of the results achieved with the different Parquet compression formats is given in the reference below.

    Reference: https://tech.jda.com/efficient-dataframe-storage-with-apache-parquet/
