I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction with it.
Assuming df is a pandas DataFrame, we need to import the following libraries.
import pyarrow as pa
import pyarrow.parquet as pq
First, write the DataFrame df into a pyarrow table.
# Convert DataFrame to Apache Arrow Table
table = pa.Table.from_pandas(df)
Second, write the table into a Parquet file, say file_name.parquet.
Parquet with Snappy compression (pyarrow's default)
pq.write_table(table, 'file_name.parquet', compression='SNAPPY')
Parquet with GZIP compression
pq.write_table(table, 'file_name.parquet', compression='GZIP')
Parquet with Brotli compression
pq.write_table(table, 'file_name.parquet', compression='BROTLI')
Reference: https://tech.jda.com/efficient-dataframe-storage-with-apache-parquet/