I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction wi
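Using fastparquet, you can write a pandas DataFrame to Parquet with either snappy or gzip compression, as follows. Note that pandas prefers pyarrow when it is installed; pass engine='fastparquet' to to_parquet if you want to force fastparquet.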
Make sure the following packages are installed:
$ conda install python-snappy
$ conda install fastparquet
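Do the imports:
import pandas as pd
import snappy       # not called directly; the import fails early if python-snappy is missing
import fastparquet  # not called directly; pandas uses it as the Parquet engine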
Assume you have the following pandas DataFrame:
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
Write df to Parquet with snappy compression:
df.to_parquet('df.snap.parquet', compression='snappy')
Write df to Parquet with gzip compression:
df.to_parquet('df.gzip.parquet', compression='gzip')
To check, read the Parquet file back into a pandas DataFrame:
pd.read_parquet('df.snap.parquet')
or
pd.read_parquet('df.gzip.parquet')
Output:
   col1  col2
0     1     3
1     2     4