Methods for writing Parquet files using Python?

再見小時候 2021-02-02 09:30

I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction wi

6 answers
  •  逝去的感伤
    2021-02-02 09:58

    Using fastparquet, you can write a pandas df to Parquet with either snappy or gzip compression, as follows:

    Make sure you have installed the following:

    $ conda install python-snappy
    $ conda install fastparquet
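
    A quick way to confirm that both packages are importable before going further is sketched below. It assumes the import names `snappy` (for python-snappy) and `fastparquet`, as used in the imports of this answer; the helper name `missing_modules` is made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# `missing_modules` is a hypothetical helper, not part of any library.
missing = missing_modules(["snappy", "fastparquet"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("snappy and fastparquet are both importable")
```

    If anything is reported as not installed, rerun the matching conda install command before continuing.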
    

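    Do the imports:

    import pandas as pd
    import snappy       # python-snappy; importing it confirms the codec is installed
    import fastparquet  # importing it confirms the Parquet engine is installed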
    

    Assume you have the following pandas df:

    df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
    

    Write the df to Parquet with snappy compression:

    df.to_parquet('df.snap.parquet', engine='fastparquet', compression='snappy')
    

    Write the df to Parquet with gzip compression:

    df.to_parquet('df.gzip.parquet', engine='fastparquet', compression='gzip')
    

    Check by reading the Parquet file back into a pandas df:

    pd.read_parquet('df.snap.parquet')
    

    or

    pd.read_parquet('df.gzip.parquet')
    

    Output:

       col1  col2
    0     1     3
    1     2     4
    
