Methods for writing Parquet files using Python?

再見小時候 2021-02-02 09:30

I'm having trouble finding a library that allows Parquet files to be written using Python. Bonus points if I can use Snappy or a similar compression mechanism in conjunction wi

6 answers
  •  灰色年华
    2021-02-02 10:06

    Update (March 2017): There are currently 2 libraries capable of writing Parquet files:

    1. fastparquet
    2. pyarrow

    Both still seem to be under heavy development and come with a number of caveats (for example, no support for nested data), so you will have to check whether they support everything you need.

    OLD ANSWER:

    As of February 2016, there seems to be no Python-only library capable of writing Parquet files.

    If you only need to read Parquet files, there is python-parquet.

    As a workaround, you will have to rely on some other process such as pyspark.sql (which uses Py4J and runs on the JVM, and thus cannot be used directly from an average CPython program).
