Pickle alternatives

南旧 2021-02-05 04:16

I am trying to serialize a large (~10**6 rows, each with ~20 values) list, to be used later by myself (so pickle's lack of safety isn't a concern).

Each row of the lis

8 answers
  • 2021-02-05 04:20

    Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler.

    Compared with XML, protocol buffers:

    • are simpler
    • are 3 to 10 times smaller
    • are 20 to 100 times faster
    • are less ambiguous
    • generate data access classes that are easier to use programmatically

    https://developers.google.com/protocol-buffers/docs/pythontutorial

  • 2021-02-05 04:26

    I usually serialize to plain text (*.csv) because I found it to be fastest. The csv module works quite well. See http://docs.python.org/library/csv.html

    If you have to deal with unicode for your strings, check out the UnicodeReader and UnicodeWriter examples at the end.

    If you are serializing for your own future use, I guess it is enough to know that each csv column always holds the same data type (e.g., strings are always in column 2).
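
A minimal sketch of the csv approach described above, assuming Python 3 and a fixed type per column. The sample rows, the `write_rows` and `read_rows` helpers, and the `converters` tuple are hypothetical names introduced for illustration, not part of the original answer.

```python
import csv
import io

# Hypothetical sample data; the real list would have ~10**6 rows of ~20 values.
rows = [
    [1, "alpha", 3.5],
    [2, "beta", 4.25],
]

# Assumption: every column always holds the same type, in this order.
converters = (int, str, float)


def write_rows(fileobj, data):
    """Write each row as one csv record."""
    csv.writer(fileobj).writerows(data)


def read_rows(fileobj):
    """Read records back, converting each field by its column position."""
    return [
        [convert(field) for convert, field in zip(converters, record)]
        for record in csv.reader(fileobj)
    ]


# An in-memory buffer stands in for a file opened with newline="".
buffer = io.StringIO(newline="")
write_rows(buffer, rows)
buffer.seek(0)
restored = read_rows(buffer)
```

With a real file, the csv documentation recommends opening it with `newline=""` for both reading and writing. The csv module returns every field as a string, which is why the per-column conversion step is needed.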

  • 2021-02-05 04:28

    Avro seems to be a promising and well-designed solution, though not yet a popular one.

  • 2021-02-05 04:35

    Pickle is actually quite fast so long as you aren't using the ASCII protocol (protocol 0, the default in Python 2). Just make sure to dump using protocol=pickle.HIGHEST_PROTOCOL.
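
A minimal sketch of dumping with the highest protocol, assuming Python 3. The sample rows and the in-memory buffer are hypothetical stand-ins for the real list and a file opened in binary mode.

```python
import io
import pickle

# Hypothetical sample data; the real list would have ~10**6 rows of ~20 values.
rows = [[i, "name%d" % i, i * 0.5] for i in range(1000)]

# A BytesIO buffer stands in for a file opened with mode "wb".
buffer = io.BytesIO()
pickle.dump(rows, buffer, protocol=pickle.HIGHEST_PROTOCOL)

# Rewind and load the data back; the protocol is detected automatically.
buffer.seek(0)
restored = pickle.load(buffer)
```

Only the dump call needs the `protocol` argument; `pickle.load` reads the protocol version from the stream itself.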

  • 2021-02-05 04:39

    Just for the sake of completeness: there is also the dill library, which extends pickle.

    How to dill (pickle) to file?

  • 2021-02-05 04:43
    • Protocol Buffers: used in Caffe, for example; they maintain type information, but take considerably more effort to set up than pickle
    • MessagePack: see the Python package; supports streaming (source)
    • BSON: see the Python package docs