I am trying to serialize a large (~10**6 rows, each with ~20 values) list, to be used later by myself (so pickle's lack of safety isn't a concern).
Each row of the lis
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler.
Protocol buffers have several advantages over XML. They:
- are simpler
- are 3 to 10 times smaller
- are 20 to 100 times faster
- are less ambiguous
- generate data access classes that are easier to use programmatically
https://developers.google.com/protocol-buffers/docs/pythontutorial
I usually serialize to plain text (*.csv) because I have found it to be the fastest option. The csv module works quite well. See http://docs.python.org/library/csv.html
If you have to deal with Unicode in your strings, check out the UnicodeReader and UnicodeWriter examples at the end of that page.
If you are serializing for your own future use, it should be enough to know that each csv column always holds the same data type (e.g., strings are always in column 2).
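As a rough illustration of the "same data type per column" idea, here is a minimal sketch. It assumes Python 3, where the csv module handles Unicode text natively, and a made-up three-column layout (`COLUMN_TYPES`, `write_rows` and `read_rows` are hypothetical names, not part of the csv module):

```python
import csv
import os
import tempfile

# Hypothetical column layout (int, str, float); adjust it to your own rows.
COLUMN_TYPES = (int, str, float)

def write_rows(path, rows):
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

def read_rows(path):
    # csv returns every field as a string, so convert each field back
    # using the known type of its column.
    with open(path, newline="", encoding="utf-8") as f:
        return [
            [convert(field) for convert, field in zip(COLUMN_TYPES, record)]
            for record in csv.reader(f)
        ]

rows = [[1, "alpha", 0.5], [2, "béta, with comma", 1.25]]
path = os.path.join(tempfile.mkdtemp(), "rows.csv")
write_rows(path, rows)
restored = read_rows(path)
```

The type conversion on reading is the price of a plain-text format: csv stores no type information, so the column layout has to be known by whoever reads the file.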
Avro seems to be a promising and well-designed solution, though it is not yet a popular one.
Pickle is actually quite fast so long as you aren't using the (default) ASCII protocol. Just make sure to dump using protocol=pickle.HIGHEST_PROTOCOL.
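A minimal sketch of dumping with the highest protocol and loading the result back; the sample `rows` list and the temporary file path are made up for illustration:

```python
import os
import pickle
import tempfile

# Made-up sample data standing in for the real list of rows.
rows = [[i, "row %d" % i, i / 2] for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "rows.pkl")

# Binary protocols require the file to be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Note that `pickle.load` detects the protocol automatically, so only the dumping side needs the `protocol` argument.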
Just for the sake of completeness, there is also the dill library, which extends pickle.
See also the related question "How to dill (pickle) to file?"