I am trying to serialize a large (~10**6 rows, each with ~20 values) list, to be used later by myself (so pickle's lack of safety isn't a concern).
Each row of the list holds around 20 values.
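For reference, a minimal sketch of the pickle approach described above; the data and file name are illustrative stand-ins:

    import pickle

    # Illustrative stand-in for the real data: ~10**6 rows x ~20 values
    rows = [list(range(20)) for _ in range(1_000_000)]

    with open('rows.pkl', 'wb') as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)

    with open('rows.pkl', 'rb') as f:
        rows_back = pickle.load(f)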
I think you should give PyTables a look. It should be ridiculously fast, certainly faster than using an RDBMS, since it's very lax and doesn't impose any read/write restrictions, and you also get a better interface for managing your data than you do with pickling.
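A minimal sketch of how that could look, assuming the rows are homogeneous numbers that fit a single array; the file name, node name, and random data are illustrative:

    import numpy as np
    import tables

    # Illustrative data at the question's scale: ~10**6 rows x 20 values
    data = np.random.rand(1_000_000, 20)

    # Write the whole block as a single HDF5 array node
    with tables.open_file('rows.h5', mode='w') as h5:
        h5.create_array(h5.root, 'rows', data)

    # Read back just a slice -- no need to load the whole file
    with tables.open_file('rows.h5', mode='r') as h5:
        first_thousand = h5.root.rows[:1000]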
For hundreds of thousands of simple (up to JSON-compatible complexity) Python objects, I've found the best combination of simplicity, speed, and size by combining:

- gzip (for compression)
- ubjson (the py-ubjson library, for binary JSON serialization)

It beats the pickle and cPickle options by orders of magnitude.
    import gzip
    import ubjson  # pip install py-ubjson

    def save(items, filename):
        with gzip.open(filename, 'wb') as f:
            ubjson.dump(items, f)

    def load(filename):
        with gzip.open(filename, 'rb') as f:
            return ubjson.load(f)
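A hypothetical round trip with data shaped like the question's (the file name and data are illustrative):

    # ~10**6 rows of 20 values each
    items = [[float(v) for v in range(20)] for _ in range(1_000_000)]
    save(items, 'items.ubj.gz')
    assert load('items.ubj.gz') == items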