I need to serialise scikit-learn/statsmodels models such that all the dependencies (code + data) are packaged in an artefact, and this artefact can be used to initialise the model.
I package a Gaussian process (GP) model from scikit-learn using pickle.
The primary reason is that the GP takes a long time to build and loads much faster from a pickle. So during initialization, my code checks whether the model's data files have been updated and regenerates the model if necessary; otherwise it just deserializes the model from the pickle.
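A minimal sketch of that check, assuming "updated" means a data file has a newer modification time than the cached pickle. The names load_or_build and build_model are hypothetical, chosen for illustration; build_model stands in for whatever code fits the model.

```python
import os
import pickle
from pathlib import Path


def load_or_build(data_paths, cache_path, build_model):
    """Return the cached model, rebuilding it if any data file is newer.

    build_model is a hypothetical callable: it receives the list of data
    paths and returns a fitted, picklable model.
    """
    cache_path = Path(cache_path)
    data_paths = [Path(p) for p in data_paths]

    if cache_path.exists():
        cache_mtime = cache_path.stat().st_mtime
        # The cache is fresh only if no data file was modified after it.
        if all(p.stat().st_mtime <= cache_mtime for p in data_paths):
            with cache_path.open("rb") as f:
                return pickle.load(f)

    model = build_model(data_paths)
    # Write to a temporary file first, then rename, so an interrupted run
    # cannot leave a half-written cache behind.
    tmp_path = cache_path.with_name(cache_path.name + ".tmp")
    with tmp_path.open("wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, cache_path)
    return model
```

In my case build_model would read the data files and fit the GP; comparing file hashes instead of modification times would be a stricter variant of the same idea.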
I would try pickle, dill, and cloudpickle, in that order.
Note that pickle.dump and pickle.dumps accept a protocol keyword argument, and some values (generally the newer, higher-numbered protocols) can speed up serialization and reduce memory usage significantly!
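As a rough illustration of the effect of protocol, the sketch below compares the serialized size of the same object under protocol 0 (the original text format) and pickle.HIGHEST_PROTOCOL. The payload and the helper name pickled_size are made up for the example; a fitted model would take the payload's place, and the actual savings depend on what the object contains.

```python
import pickle


def pickled_size(obj, protocol):
    """Return the number of bytes pickle produces for obj with this protocol."""
    return len(pickle.dumps(obj, protocol=protocol))


# Stand-in payload: a list of floats with long decimal representations.
payload = {"weights": [i / 7 for i in range(10_000)]}

size_text = pickled_size(payload, 0)
size_binary = pickled_size(payload, pickle.HIGHEST_PROTOCOL)
print(f"protocol 0: {size_text} bytes, highest protocol: {size_binary} bytes")

# Both protocols restore the same object.
restored = pickle.loads(pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL))
assert restored == payload
```

Keep in mind that a pickle written with a newer protocol cannot be read by Python versions that predate that protocol.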
Finally, I wrap the pickle code with compression from the Python standard library (for example gzip, bz2, or lzma) if necessary.
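One way to do the wrapping, sketched here with gzip; the helper names dump_compressed and load_compressed are hypothetical. bz2.open and lzma.open offer the same file-like interface, so either could be swapped in.

```python
import gzip
import pickle


def dump_compressed(obj, path):
    """Pickle obj and write it to path through a gzip stream."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Read a gzip-compressed pickle back into memory."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Compression trades CPU time for disk space, so it is worth measuring whether it actually helps load times for a given model.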