What are the pitfalls of using Dill to serialise scikit-learn/statsmodels models?


灰色年华 2021-01-31 09:24

I need to serialise scikit-learn/statsmodels models such that all the dependencies (code + data) are packaged in an artefact and this artefact can be used to initialise the mode

3 Answers

深忆病人 (original poster)

2021-01-31 09:56

I package a Gaussian process (GP) model from scikit-learn using pickle.

The primary reason is that the GP takes a long time to build, while loading it from a pickle is much faster. So during initialization my code checks whether the model's data files have been updated and regenerates the model if necessary; otherwise it just deserializes the model from the pickle.
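That check can be sketched roughly as below. This is only a minimal illustration, assuming "updated" means a data file has a newer modification time than the cached pickle; the names `load_or_rebuild`, `data_paths`, `cache_path` and `build_model` are hypothetical and not from my actual code.

```python
import os
import pickle


def load_or_rebuild(data_paths, cache_path, build_model):
    """Return the cached model if it is at least as new as every data file,
    otherwise rebuild it with build_model() and refresh the cache."""
    if os.path.exists(cache_path):
        cache_mtime = os.path.getmtime(cache_path)
        # The cache is considered fresh only if no data file is newer than it.
        if all(os.path.getmtime(p) <= cache_mtime for p in data_paths):
            with open(cache_path, "rb") as fh:
                return pickle.load(fh)

    # Cache is missing or stale: do the slow build, then store the result.
    model = build_model()
    with open(cache_path, "wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return model
```

Here `build_model` would be whatever callable fits the GP. Modification times are a cheap heuristic; hashing the data files would be stricter if timestamps cannot be trusted.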

I would try pickle first, then dill, then cloudpickle, in that order of preference.
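One way to express that order of preference is a small fallback helper. This is a hedged sketch, not something I actually ship: `dumps_with_fallback` is a made-up name, and dill and cloudpickle are third-party packages that are simply skipped here if they are not installed.

```python
import importlib
import pickle


def dumps_with_fallback(obj):
    """Try pickle, then dill, then cloudpickle; return (serializer_name, bytes)."""
    for name in ("pickle", "dill", "cloudpickle"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # optional third-party serializer is not installed
        try:
            return name, module.dumps(obj)
        except (pickle.PicklingError, AttributeError, TypeError):
            continue  # this serializer cannot handle the object; try the next
    raise pickle.PicklingError(f"no available serializer could handle {type(obj)!r}")
```

Returning the serializer name lets the caller record which library produced the bytes, so the matching `loads` can be used when reading them back.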

Note that pickle's dump functions accept a protocol keyword argument, and choosing a newer protocol version can significantly improve speed and reduce memory usage. Finally, I wrap the pickle code with compression from the Python standard library if necessary.
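A minimal sketch of both points together, assuming gzip as the compression module (bz2 or lzma from the standard library would work the same way); the helper names `save_compressed` and `load_compressed` are just for illustration.

```python
import gzip
import pickle


def save_compressed(obj, path, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj with the given protocol and write it gzip-compressed to path."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=protocol)


def load_compressed(path):
    """Read a gzip-compressed pickle written by save_compressed."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)
```

`pickle.HIGHEST_PROTOCOL` picks the newest protocol the running interpreter supports, so a file written this way may not load on an older Python version; pin an explicit protocol number if that matters.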
