We have trained an Extra Trees model for a regression task. Our model consists of 3 extra-trees ensembles, each with 200 trees of depth 30. On top of the 3 ensembles, we use a r
You can try dumping the model with joblib and its compression parameter:

import joblib  # on older scikit-learn versions: from sklearn.externals import joblib

joblib.dump(your_algo, 'pickle_file_name.pkl', compress=3)

compress takes a value from 0 to 9. Higher values mean more compression, but also slower read and write times; 3 is often a good compromise.
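As a rough illustration of that trade-off, here is a minimal sketch; the small toy model and filenames are placeholders, not the model from the question:

import os
import joblib
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# small toy model, just to have something to dump
X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)

# dump at a few compression levels and compare file sizes
for level in (0, 3, 9):
    path = f'model_c{level}.pkl'
    joblib.dump(model, path, compress=level)
    print(level, os.path.getsize(path), 'bytes')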
You can also use the standard Python compression formats zlib, gzip, bz2, lzma and xz; joblib picks the format from the file extension. For example:

joblib.dump(obj, 'your_filename.pkl.z')    # zlib
joblib.dump(obj, 'your_filename.pkl.gz')   # gzip
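Loading works the same way whichever compression was chosen, since joblib detects it automatically; the filename below is just the one from the example above:

import joblib

model = joblib.load('your_filename.pkl.z')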
For more information, see: http://gael-varoquaux.info/programming/new_low-overhead_persistence_in_joblib_for_big_data.html
If each of those 600 trees were a complete binary tree of depth 30, you would have 3 * 200 * (2^30 - 1) ≈ 644,245,093,800 nodes, or roughly 600 GiB even if every node cost only a single byte to store. Real tree nodes take considerably more than one byte, so 140 GB is actually a pretty reasonable size in comparison.
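A quick sanity check of that arithmetic in plain Python, using only the numbers given in the question:

ensembles, trees_per_ensemble, depth = 3, 200, 30

# a complete binary tree with `depth` levels has 2**depth - 1 nodes
nodes = ensembles * trees_per_ensemble * (2**depth - 1)
print(f'{nodes:,} nodes')                          # 644,245,093,800 nodes
print(f'{nodes / 2**30:.0f} GiB at 1 byte/node')   # 600 GiB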