Trained Machine Learning model is too big

小蘑菇 2020-12-30 10:13

We have trained an Extra Trees model for a regression task. Our model consists of 3 extra-trees ensembles, each containing 200 trees of depth 30. On top of the 3 ensembles, we use a r

2 Answers
  • 2020-12-30 10:31

    You can try using joblib with the compress parameter.

        import joblib  # older scikit-learn versions used: from sklearn.externals import joblib

        joblib.dump(your_algo, 'pickle_file_name.pkl', compress=3)
    

    compress takes values from 0 to 9. A higher value means more compression, but also slower read and write times; 3 is often a good compromise. (A complete dump-and-reload round trip is sketched at the end of this answer.)

    You can also use the standard Python compression formats zlib, gzip, bz2, lzma and xz: just give the file name the corresponding extension and joblib will pick the matching compressor.

    For example:

        joblib.dump(obj, 'your_filename.pkl.z')   # zlib compression, inferred from the '.z' extension
    

    For more information, see http://gael-varoquaux.info/programming/new_low-overhead_persistence_in_joblib_for_big_data.html
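
    As a self-contained sketch of the full round trip (the model configuration, toy data and file name below are illustrative, not taken from the question):

        import os

        import joblib  # with modern scikit-learn, import joblib directly
        from sklearn.datasets import make_regression
        from sklearn.ensemble import ExtraTreesRegressor

        # Toy data and model; the real model in the question is far larger.
        X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
        model = ExtraTreesRegressor(n_estimators=200, max_depth=30, random_state=0).fit(X, y)

        # compress=3 applies zlib level 3; a '.z', '.gz' or '.bz2' extension would
        # select the corresponding compressor instead.
        joblib.dump(model, 'extra_trees.pkl', compress=3)
        print(os.path.getsize('extra_trees.pkl'), 'bytes on disk')

        # Reload and check that the model still predicts.
        restored = joblib.load('extra_trees.pkl')
        print(restored.predict(X[:5]))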

  • 2020-12-30 10:36

    In the worst case (fully grown binary trees), you will have 3 * 200 * (2^30 - 1) = 644,245,093,800 nodes, or roughly 600 GiB even if each node cost only 1 byte to store (the arithmetic is spelled out in the snippet below). I think that 140GB is a pretty decent size in comparison.

    Edit: Bad maths.
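
    For reference, the bound above can be reproduced in a few lines (a sketch that just redoes the arithmetic; the 1-byte-per-node figure is a deliberately optimistic assumption, far below what scikit-learn actually stores per node):

        # Upper bound: 3 ensembles x 200 trees x nodes of a full binary tree of depth 30.
        forests, trees_per_forest, depth = 3, 200, 30
        nodes = forests * trees_per_forest * (2 ** depth - 1)
        print(f"{nodes:,} nodes")                              # 644,245,093,800
        print(f"~{nodes / 2**30:.0f} GiB at 1 byte per node")  # ~600 GiB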
