We have trained an Extra Trees model for a regression task. Our model consists of 3 extra-trees ensembles, each with 200 trees of depth 30. On top of the 3 ensembles, we use a r
You can try dumping the model with joblib and its compression parameter:

import joblib  # on older scikit-learn versions: from sklearn.externals import joblib

joblib.dump(your_algo, 'pickle_file_name.pkl', compress=3)

compress takes a value from 0 to 9. Higher values mean more compression, but also slower read and write times; 3 is often a good compromise.
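As a rough illustration of that trade-off, here is a minimal sketch; the small toy model and filenames are placeholders, not the model from the question:

import os
import joblib
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# small toy model, just to have something to dump
X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)

# dump at a few compression levels and compare file sizes
for level in (0, 3, 9):
    path = f'model_c{level}.pkl'
    joblib.dump(model, path, compress=level)
    print(level, os.path.getsize(path), 'bytes')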
You can also use the standard Python compression formats zlib, gzip, bz2, lzma and xz; joblib picks the format from the file extension. For example:

joblib.dump(obj, 'your_filename.pkl.z')    # zlib
joblib.dump(obj, 'your_filename.pkl.gz')   # gzip
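Loading works the same way whichever compression was chosen, since joblib detects it automatically; the filename below is just the one from the example above:

import joblib

model = joblib.load('your_filename.pkl.z')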
For more information, see: http://gael-varoquaux.info/programming/new_low-overhead_persistence_in_joblib_for_big_data.html
If each of those 600 trees were a complete binary tree of depth 30, you would have 3 * 200 * (2^30 - 1) ≈ 644,245,093,800 nodes, or roughly 600 GiB even if every node cost only a single byte to store. Real tree nodes take considerably more than one byte, so 140 GB is actually a pretty reasonable size in comparison.
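A quick sanity check of that arithmetic in plain Python, using only the numbers given in the question:

ensembles, trees_per_ensemble, depth = 3, 200, 30

# a complete binary tree with `depth` levels has 2**depth - 1 nodes
nodes = ensembles * trees_per_ensemble * (2**depth - 1)
print(f'{nodes:,} nodes')                          # 644,245,093,800 nodes
print(f'{nodes / 2**30:.0f} GiB at 1 byte/node')   # 600 GiB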