I have a program where I basically adjust the probability of certain things happening based on what is already known. My data file is already saved as a pickle.
How about this?

import cPickle as pickle  # Python 2; on Python 3 the built-in pickle module fills this role

with open("temp.p", "wb") as f:
    p = pickle.Pickler(f, protocol=pickle.HIGHEST_PROTOCOL)  # binary protocol; the default protocol 0 is ASCII and far larger
    p.fast = True  # skip memo bookkeeping; only safe if the data has no shared or recursive objects
    p.dump(d)  # d is your dictionary (or any picklable object)
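For completeness, reading the object back is the mirror operation; a minimal sketch, assuming the same import as above:

# Load the pickled object back from disk
with open("temp.p", "rb") as f:
    d = pickle.load(f)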
None of the above answers worked for me. I ended up using Hickle, which is a drop-in replacement for pickle based on HDF5. Instead of saving the data to a pickle file, it saves it to an HDF5 file. The API is identical for most use cases, and it has some really cool features such as compression.
pip install hickle
Example:
import hickle as hkl
import numpy as np

# Create a numpy array of data
array_obj = np.ones(32768, dtype='float32')
# Dump to file
hkl.dump(array_obj, 'test.hkl', mode='w')
# Load data
array_hkl = hkl.load('test.hkl')
I was having the same issue. I used joblib and the job was done, in case anyone wants to know about other possibilities.
# save the model to disk
import joblib  # older scikit-learn versions used "from sklearn.externals import joblib", since removed
filename = 'finalized_model.sav'
joblib.dump(model, filename)
# some time later... load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
print(result)
I recently had this problem. After trying cPickle with both the ASCII and the binary protocol 2 formats, I found that my SVM from scikit-learn, trained on 20+ GB of data, was not pickling due to a memory error. However, the dill package seemed to resolve the issue. Dill will not create many improvements for a dictionary, but it may help with streaming: it is meant to stream pickled bytes across a network.
import dill

# save the object itself (not the output path) to an open file handle
with open(path, 'wb') as fp:
    dill.dump(model, fp)
# load it back, reopening the file in read mode
with open(path, 'rb') as fp:
    model = dill.load(fp)
If efficiency is an issue, try saving to or loading from a database. In this instance, your storage solution may be the problem. At 123 MB, Pandas should be fine. However, if the machine has limited memory, SQL offers fast, optimized bag operations over the data, usually with multithreaded support. My polynomial-kernel SVM saved successfully this way.
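If you want to try the database route, here is a minimal sketch using pandas with the standard-library sqlite3 module. The DataFrame df, the data.db file, the observations table, and the prob column are all hypothetical names for illustration:

import sqlite3
import pandas as pd

df = pd.DataFrame({'event': ['a', 'b'], 'prob': [0.2, 0.9]})  # stand-in for your real data

conn = sqlite3.connect('data.db')  # hypothetical SQLite file

# Write the DataFrame to a SQL table instead of pickling the whole object
df.to_sql('observations', conn, if_exists='replace', index=False)

# Later: read back only the rows you need instead of loading everything
subset = pd.read_sql_query('SELECT * FROM observations WHERE prob > 0.5', conn)
conn.close()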
This may seem trivial, but try using 64-bit Python if you are not already on it.
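A quick way to check which build you are running:

import struct
print(struct.calcsize("P") * 8)  # 64 on a 64-bit Python, 32 on a 32-bit build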
Have you tried using streaming pickle? https://code.google.com/p/streaming-pickle/
I have just solved a similar memory error by switching to streaming pickle.
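The idea behind streaming pickle is to serialize one element at a time instead of the whole container in a single call, so only one item has to sit in memory during the dump. A minimal sketch of the same technique using only the standard pickle module (d stands in for the dictionary from the question):

import pickle

d = {'heads': 0.5, 'tails': 0.5}  # stand-in for the question's probability dict

# Dump each (key, value) pair as its own pickle record so the full
# dict is never serialized as one giant object
with open('data.pkl', 'wb') as f:
    for item in d.items():
        pickle.dump(item, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read the records back one by one until the file is exhausted
restored = {}
with open('data.pkl', 'rb') as f:
    while True:
        try:
            key, value = pickle.load(f)
        except EOFError:
            break
        restored[key] = value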