Pickle dump huge file without memory error

backend · open · 9 answers · 855 views

Asked by 梦谈多话 on 2020-12-23 14:26

I have a program where I basically adjust the probability of certain things happening based on what is already known. My data file is already saved as a pickled dictionary, and I get a MemoryError when I dump the updated version back to disk.

9 Answers
  • 2020-12-23 14:42

    How about this?

    import cPickle as pickle  # Python 2; on Python 3, plain `import pickle` is already C-accelerated

    p = pickle.Pickler(open("temp.p", "wb"))
    p.fast = True  # disable the memo table to save memory (breaks self-referential objects)
    p.dump(d)      # d can be your dictionary or any other picklable object
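
    A rough Python 3 equivalent (my sketch, not part of the original answer) is to dump with the highest protocol, which is faster and, from protocol 4 on, supports objects larger than 4 GB:

    import pickle

    # protocol 4+ (Python 3.4+) handles very large objects
    with open("temp.p", "wb") as f:
        pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)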
    
  • 2020-12-23 14:43

    None of the above answers worked for me. I ended up using Hickle, a drop-in replacement for pickle based on HDF5. Instead of writing a pickle file, it saves the data to an HDF5 file. The API is identical for most use cases, and it has some really useful features such as compression.

    pip install hickle
    

    Example:

    import hickle as hkl
    import numpy as np

    # Create a numpy array of data
    array_obj = np.ones(32768, dtype='float32')

    # Dump to file
    hkl.dump(array_obj, 'test.hkl', mode='w')

    # Load data
    array_hkl = hkl.load('test.hkl')
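
    To get the compression mentioned above, hickle forwards h5py's dataset options; assuming the standard h5py keywords, a gzip-compressed dump looks like this:

    # gzip compression is handled by the underlying HDF5 layer
    hkl.dump(array_obj, 'test_gzip.hkl', mode='w', compression='gzip')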
    
  • 2020-12-23 14:47

    I was having the same issue. I used joblib instead and it got the job done, in case anyone wants to know about other options.

    Save the model to disk:

    from sklearn.externals import joblib  # deprecated in newer scikit-learn; use `import joblib` instead
    filename = 'finalized_model.sav'
    joblib.dump(model, filename)
    

    Some time later, load the model from disk:

    loaded_model = joblib.load(filename)
    result = loaded_model.score(X_test, Y_test)
    print(result)
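
    joblib can also compress the dump on the fly, which helps when the serialized model itself is huge; assuming joblib's documented `compress` parameter, something like:

    joblib.dump(model, filename, compress=3)  # 0-9; higher means a smaller file but a slower dump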
    
  • 2020-12-23 14:48

    I recently had this problem. After trying cPickle with ASCII and the binary protocol 2, I found that my SVM from scikit-learn, trained on 20+ GB of data, was not pickling due to a memory error. However, the dill package seemed to resolve the issue. Dill will not bring many improvements for a dictionary, but it may help with streaming, since it is meant to stream pickled bytes across a network.

    import dill

    # dump the object (here: the trained model), then reopen the file to read it back
    with open(path, 'wb') as fp:
        dill.dump(model, fp)

    with open(path, 'rb') as fp:
        model = dill.load(fp)
    

    If efficiency is an issue, try saving to and loading from a database instead; in this instance, your storage solution may be the problem. At 123 MB, Pandas should be fine, but if the machine has limited memory, SQL offers fast, optimized bag operations over the data, usually with multithreaded support. My poly-kernel SVM saved successfully this way. A minimal sketch of the database idea is below.
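
    Here is a minimal sketch of that database idea, assuming a plain dict of picklable values (the `store` table and the `save_dict`/`load_dict` helpers are my own invention, not from the answer). Each entry is pickled individually, so no single dump ever has to hold the whole structure:

    import pickle
    import sqlite3

    def save_dict(db_path, d):
        # store each key/value pair as its own pickled row
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS store (key TEXT PRIMARY KEY, value BLOB)")
        with con:  # commits on success
            for k, v in d.items():
                con.execute("INSERT OR REPLACE INTO store VALUES (?, ?)",
                            (k, pickle.dumps(v, protocol=pickle.HIGHEST_PROTOCOL)))
        con.close()

    def load_dict(db_path):
        # rebuild the dict one row at a time
        con = sqlite3.connect(db_path)
        d = {k: pickle.loads(v) for k, v in con.execute("SELECT key, value FROM store")}
        con.close()
        return d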

  • 2020-12-23 14:51

    This may seem trivial, but try using 64-bit Python if you are not already.
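
    A quick standard-library check of which build you are running (my addition, not from the answer):

    import struct
    print(struct.calcsize("P") * 8)  # pointer size in bits: prints 32 or 64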

  • 2020-12-23 14:52

    Have you tried streaming pickle? https://code.google.com/p/streaming-pickle/

    I just solved a similar memory error by switching to streaming pickle; the basic technique is sketched below.
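
    The core idea (my own minimal sketch of the technique, not the library's actual API) is to pickle the elements of a large collection one at a time into the same file, so only one element is ever in memory during a dump or a load:

    import pickle

    def stream_dump(items, path):
        # append-pickle each item; memory use stays at one item
        with open(path, 'wb') as f:
            for item in items:
                pickle.dump(item, f, protocol=pickle.HIGHEST_PROTOCOL)

    def stream_load(path):
        # yield items back one at a time until the file is exhausted
        with open(path, 'rb') as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return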
