Save Numpy Array using Pickle

独厮守ぢ 2021-02-18 20:59

I've got a NumPy array (130,000 x 3) that I would like to save using Pickle, with the following code. However, I keep getting the error "EOFError: Ra

6 answers
  •  傲寒 (original poster)
    2021-02-18 21:25

    Don't use pickle for NumPy arrays. For an extended discussion that links to all the resources I could find, see my answer here.

    Short reasons:

    • NumPy's developers already provide a nice interface for this, and using it will save you a lot of debugging time (the most important reason).
    • np.save, np.load and np.savez perform well on most metrics (see this), which is to be expected: they belong to an established library and were written by the NumPy developers.
    • Unpickling can execute arbitrary code, so loading a pickle file from an untrusted source is a security risk.
    • To use pickle you have to open a file yourself, which invites bugs (e.g. I wasn't aware that the file must be opened in binary mode with b, it stopped working, and it took time to debug).
    • If you refuse to accept this advice, at least articulate clearly why you need something else. Make sure the reason is crystal clear in your head.
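To make the binary-mode point above concrete, here is a minimal sketch. It assumes a throwaway temporary directory and a small made-up array (neither is from the question), and it does not claim to reproduce the asker's EOFError. It only shows that pickle fails on a file opened in text mode, while np.save needs no file handle at all.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical small array, standing in for the real data.
arr = np.arange(6, dtype=np.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "arr.pkl"

    # Text mode ('w'): pickle writes bytes, so the dump raises TypeError.
    try:
        with open(pkl_path, "w") as f:
            pickle.dump(arr, f)
        text_mode_error = None
    except TypeError as exc:
        text_mode_error = exc

    # Binary mode ('wb' / 'rb'): the round trip works.
    with open(pkl_path, "wb") as f:
        pickle.dump(arr, f)
    with open(pkl_path, "rb") as f:
        from_pickle = pickle.load(f)

    # np.save / np.load take a path directly, so there is no mode to get wrong.
    npy_path = Path(tmp) / "arr.npy"
    np.save(npy_path, arr)
    from_npy = np.load(npy_path)

print(type(text_mode_error).__name__)
print(np.array_equal(arr, from_pickle), np.array_equal(arr, from_npy))
```

With np.save the file handling lives inside NumPy, which is the practical upside of the first bullet.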

    Avoid repeating code at all costs if a solution already exists!

    Anyway, here are all the interfaces I tried. Hopefully this saves someone time (probably my future self):

    import numpy as np
    import pickle
    from pathlib import Path
    
    path = Path('~/data/tmp/').expanduser()
    path.mkdir(parents=True, exist_ok=True)
    
    # sample a few points and compute a quadratic of them
    lb, ub = -1, 1
    num_samples = 5
    x = np.random.uniform(low=lb, high=ub, size=(1, num_samples))
    y = x**2 + x + 2
    
    # save with np.save (to .npy), np.savez (to .npz) and pickle (to .pkl)
    np.save(path/'x', x)
    np.save(path/'y', y)
    np.savez(path/'db', x=x, y=y)
    # pickle needs the file opened in binary mode ('wb')
    with open(path/'db.pkl', 'wb') as db_file:
        pickle.dump(obj={'x':x, 'y':y}, file=db_file)
    
    # load the .npy, .npz and .pkl files back
    x_loaded = np.load(path/'x.npy')
    y_loaded = np.load(path/'y.npy')
    db = np.load(path/'db.npz')
    with open(path/'db.pkl', 'rb') as db_file:
        db_pkl = pickle.load(db_file)
    
    print(x is x_loaded)     # False: loading creates a new array object
    print(x == x_loaded)     # elementwise comparison, all True
    print(x == db['x'])
    print(x == db_pkl['x'])
    print('done')
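For an array of the size mentioned in the question, the same np.savez round trip might look like the sketch below. The random data, the temporary directory and the variable names are assumptions for illustration. It also opens the .npz archive in a with block so the underlying file is closed, and compares with np.array_equal to get a single True/False instead of an elementwise array.

```python
import tempfile
from pathlib import Path

import numpy as np

# Random stand-in for the 130,000 x 3 array from the question.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(130_000, 3))
y = x**2 + x + 2

with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "db.npz"
    np.savez(db_path, x=x, y=y)

    # np.load on an .npz returns a lazily loading archive object;
    # the with block closes the file once the arrays are read.
    with np.load(db_path) as db:
        names = sorted(db.files)
        x_loaded = db["x"]
        y_loaded = db["y"]

# One boolean per array instead of an elementwise comparison.
same_x = np.array_equal(x, x_loaded)
same_y = np.array_equal(y, y_loaded)
print(names, same_x, same_y)
```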
    

    But for the most useful information, see my answer here.
