I've got a NumPy array (130,000 x 3) that I would like to save using Pickle, with the following code. However, I keep getting the error "EOFError: Ra
Don't use pickle for NumPy arrays. For an extended discussion that links to all the resources I could find, see my answer here.
Short reasons:
np.save, np.load, and np.savez have pretty good performance in most metrics (see this), which is to be expected since NumPy is an established library and its developers wrote those functions.
(... b and it stopped working, which took time to debug)
Avoid repeating code at all costs if a solution already exists!
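As a minimal sketch of that advice applied to an array of the shape in the question: the random data, the temporary directory, and the `data.npy` file name below are placeholders of mine, not the asker's actual setup.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical stand-in for the (130,000 x 3) array from the question.
data = np.random.uniform(low=-1, high=1, size=(130_000, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / 'data.npy'
    np.save(file_path, data)       # writes NumPy's binary .npy format
    restored = np.load(file_path)  # reads the whole array back into memory

print(restored.shape)
print(np.array_equal(data, restored))
```

No file handle is opened by hand here, so there is no file mode to get wrong.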
Anyway, here are all the interfaces I tried; hopefully this saves someone time (probably my future self):
import numpy as np
import pickle
from pathlib import Path
path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)
lb, ub = -1, 1
num_samples = 5
x = np.random.uniform(low=lb, high=ub, size=(1, num_samples))
y = x**2 + x + 2
# saving: np.save (to .npy), np.savez (to .npz), pickle (to .pkl)
np.save(path/'x', x)
np.save(path/'y', y)
np.savez(path/'db', x=x, y=y)
with open(path/'db.pkl', 'wb') as db_file:
    pickle.dump(obj={'x': x, 'y': y}, file=db_file)
# loading: .npy and .npz files via np.load, .pkl via pickle
x_loaded = np.load(path/'x.npy')
y_loaded = np.load(path/'y.npy')
db = np.load(path/'db.npz')
with open(path/'db.pkl', 'rb') as db_file:
    db_pkl = pickle.load(db_file)
print(x is x_loaded)  # False: loading creates a new array object
print(x == x_loaded)  # elementwise comparison, all True
print(x == db['x'])
print(x == db_pkl['x'])
print('done')
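The elementwise `==` prints above give one boolean per entry. If a single yes/no answer per format is preferred, something like the following sketch could be used. It assumes a throwaway temporary directory instead of `~/data/tmp/`, and the `checks` dictionary is just an illustrative name.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

x = np.random.uniform(low=-1, high=1, size=(1, 5))
y = x**2 + x + 2

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp = Path(tmp_dir)
    # save the same data through all three interfaces
    np.save(tmp / 'x.npy', x)
    np.savez(tmp / 'db.npz', x=x, y=y)
    with open(tmp / 'db.pkl', 'wb') as db_file:
        pickle.dump({'x': x, 'y': y}, db_file)

    x_npy = np.load(tmp / 'x.npy')
    # np.load on an .npz returns a lazy archive object; the with block closes it
    with np.load(tmp / 'db.npz') as db:
        x_npz, y_npz = db['x'], db['y']
    with open(tmp / 'db.pkl', 'rb') as db_file:
        db_pkl = pickle.load(db_file)

# np.array_equal collapses the elementwise comparison into a single boolean
checks = {
    'npy': np.array_equal(x, x_npy),
    'npz': np.array_equal(x, x_npz) and np.array_equal(y, y_npz),
    'pkl': np.array_equal(x, db_pkl['x']) and np.array_equal(y, db_pkl['y']),
}
print(checks)
```

All three formats should round-trip the values exactly, so the difference between them lies in convenience and safety, not in what gets stored.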
But most useful of all, see my answer here.