I want to save a dict or arrays. I tried both np.save and pickle, and the former always takes much less time.
That is because, as long as the written object contains no Python objects, np.save can write the array's raw data buffer straight to disk.
Note that if a numpy array does contain Python objects, then numpy just pickles the array, and the speed advantage goes out the window.
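This is because pickle is a general-purpose serializer that works on all sorts of Python objects (and in Python 2 the default pickle module was written in pure Python; Python 3 uses a C implementation when available), whereas np.save is designed for arrays and saves them in an efficient format.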
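To illustrate what "an efficient format" means here, the following is a minimal sketch that saves a plain numeric array to an in-memory buffer and checks that the file is a short header followed by the array's raw bytes. The helper name npy_bytes and the sample array are made up for this example.

```python
import io

import numpy as np


def npy_bytes(arr, allow_pickle=False):
    """Serialize arr in .npy format into memory and return the bytes."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=allow_pickle)
    return buf.getvalue()


# Hypothetical sample data: a C-contiguous array of plain integers.
numeric = np.arange(1000, dtype=np.int64)

payload = npy_bytes(numeric)

# Everything before the raw data is the .npy header (magic string, version,
# and a small dict describing dtype, memory order and shape).
header_size = len(payload) - numeric.nbytes

# The tail of the file is the array's memory buffer, byte for byte.
raw_tail_matches = payload[-numeric.nbytes:] == numeric.tobytes()

print("header bytes:", header_size)
print("raw buffer stored verbatim:", raw_tail_matches)
```

No per-element Python work is needed for such an array, which is where the speed difference for purely numeric data comes from.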
From the numpy.save documentation, it can actually use pickle behind the scenes. This may limit portability between versions of Python and runs the risk of executing arbitrary code (which is a general risk when unpickling an unknown object).
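A small sketch of that pickle fallback, assuming a reasonably recent NumPy (where np.load defaults to allow_pickle=False): an object-dtype array can only be written or read when pickling is explicitly allowed. The helpers try_save and try_load are hypothetical names used only for this example.

```python
import io

import numpy as np

# Hypothetical sample: an array whose elements are Python objects (dicts).
obj_arr = np.array([{"a": 1}, {"b": 2}], dtype=object)


def try_save(arr, allow_pickle):
    """Return (bytes, None) on success or (None, error message) on refusal."""
    buf = io.BytesIO()
    try:
        np.save(buf, arr, allow_pickle=allow_pickle)
    except ValueError as exc:
        return None, str(exc)
    return buf.getvalue(), None


def try_load(data, allow_pickle):
    """Return (array, None) on success or (None, error message) on refusal."""
    try:
        return np.load(io.BytesIO(data), allow_pickle=allow_pickle), None
    except ValueError as exc:
        return None, str(exc)


# Without pickling, NumPy refuses to write an object array at all.
_, save_error = try_save(obj_arr, allow_pickle=False)

# With pickling allowed, the array contents are pickled into the .npy file.
payload, _ = try_save(obj_arr, allow_pickle=True)

# Loading is refused unless the caller opts in to unpickling.
_, load_error = try_load(payload, allow_pickle=False)
loaded, _ = try_load(payload, allow_pickle=True)

print("save refused:", save_error)
print("load refused:", load_error)
print("loaded with allow_pickle=True:", loaded)
```

Only pass allow_pickle=True for files you trust, for the arbitrary-code reason mentioned above.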
I think you need better timings. I also disagree with the accepted answer.
b is a dictionary with 9 keys; the values are lists of arrays. That means both pickle.dump and np.save will be using each other: pickle uses save to pickle the arrays, and save uses pickle to save the dictionary and list.
save writes arrays. That means it has to wrap your dictionary in an object dtype array in order to save it.
In [6]: np.save('test1',b)
In [7]: d=np.load('test1.npy')
In [8]: d
Out[8]:
array({0: [array([0, 0, 0, 0])], 1: [array([1, 0, 0, 0]), array([0, 1, 0, 0]), .... array([ 1, -1, 0, 0]), array([ 1, 0, -1, 0]), array([ 1, 0, 0, -1])]},
dtype=object)
In [9]: d.shape
Out[9]: ()
In [11]: list(d[()].keys())
Out[11]: [0, 1, 2, 3, 4, 5, 6, 7, 8]
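The wrapping step can be reproduced without touching the disk. This is a minimal sketch using a small stand-in dictionary (not the original b): converting a dict to an array gives a 0-dimensional object array, which is why d.shape is () and why d[()] is needed to get the dictionary back.

```python
import numpy as np

# Hypothetical stand-in for b: keys mapping to lists of small arrays.
b_small = {
    0: [np.array([0, 0, 0, 0])],
    1: [np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0])],
}

# np.save converts its argument to an array first; for a dict the result
# is a 0-d array of dtype object holding a reference to the dict.
wrapped = np.asanyarray(b_small)

print(wrapped.shape)   # ()
print(wrapped.dtype)   # object

# Two equivalent ways to unwrap the single element.
same_via_index = wrapped[()] is b_small
same_via_item = wrapped.item() is b_small
print(same_via_index, same_via_item)
```

After np.load (with allow_pickle=True on current NumPy versions) the same d[()] or d.item() call recovers the dictionary.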
Some timings:
In [12]: timeit np.save('test1',b)
850 µs ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [13]: timeit d=np.load('test1.npy')
566 µs ± 6.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [20]: %%timeit
    ...: with open('testpickle', 'wb') as myfile:
    ...:     pickle.dump(b, myfile)
    ...:
505 µs ± 9.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [21]: %%timeit
    ...: with open('testpickle', 'rb') as myfile:
    ...:     g1 = pickle.load(myfile)
    ...:
152 µs ± 4.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In my timings pickle is faster.
The pickle file is slightly smaller:
In [23]: ll test1.npy testpickle
-rw-rw-r-- 1 paul 5740 Aug 14 08:40 test1.npy
-rw-rw-r-- 1 paul 4204 Aug 14 08:43 testpickle
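For anyone who wants to repeat the comparison outside IPython, here is a rough sketch of a standalone benchmark. The sample dictionary from make_sample is a made-up approximation of b (9 keys, lists of small arrays), and the function names are my own; absolute numbers will differ by machine and library version.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np


def make_sample():
    """Hypothetical stand-in for b: 9 int keys mapping to lists of arrays."""
    return {k: [np.arange(4) * (k + i) for i in range(k + 1)] for k in range(9)}


def save_npy(obj, path):
    np.save(path, obj, allow_pickle=True)


def load_npy(path):
    # The dict comes back wrapped in a 0-d object array; item() unwraps it.
    return np.load(path, allow_pickle=True).item()


def save_pickle(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def benchmark(obj, number=200):
    """Return mean seconds per call for each operation, plus file sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        npy_path = os.path.join(tmp, "test1.npy")
        pkl_path = os.path.join(tmp, "testpickle")

        def per_call(fn):
            return timeit.timeit(fn, number=number) / number

        # Saves are timed before loads so the files exist when loading.
        results = {
            "np.save": per_call(lambda: save_npy(obj, npy_path)),
            "np.load": per_call(lambda: load_npy(npy_path)),
            "pickle.dump": per_call(lambda: save_pickle(obj, pkl_path)),
            "pickle.load": per_call(lambda: load_pickle(pkl_path)),
        }
        sizes = {
            "npy": os.path.getsize(npy_path),
            "pickle": os.path.getsize(pkl_path),
        }
    return results, sizes


results, sizes = benchmark(make_sample())
for name, seconds in results.items():
    print(f"{name:12s} {seconds * 1e6:8.1f} µs per call")
print("file sizes in bytes:", sizes)
```

Swapping make_sample for a single large numeric array is a quick way to see the case where np.save has the advantage.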