Why does pickle take so much longer than np.save?

后端未结

关注

 3  792

北恋 2020-12-22 00:43

I want to save a dict or arrays.

I try both with np.save and with pickle and see that the former always take much less time.

3条回答

礼貌的吻别 (楼主)

2020-12-22 01:08

I think you need better timings. I also disagree with the accepted answer.

b is a dictionary with 9 keys; the values are lists of arrays. That means both pickle.dump and np.save will be using each other - pickle uses save to pickle the arrays, save uses pickle to save the dictionary and list.

save writes arrays. That means it has to wrap your dictionary in a object dtype array in order to save it.

In [6]: np.save('test1',b)
In [7]: d=np.load('test1.npy')
In [8]: d
Out[8]: 
array({0: [array([0, 0, 0, 0])], 1: [array([1, 0, 0, 0]), array([0, 1, 0, 0]), .... array([ 1, -1,  0,  0]), array([ 1,  0, -1,  0]), array([ 1,  0,  0, -1])]},
      dtype=object)
In [9]: d.shape
Out[9]: ()
In [11]: list(d[()].keys())
Out[11]: [0, 1, 2, 3, 4, 5, 6, 7, 8]

Some timings:

In [12]: timeit np.save('test1',b)
850 µs ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [13]: timeit d=np.load('test1.npy')
566 µs ± 6.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [20]: %%timeit 
    ...: with open('testpickle', 'wb') as myfile:
    ...:     pickle.dump(b, myfile)
    ...:     
505 µs ± 9.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [21]: %%timeit 
    ...: with open('testpickle', 'rb') as myfile:
    ...:     g1 = pickle.load(myfile)
    ...:     
152 µs ± 4.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In my timings pickle is faster.

The pickle file is slightly smaller:

In [23]: ll test1.npy testpickle
-rw-rw-r-- 1 paul 5740 Aug 14 08:40 test1.npy
-rw-rw-r-- 1 paul 4204 Aug 14 08:43 testpickle

0 讨论(0)

查看其它3个回答