Why does pickle take so much longer than np.save?

北恋 2020-12-22 00:43

I want to save a dict or arrays.

I tried both np.save and pickle, and found that the former always takes much less time.

3 answers
  •  礼貌的吻别
    2020-12-22 01:08

    I think you need better timings. I also disagree with the accepted answer.

    b is a dictionary with 9 keys; the values are lists of arrays. That means pickle.dump and np.save each rely on the other: pickle uses save to pickle the arrays, and save uses pickle to save the dictionary and the lists.

    save writes arrays, so it has to wrap your dictionary in an object dtype array in order to save it.

    In [6]: np.save('test1',b)
    In [7]: d=np.load('test1.npy')
    In [8]: d
    Out[8]: 
    array({0: [array([0, 0, 0, 0])], 1: [array([1, 0, 0, 0]), array([0, 1, 0, 0]), .... array([ 1, -1,  0,  0]), array([ 1,  0, -1,  0]), array([ 1,  0,  0, -1])]},
          dtype=object)
    In [9]: d.shape
    Out[9]: ()
    In [11]: list(d[()].keys())
    Out[11]: [0, 1, 2, 3, 4, 5, 6, 7, 8]
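
    The session above can be reproduced as a standalone script. This is a minimal sketch: the question does not show how b was built, so the dictionary here is a hypothetical stand-in with the same shape of data (9 integer keys, lists of small arrays). Note that recent NumPy versions refuse to load object arrays unless allow_pickle=True is passed, which the original session did not need.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for `b` (the real one is not shown in the question):
# a dict with 9 integer keys whose values are lists of small arrays.
b = {k: [np.arange(4) * k, np.ones(4, dtype=int) * k] for k in range(9)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test1.npy")
    np.save(path, b)                      # the dict gets wrapped in a 0-d object array
    d = np.load(path, allow_pickle=True)  # object arrays require allow_pickle=True

wrapped_shape = d.shape   # shape of the wrapper array
wrapped_dtype = d.dtype   # dtype of the wrapper array
restored = d[()]          # index with an empty tuple to get the dict back
restored_keys = list(restored.keys())

print(wrapped_shape, wrapped_dtype, restored_keys)
```

    Indexing with d[()] (or calling d.item()) is what unwraps the dictionary from the 0-d array.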
    

    Some timings:

    In [12]: timeit np.save('test1',b)
    850 µs ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    In [13]: timeit d=np.load('test1.npy')
    566 µs ± 6.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    In [20]: %%timeit 
        ...: with open('testpickle', 'wb') as myfile:
        ...:     pickle.dump(b, myfile)
        ...:     
    505 µs ± 9.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    In [21]: %%timeit 
        ...: with open('testpickle', 'rb') as myfile:
        ...:     g1 = pickle.load(myfile)
        ...:     
    152 µs ± 4.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    In my timings pickle is faster.
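
    To repeat this comparison outside IPython, something like the following could be used. It is only a sketch: b is again a hypothetical stand-in, time_roundtrips is a helper name made up for this example, and the numbers will depend on your data, disk, and library versions, so no particular result is implied.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np

# Hypothetical stand-in for `b`: 9 keys, each mapping to a list of small arrays.
b = {k: [np.full(4, k) for _ in range(k + 1)] for k in range(9)}


def time_roundtrips(obj, number=200):
    """Return the mean seconds per call for np.save/np.load and pickle.dump/load."""
    with tempfile.TemporaryDirectory() as tmp:
        npy_path = os.path.join(tmp, "test1.npy")
        pkl_path = os.path.join(tmp, "testpickle")

        def np_save():
            np.save(npy_path, obj)

        def np_load():
            return np.load(npy_path, allow_pickle=True)

        def pk_dump():
            with open(pkl_path, "wb") as f:
                pickle.dump(obj, f)

        def pk_load():
            with open(pkl_path, "rb") as f:
                return pickle.load(f)

        # Each save runs before its matching load so the file already exists.
        steps = [
            ("np.save", np_save),
            ("np.load", np_load),
            ("pickle.dump", pk_dump),
            ("pickle.load", pk_load),
        ]
        return {name: timeit.timeit(fn, number=number) / number for name, fn in steps}


timings = time_roundtrips(b)
for name, seconds in timings.items():
    print(f"{name:12s} {seconds * 1e6:8.1f} us per call")
```

    Timing the same object with the same number of repetitions for all four operations keeps the comparison fair; a single run of each is too noisy to draw conclusions from.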

    The pickle file is slightly smaller:

    In [23]: ll test1.npy testpickle
    -rw-rw-r-- 1 paul 5740 Aug 14 08:40 test1.npy
    -rw-rw-r-- 1 paul 4204 Aug 14 08:43 testpickle
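
    The same size check can be done from Python instead of the shell. As before, this is a sketch with a hypothetical b, so the byte counts will differ from the listing above.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for `b`: 9 keys, each mapping to a list of small arrays.
b = {k: [np.full(4, k) for _ in range(k + 1)] for k in range(9)}

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "test1.npy")
    pkl_path = os.path.join(tmp, "testpickle")

    np.save(npy_path, b)
    with open(pkl_path, "wb") as f:
        pickle.dump(b, f)

    # Read the sizes while the temporary files still exist.
    npy_size = os.path.getsize(npy_path)
    pkl_size = os.path.getsize(pkl_path)

print(f"test1.npy:  {npy_size} bytes")
print(f"testpickle: {pkl_size} bytes")
```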
    
