Why is there a large overhead in pickling numpy arrays?


Question


Suppose I have a simple list in Python:

>>> import pickle
>>> x = [1.0, 2.0, 3.0, 4.0]

When pickled, the result is reasonably small:

>>> len(pickle.dumps(x))
44

How come the size is so much larger if I use a numpy array?

>>> import numpy as np
>>> xn = np.array(x)
>>> len(pickle.dumps(xn))
187

Converting it to a less precise data type only helps a little bit...

>>> x16 = xn.astype('float16')
>>> len(pickle.dumps(x16))
163

Other numpy/scipy data structures, like sparse matrices, also pickle to surprisingly large sizes. Why?


Answer 1:


Inspecting the array in a debugger shows that, apart from the data itself, a numpy array carries metadata such as its dtype, shape, strides, and flags; a plain Python list stores none of this.

A complete list of these attributes can be found at http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html
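
For instance, here are a few of those attributes on the array from the question (a small illustration, not the full list):

>>> xn.dtype
dtype('float64')
>>> xn.shape
(4,)
>>> xn.strides
(8,)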

Pickling is not just a binary copy of the data buffer. The pickle also has to record how to rebuild the object: the fully qualified name of numpy's reconstructor function, the dtype description, and the shape are all serialized alongside the 32 bytes of actual data. For a four-element array this fixed overhead dominates; for a large array it becomes negligible.
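
To see where the extra bytes actually go, you can disassemble the pickle with the standard-library pickletools module. This is a quick sketch against the same xn as above; exact byte counts vary with the Python, pickle-protocol, and numpy versions, but the 32 and 187 here match the numbers in the question:

>>> import pickletools
>>> xn.nbytes                           # raw data: 4 elements * 8 bytes each
32
>>> len(pickle.dumps(xn)) - xn.nbytes   # everything else is overhead
155
>>> pickletools.dis(pickle.dumps(xn))   # prints the opcode-by-opcode layout

The disassembly shows what that overhead buys: the reconstructor's module and name, the dtype description, and the shape, all stored as strings and small objects ahead of the raw data bytes. Since this part is roughly constant in size, it amortizes away for large arrays, whose pickles are barely bigger than their raw buffers.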



Source: https://stackoverflow.com/questions/31304006/why-is-there-a-large-overhead-in-pickling-numpy-arrays
