ndarray.resize: passing the correct value for the refcheck argument

前端未结


Like many others, my situation is that I have a class which collects a large amount of data, and provides a method to return the data as a numpy array. (Additional data can con


2 Answers

无人共我

2021-01-22 22:16
I would use array.array() to do the data collection:
```
import array
a = array.array("d")
for i in range(100):  # range replaces the Python 2-only xrange
    a.append(i*2)
```
Whenever you want to do some calculation with the collected data, convert it to a numpy.ndarray with numpy.frombuffer:
```
import numpy as np
b = np.frombuffer(a, dtype=float)  # a view onto a's buffer, no copy
print(np.mean(b))
```
b shares its data memory with a, so the conversion is very fast.
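A minimal sketch of that sharing, assuming CPython and a reasonably recent NumPy. The sample values and the variable names other than a and b are made up for illustration. My understanding is that array.array may refuse to grow while a buffer view onto it is still alive, so the sketch drops the view before appending again rather than relying on either behaviour:
```python
import array
import numpy as np

a = array.array("d", [1.0, 2.0, 3.0])
b = np.frombuffer(a, dtype=float)   # view onto a's memory, no copy

a[0] = 10.0                         # in-place write, the buffer is not resized
sees_change = bool(b[0] == 10.0)    # the view reflects the write
mean_before = float(b.mean())

del b                               # drop the view before growing the buffer
a.append(4.0)
b = np.frombuffer(a, dtype=float)   # cheap to recreate after collecting more data
print(sees_change, mean_before, len(b))
```
Recreating the view costs almost nothing, so taking a fresh one each time you need to compute something is a reasonable pattern.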
别跟我提以往

2021-01-22 22:28
The resize approach has two main problems. The first is that get_data_as_array returns a reference to self._arr. A later resize will then do one of two things, depending on your implementation: either it modifies the array you have already given your user, so that a.shape changes unpredictably on them, or it corrupts that array by leaving it pointing at bad memory. You could solve that by always having get_data_as_array return self._arr.copy(), but that brings me to the second issue: resize is actually not very efficient. I believe that, in general, resize has to allocate new memory and copy the data every time it is called to grow an array. On top of that, you now need to copy the array every time you return it to your user.
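To make the first problem concrete, here is a small sketch. The Collector class, its grow method and get_data_copy are hypothetical stand-ins for the question's class, not its actual code; only get_data_as_array and self._arr are names from the discussion above. It uses refcheck=False, which skips NumPy's reference check, to show the shape changing underneath a caller who holds the internal array:
```python
import numpy as np

class Collector:
    """Hypothetical stand-in for the class described in the question."""

    def __init__(self):
        self._arr = np.zeros(4)

    def get_data_as_array(self):
        return self._arr             # hands out the internal array itself

    def get_data_copy(self):
        return self._arr.copy()      # independent snapshot

    def grow(self, new_size):
        # refcheck=False skips the safety check, so this resizes in place
        # even though a caller may still hold a reference to self._arr.
        self._arr.resize(new_size, refcheck=False)

c = Collector()
shared = c.get_data_as_array()
snapshot = c.get_data_copy()
c.grow(8)
print(shared.shape)     # changed underneath the caller
print(snapshot.shape)   # unaffected
```
The copy stays stable, which is the trade-off described above: safety for the caller in exchange for one extra copy per call.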

Another approach would be to design your own dynamic array, that would look something like:
```
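import numpy as np

class DynamicArray(object):
    """Append-only 1-D float array that grows geometrically."""

    scale_factor = 2

    def __init__(self):
        # Per-instance storage; class-level arrays would be shared by all instances.
        self._data = np.empty(1)
        self.data = self._data[:0]  # view of the filled part of _data

    def append(self, values):
        values = np.atleast_1d(values)  # accept a scalar or a sequence
        old_size = len(self.data)
        total_size = old_size + len(values)
        capacity = len(self._data)
        if capacity < total_size:
            while capacity < total_size:
                # np.empty needs an integer size, so convert after rounding up.
                capacity = int(np.ceil(capacity * self.scale_factor))
            new_data = np.empty(capacity)
            new_data[:old_size] = self.data  # copy the existing contents once
            self._data = new_data

        self._data[old_size:total_size] = values
        self.data = self._data[:total_size]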
```
This should be very fast because you only need to grow the array about log2(N) times, and you use at most 2*N-1 storage, where N is the maximum size of the array. Other than growing the array, you are just making views of _data, which does not involve any copying and should take constant time.
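As a rough check of those numbers, the following sketch simulates only the capacity bookkeeping, without allocating any arrays. The helper count_growths is a hypothetical name introduced here, and it assumes the storage starts at capacity 1 and is multiplied by scale_factor on each growth, as in the class above:
```python
import math

def count_growths(n, scale_factor=2):
    """Simulate appending n items one at a time; return (growths, final capacity)."""
    capacity, growths = 1, 0
    for size in range(1, n + 1):
        while capacity < size:
            capacity = math.ceil(capacity * scale_factor)
            growths += 1
    return growths, capacity

growths, capacity = count_growths(1000)
print(growths, capacity)
```
For 1000 single-item appends this gives 10 reallocations and a final capacity of 1024, which is within the 2*N-1 bound.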

Hope this is useful.