ndarray.resize: passing the correct value for the refcheck argument

逝去的感伤 2021-01-22 21:58

Like many others, my situation is that I have a class which collects a large amount of data, and provides a method to return the data as a numpy array. (Additional data can con

2 Answers
  • 2021-01-22 22:16

    I would use array.array to collect the data:

    import array
    a = array.array("d")  # typecode "d" stores C doubles
    for i in range(100):
        a.append(i*2)
    

    Whenever you want to do a calculation on the collected data, convert it to a numpy.ndarray with numpy.frombuffer:

    import numpy as np
    b = np.frombuffer(a, dtype=float)
    print(np.mean(b))
    

    b shares its data memory with a, so the conversion is very fast.
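
The memory sharing described above has a practical consequence that a short sketch can show. The values and variable names below are illustrative only. It assumes CPython, where an array.array cannot be resized while another object holds its buffer, so the NumPy view has to be dropped before more data is appended.

```python
import array

import numpy as np

a = array.array("d", [0.0, 2.0, 4.0])
b = np.frombuffer(a, dtype=float)  # no copy: b reads a's buffer directly

a[0] = 10.0                        # a write through a ...
shares_memory = bool(b[0] == 10.0) # ... is visible through b

# While b is alive it holds a's buffer, so appending to a is refused.
try:
    a.append(6.0)
    append_blocked = False
except BufferError:
    append_blocked = True

del b                              # release the view first ...
a.append(6.0)                      # ... then appending works again
b = np.frombuffer(a, dtype=float)  # re-create the view after growing

print(shares_memory, append_blocked, len(b))
```

In other words, the view is cheap to create, so it is usually simplest to create it right before a calculation and discard it before collecting more data.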

  • 2021-01-22 22:28

    The resize method has two main problems. The first is that you return a reference to self._arr when the user calls get_data_as_array. A later resize will then do one of two things, depending on your implementation. It will either modify the array you have given your user, i.e. the user will take a.shape and the shape will change unpredictably, or it will corrupt that array, leaving it pointing to bad memory. You could solve that issue by always having get_data_as_array return self._arr.copy(), but that brings me to the second issue: resize is actually not very efficient. I believe that, in general, resize has to allocate new memory and do a copy every time it is called to grow an array. On top of that, you would now need to copy the array every time you want to return it to your user.
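
The first problem can be reproduced in a few lines. This is a minimal sketch, not the asker's class: `arr` stands in for self._arr and `handed_out` for the reference returned to the user (both names are made up for illustration). It assumes CPython, where the default refcheck=True relies on reference counting.

```python
import numpy as np

arr = np.arange(4.0)
handed_out = arr          # what a getter returning self._arr gives the caller

# Default refcheck=True: NumPy notices the second reference and refuses.
try:
    arr.resize(8)
    refused = False
except ValueError:
    refused = True

snapshot = arr.copy()     # an independent copy is unaffected by later resizes

# refcheck=False skips the check. handed_out is the same object as arr,
# so the caller's array silently changes shape underneath them.
arr.resize(8, refcheck=False)

print(refused, handed_out.shape, snapshot.shape)
```

Note that the sketch only uses refcheck=False with a plain alias of the same object. If the caller held a view such as arr[:2] instead, resizing with refcheck=False could leave that view pointing at freed memory, which is the corruption case described above.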

    Another approach would be to design your own dynamic array, which would look something like:

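    import numpy as np

    class DynamicArray(object):

        scale_factor = 2

        def __init__(self):
            self._data = np.empty(1)    # backing storage, grown geometrically
            self.data = self._data[:0]  # view of the filled part of _data

        def append(self, values):
            old_data = len(self.data)
            total_data = len(values) + old_data
            total_storage = len(self._data)
            if total_storage < total_data:
                while total_storage < total_data:
                    # np.ceil returns a float, but np.empty needs an integer size
                    total_storage = int(np.ceil(total_storage * self.scale_factor))
                self._data = np.empty(total_storage)
                # self.data still views the old buffer, so copy from it
                self._data[:old_data] = self.data

            self._data[old_data:total_data] = values
            self.data = self._data[:total_data]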
    

    This should be very fast because you only need to grow the array O(log N) times, and you use at most 2*N-1 storage, where N is the maximum size of the array. Other than growing the array, you're just making views of _data, which doesn't involve any copying and should take constant time.
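
The growth claim can be checked with a small simulation. The helper below is hypothetical and not part of the answer's class. It only tracks the capacity, assuming items are appended one at a time, the capacity starts at 1, and it is multiplied by scale_factor whenever it runs out.

```python
import math


def count_growths(n_items, scale_factor=2):
    """Simulate appending n_items one at a time; return (growths, capacity)."""
    capacity, growths = 1, 0
    for size in range(1, n_items + 1):
        if capacity < size:
            # Same growth rule as the class above: scale until it fits.
            while capacity < size:
                capacity = math.ceil(capacity * scale_factor)
            growths += 1
    return growths, capacity


growths, capacity = count_growths(1000)
print(growths, capacity)
```

For 1000 items the capacity doubles from 1 up to 1024, so only a handful of reallocations happen and the final storage stays below 2*N.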

    Hope this is useful.
