Resizing numpy.memmap arrays

眼角桃花 2021-02-05 06:02

I'm working with a bunch of large numpy arrays, and as these started to chew up too much memory lately, I wanted to replace them with numpy.memmap instances. The p

2 Answers
  •  不思量自难忘°
    2021-02-05 06:29

    If I'm not mistaken, this achieves essentially what @wwwslinger's second solution does, but without having to manually specify the size of the new memmap in bytes:

    # assumes numpy has already been imported: import numpy as np
    In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
    
    In [2]: a[3] = 7
    
    In [3]: a
    Out[3]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
    
    In [4]: a.flush()
    
    # this will append to the original file as much as is necessary to satisfy
    # the new shape requirement, given the specified dtype
    In [5]: new_a = np.memmap('bla.bin', mode='r+', dtype=int, shape=(20,))
    
    In [6]: new_a
    Out[6]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    
    In [7]: a[-1] = 10
    
    In [8]: a
    Out[8]: memmap([ 0,  0,  0,  7,  0,  0,  0,  0,  0, 10])
    
    In [9]: a.flush()
    
    In [11]: new_a
    Out[11]: 
    memmap([ 0,  0,  0,  7,  0,  0,  0,  0,  0, 10,  0,  0,  0,  0,  0,  0,  0,
             0,  0,  0])
    

    This works well when the new array needs to be bigger than the old one, but I don't think this approach will automatically truncate the memory-mapped file if the new array is smaller.
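
    The reopen trick above can be wrapped in a small helper. This is a minimal sketch, not code from either answer: `grow_memmap` is a hypothetical name, the array is assumed to start at offset 0 in the file, and the helper relies on `np.memmap` in `'r+'` mode padding the file with zero bytes when the requested shape needs more space than the file has. A smaller shape is rejected, because reopening with a smaller shape would only map a prefix of the file and leave its size unchanged.

```python
import os
import tempfile

import numpy as np


def grow_memmap(path, dtype, new_shape):
    """Reopen the file at `path` as a memmap with a larger shape (hypothetical helper).

    np.memmap in 'r+' mode appends zero bytes when the file is too small
    for the requested shape, so the existing data is preserved.
    Assumes the array starts at offset 0 in the file.
    """
    new_nbytes = int(np.prod(new_shape)) * np.dtype(dtype).itemsize
    if new_nbytes < os.path.getsize(path):
        raise ValueError("grow_memmap only grows; it cannot truncate the file")
    return np.memmap(path, mode="r+", dtype=dtype, shape=new_shape)


# Usage: start with 10 elements, write one value, then grow to 20.
path = os.path.join(tempfile.mkdtemp(), "bla.bin")
a = np.memmap(path, mode="w+", dtype=np.int64, shape=(10,))
a[3] = 7
a.flush()  # make sure the write reaches the file before reopening

new_a = grow_memmap(path, np.int64, (20,))
print(new_a.shape)            # (20,)
print(os.path.getsize(path))  # 160 (20 elements * 8 bytes)
print(new_a[3], new_a[-1])    # 7 0
```

    Computing the byte count from `np.dtype(dtype).itemsize` avoids hard-coding the 8 bytes per element that `dtype=int` happens to use on 64-bit Linux.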

    Manually resizing the base, as in @wwwslinger's answer, seems to allow the file to be truncated, but it doesn't reduce the size of the array.

    For example:

    # this creates a memory-mapped file of 10 * 8 = 80 bytes (assuming dtype=int is 8 bytes, as on 64-bit Linux)
    In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
    
    In [2]: a[:] = range(1, 11)
    
    In [3]: a.flush()
    
    In [4]: a
    Out[4]: memmap([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
    
    # now truncate the file to 40 bytes
    In [5]: a.base.resize(5*8)
    
    In [6]: a.flush()
    
    # the array still has the same shape, but the truncated part is all zeros
    In [7]: a
    Out[7]: memmap([1, 2, 3, 4, 5, 0, 0, 0, 0, 0])
    
    # you still need to create a new np.memmap to change the size of the array
    In [8]: b = np.memmap('bla.bin', mode='r+', dtype=int, shape=(5,))
    
    In [9]: b
    Out[9]: memmap([1, 2, 3, 4, 5])
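
    Because `a.base.resize()` resizes the mapping while `a` still points into it, a more conservative way to shrink is to drop every memmap of the file, truncate the file itself, and map it again with the smaller shape. This is a sketch under stated assumptions rather than either answer's method: `shrink_memmap` is a hypothetical name, the data is assumed to start at offset 0, and the caller must have flushed and released all other memmaps of the file first (truncating a file that is still mapped fails on Windows, and on Linux touching a mapped page past the new end of file can raise SIGBUS).

```python
import os
import tempfile

import numpy as np


def shrink_memmap(path, dtype, new_shape):
    """Truncate the file at `path` and map it again with a smaller shape (hypothetical helper).

    Precondition: the caller has flushed and dropped every other memmap of
    this file, because the file is cut underneath any existing mapping.
    Assumes the array starts at offset 0 in the file.
    """
    new_nbytes = int(np.prod(new_shape)) * np.dtype(dtype).itemsize
    if new_nbytes > os.path.getsize(path):
        raise ValueError("shrink_memmap only shrinks; reopen with mode='r+' to grow")
    os.truncate(path, new_nbytes)  # shrink the file itself, not just the view
    return np.memmap(path, mode="r+", dtype=dtype, shape=new_shape)


# Usage mirroring the transcript: 10 values, keep the first 5.
path = os.path.join(tempfile.mkdtemp(), "bla.bin")
a = np.memmap(path, mode="w+", dtype=np.int64, shape=(10,))
a[:] = range(1, 11)
a.flush()
del a  # release the old mapping (CPython frees it once no references remain)

b = shrink_memmap(path, np.int64, (5,))
print(b.tolist())             # [1, 2, 3, 4, 5]
print(os.path.getsize(path))  # 40
```

    The shrink goes through `os.truncate` on the path rather than through the mmap object, so, provided the precondition holds, no live array is left pointing at memory that has been cut off.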
    
