Resizing numpy.memmap arrays

眼角桃花 2021-02-05 06:02

I'm working with a bunch of large numpy arrays, and as these started to chew up too much memory lately, I wanted to replace them with numpy.memmap instances. The problem is that I occasionally need to resize them, and resizing a memmap in place fails because the array does not own its data.

2 Answers
  • wwwslinger · 2021-02-05 06:27

    The issue is that the flag OWNDATA is False when you create your array. You can change that by requiring the flag to be True when you create the array:

    >>> a = np.require(np.memmap('bla.bin', dtype=int), requirements=['O'])
    >>> a.shape
    (10,)
    >>> a.flags
      C_CONTIGUOUS : True
      F_CONTIGUOUS : True
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False
    >>> a.resize(20, refcheck=False)
    >>> a.shape
    (20,)
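
    For reference, this is the error the 'O' requirement works around (a sketch of resizing a plain memmap of the same file, without the copy):

    >>> m = np.memmap('bla.bin', dtype=int)
    >>> m.flags['OWNDATA']
    False
    >>> m.resize(20, refcheck=False)
    Traceback (most recent call last):
        ...
    ValueError: cannot resize this array: it does not own its data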
    

    The only caveat is that np.require may have to copy the array to satisfy the requirement, in which case the result is no longer backed by the file on disk.
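
    A minimal sketch of that caveat (the file name is illustrative): once np.require copies the data into ordinary memory, writes no longer reach the mapped file.

    >>> m = np.memmap('check.bin', dtype=int, mode='w+', shape=(10,))
    >>> m.flags['OWNDATA']
    False
    >>> c = np.require(m, requirements=['O'])
    >>> c.flags['OWNDATA']
    True
    >>> c[0] = 42
    >>> m[0]      # the write went to the copy, not to the mapped file
    0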

    Edit to address saving:

    If you want to save the resized array to disk, save it as a .npy file and re-open it with numpy.lib.format.open_memmap when you need to use it as a memmap again:

    >>> a[9] = 1
    >>> np.save('bla.npy',a)
    >>> b = np.lib.format.open_memmap('bla.npy', dtype=int, mode='r+')
    >>> b
    memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    

    Edit to offer another method:

    You may get close to what you're looking for by resizing the underlying mmap (a.base or a._mmap, which is sized in bytes) and "reloading" the memmap:

    >>> a = np.memmap('bla.bin', dtype=int)
    >>> a
    memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    >>> a[3] = 7
    >>> a
    memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
    >>> a.flush()
    >>> a = np.memmap('bla.bin', dtype=int)
    >>> a
    memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
    >>> a.base.resize(20*8)
    >>> a.flush()
    >>> a = np.memmap('bla.bin', dtype=int)
    >>> a
    memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
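
    Note that the 20*8 above assumes 8-byte ints; a sketch of the same step that computes the byte count from the dtype instead (so it also works where int is 4 bytes):

    >>> a = np.memmap('bla.bin', dtype=int)
    >>> a.base.resize(30 * a.dtype.itemsize)   # grow the mapping to 30 elements' worth of bytes
    >>> a.flush()
    >>> a = np.memmap('bla.bin', dtype=int)    # re-map to pick up the new size
    >>> a.shape
    (30,)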
    
  • 2021-02-05 06:29

    If I'm not mistaken, this achieves essentially what @wwwslinger's second solution does, but without having to manually specify the size of the new memmap in bytes:

    In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
    
    In [2]: a[3] = 7
    
    In [3]: a
    Out[3]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0])
    
    In [4]: a.flush()
    
    # this will append to the original file as much as is necessary to satisfy
    # the new shape requirement, given the specified dtype
    In [5]: new_a = np.memmap('bla.bin', mode='r+', dtype=int, shape=(20,))
    
    In [6]: new_a
    Out[6]: memmap([0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    
    In [7]: a[-1] = 10
    
    In [8]: a
    Out[8]: memmap([ 0,  0,  0,  7,  0,  0,  0,  0,  0, 10])
    
    In [9]: a.flush()
    
    In [11]: new_a
    Out[11]: 
    memmap([ 0,  0,  0,  7,  0,  0,  0,  0,  0, 10,  0,  0,  0,  0,  0,  0,  0,
             0,  0,  0])
    

    This works well when the new array needs to be bigger than the old one, but I don't think this approach will automatically truncate the memory-mapped file when the new array is smaller.

    Manually resizing the base, as in @wwwslinger's answer, seems to allow the file to be truncated, but it doesn't reduce the size of the array.

    For example:

    # this creates a memory mapped file of 10 * 8 = 80 bytes
    In [1]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
    
    In [2]: a[:] = range(1, 11)
    
    In [3]: a.flush()
    
    In [4]: a
    Out[4]: memmap([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
    
    # now truncate the file to 40 bytes
    In [5]: a.base.resize(5*8)
    
    In [6]: a.flush()
    
    # the array still has the same shape, but the truncated part is all zeros
    In [7]: a
    Out[7]: memmap([1, 2, 3, 4, 5, 0, 0, 0, 0, 0])
    
    # you still need to create a new np.memmap to change the size of the array
    In [8]: b = np.memmap('bla.bin', mode='r+', dtype=int, shape=(5,))
    
    In [9]: b
    Out[9]: memmap([1, 2, 3, 4, 5])
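
    If the goal is to shrink both the file and the array, one option (a sketch, not part of either original answer) is to flush and release the memmap, truncate the file with ordinary file I/O, and then re-map it:

    In [1]: import numpy as np
    
    In [2]: a = np.memmap('bla.bin', mode='w+', dtype=int, shape=(10,))
    
    In [3]: a[:] = range(1, 11); a.flush()
    
    In [4]: del a                      # release the mapping before shrinking the file
    
    In [5]: with open('bla.bin', 'r+b') as f:
       ...:     f.truncate(5 * np.dtype(int).itemsize)   # keep only the first 5 elements
    
    In [6]: b = np.memmap('bla.bin', mode='r+', dtype=int)
    
    In [7]: b
    Out[7]: memmap([1, 2, 3, 4, 5])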
    