Why can a 352GB NumPy ndarray be used on an 8GB memory macOS computer?

名媛妹妹 · 2020-12-03 04:37
import numpy as np

array = np.zeros((210000, 210000)) # default numpy.float64
array.nbytes

When I run the above code on my 8GB memory MacBook with macOS, no error occurs, even though array.nbytes reports 352,800,000,000 bytes (~352 GB) — far more than the machine's physical memory. Why does this work?

1 answer
  • 2020-12-03 05:18

    @Martijn Pieters' answer is on the right track, but not quite right: this has nothing to do with memory compression, but instead it has to do with virtual memory.

    For example, try running the following code on your machine:

    arrays = [np.zeros((21000, 21000)) for _ in range(0, 10000)]
    

    This code allocates 32TiB of memory, but you won't get an error (at least I didn't, on Linux). If I check htop, I see the following:

      PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
    31362 user       20   0 32.1T 69216 12712 S  0.0  0.4  0:00.22 python
    
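    The VIRT/RES gap can also be observed from inside the process itself. A minimal sketch (Linux-only, since it parses /proc/self/status; the helper name mem_kib is mine):

```python
import numpy as np

def mem_kib(field):
    # Read a memory counter (in KiB) from /proc/self/status (Linux only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

virt0, res0 = mem_kib("VmSize"), mem_kib("VmRSS")
a = np.zeros((21000, 21000))  # reserves ~3.3 GiB of address space, touches none of it
virt1, res1 = mem_kib("VmSize"), mem_kib("VmRSS")

print(f"virtual grew ~{(virt1 - virt0) / 2**20:.1f} GiB, "
      f"resident grew ~{(res1 - res0) / 1024:.0f} MiB")
```

    Virtual address space jumps by the full size of the array, while resident memory barely moves.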

    This is because the OS is perfectly willing to overcommit on virtual memory. It won't actually assign pages to physical memory until it needs to. The way it works is:

    • calloc asks the OS for some memory to use
    • the OS looks in the process's page tables and finds a chunk of address space that it's willing to assign. This is a fast operation: the OS just stores the memory address range in an internal data structure.
    • the program writes to one of the addresses.
    • the OS receives a page fault, at which point it actually assigns the page to physical memory. A page is usually a few KiB in size.
    • the OS passes control back to the program, which proceeds without noticing the interruption.
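    The page-fault step above can be seen directly: resident memory grows only when pages are written, in proportion to how much of the array is touched. A Linux-only sketch (it reads /proc/self/status; the helper name rss_kib is mine):

```python
import numpy as np

def rss_kib():
    # Resident set size in KiB, from /proc/self/status (Linux only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

a = np.zeros((21000, 21000))  # ~3.3 GiB virtual, almost nothing resident yet
r0 = rss_kib()
a[:1000, :] = 1.0             # write 1000 rows: 1000 * 21000 * 8 bytes ≈ 160 MiB
r1 = rss_kib()
print(f"RSS grew ~{(r1 - r0) / 1024:.0f} MiB after touching 1000 rows")
```

    Writing roughly 160 MiB of the array faults in roughly 160 MiB of physical memory; the untouched 3+ GiB costs nothing.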

    Creating a single huge array doesn't work on Linux because, by default, a "heuristic algorithm is applied to figure out if enough memory is available". (thanks @Martijn Pieters!) Some experiments on my system show that for me, the kernel is unwilling to provide more than 0x3BAFFFFFF bytes. However, if I run echo 1 | sudo tee /proc/sys/vm/overcommit_memory, and then try the program in the OP again, it works fine.
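    Which overcommit policy is currently in effect can be checked (and, as above, changed as root) through procfs. A Linux-only sketch:

```python
# Linux overcommit policy: 0 = heuristic (the default), 1 = always allow,
# 2 = strict accounting against RAM + swap.
with open("/proc/sys/vm/overcommit_memory") as f:
    mode = int(f.read().strip())

names = {0: "heuristic", 1: "always overcommit", 2: "strict"}
print(f"vm.overcommit_memory = {mode} ({names[mode]})")
```

    Mode 1 is what makes the single 352 GB allocation in the OP succeed on Linux.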

    For fun, try running arrays = [np.ones((21000, 21000)) for _ in range(0, 10000)]. You'll definitely get an out-of-memory error, even on macOS or Linux with swap compression: certain OSes can compress RAM, but they can't compress it to a level that would keep you from running out of memory.
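    The difference between the two is visible in resident memory: np.zeros reserves address space (ultimately via calloc) without touching it, while np.ones writes to every element and therefore faults in every page immediately. A Linux-only sketch with a smaller array (the helper name rss_kib is mine):

```python
import numpy as np

def rss_kib():
    # Resident set size in KiB, from /proc/self/status (Linux only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

n = 4000                # 4000 * 4000 * 8 bytes = 128 MB per array
r0 = rss_kib()
z = np.zeros((n, n))    # calloc: pages stay untouched, RSS barely moves
r1 = rss_kib()
o = np.ones((n, n))     # every page is written, so all ~122 MiB become resident
r2 = rss_kib()
print(f"zeros added ~{(r1 - r0) // 1024} MiB resident, "
      f"ones added ~{(r2 - r1) // 1024} MiB resident")
```

    Scale that up to 10000 arrays of 21000×21000 ones and every one of the 32 TiB of pages must be physically backed — hence the OOM.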
