[Edit: This problem applies only to 32-bit systems. If your computer, your OS and your Python implementation are all 64-bit, then mmap-ing huge files works reliably and is efficient.]
From IEEE 1003.1:
The mmap() function shall establish a mapping between a process' address space and a file, shared memory object, or [TYM] typed memory object.
It needs all the virtual address space because that's exactly what mmap() does.
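To make that concrete, here is a minimal sketch of the failure mode, assuming a 32-bit build and a made-up file name: the mapping itself has to fit in the process's address space, regardless of how little of the file you intend to read.

    import mmap

    with open("huge.bin", "rb") as f:          # hypothetical multi-GB file
        try:
            # length 0 means "map the whole file"; the entire range has to be
            # reserved in the address space before anything is read.
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except (ValueError, OSError) as exc:
            # On a 32-bit process this typically fails once the file no longer
            # fits in the roughly 2-3 GB of usable address space.
            print("mmap failed:", exc)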
The fact that it isn't really running out of memory doesn't matter - you can't map more address space than you have available. Since you then take the result and access it as if it were memory, how exactly do you propose to access more than 2^32 bytes into the file? Even if mmap() didn't fail, you could still only read the first 4 GB before you ran out of space in a 32-bit address space. You can, of course, mmap() a sliding 32-bit window over the file, but that won't necessarily net you any benefit unless you can optimize your access pattern such that you limit how many times you have to visit previous windows.
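A rough Python sketch of that sliding-window idea (read_window and its parameters are invented for this example): instead of mapping the whole file, map only an aligned window around the offset you actually need.

    import mmap
    import os

    def read_window(path, pos, size):
        """Map only a window of the file around byte offset `pos` and return
        `size` bytes, so the mapping stays small enough for a 32-bit process."""
        gran = mmap.ALLOCATIONGRANULARITY        # offset must be aligned to this
        aligned = (pos // gran) * gran
        delta = pos - aligned
        length = min(delta + size, os.path.getsize(path) - aligned)
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=aligned) as window:
                return window[delta:delta + size]

Assuming the platform has large-file support for the offset argument, this lets even a 32-bit process reach data well past the 4 GB mark while only ever mapping a small region at a time; the cost is that every access far from the current window means tearing one mapping down and setting another one up.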
You ask the OS to map the entire file into a memory range. Nothing is read until you trigger page faults by reading or writing, but the OS still has to reserve the entire range in your process's address space, and if that range doesn't fit, the mapping fails.
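For completeness, a small sketch of that lazy behaviour on a 64-bit build where the whole mapping does fit (the file name is again made up): the address-space reservation happens up front, but data only comes off the disk page by page as you touch it.

    import mmap

    with open("huge.bin", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The whole file is now mapped (address space reserved), but nothing
            # has been read from disk yet.
            first = m[0]              # faults in one page near the start
            middle = m[len(m) // 2]   # faults in one page in the middle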