It is said that mmap() maps files into memory, and that this costs virtual address space in the calling process. Does it really copy data to memory, or
The only thing the mmap function really does is change some kernel data structures, and possibly the page table. It doesn't actually put anything into physical memory at all. After you call mmap, the allocated region probably doesn't even point to physical memory: accessing it will cause a page fault. This kind of page fault is handled transparently by the kernel; in fact, this is one of the kernel's primary duties.
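A minimal Python sketch of this behaviour, assuming a Unix-like system. Python's standard `mmap` module wraps the `mmap()` system call, and `resource.getrusage` reports the process's page-fault counters. The helper names `fault_count` and `touch_each_page` are hypothetical. The mapping is created first, and faults are only counted while the loop touches one byte per page. The counts are noisy: the interpreter takes faults of its own, and the kernel may map several neighbouring pages on a single fault. Treat them as indicative only.

```python
import mmap
import os
import resource
import tempfile

PAGE = mmap.PAGESIZE

def fault_count():
    # Minor + major page faults taken by this process so far (Unix only).
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_minflt + ru.ru_majflt

def touch_each_page(path):
    """Map a file read-only, then read one byte from every page.

    Returns (faults before the loop, faults after the loop, sum of bytes read).
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        # mmap() only sets up the mapping; no file data is read here.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            before = fault_count()
            total = 0
            for off in range(0, size, PAGE):
                total += m[off]  # first touch of a page may trigger a fault
            after = fault_count()
    return before, after, total

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\x01" * (PAGE * 64))
    os.close(fd)
    try:
        before, after, total = touch_each_page(path)
        print("pages touched:", total)  # 64
        print("faults during loop:", after - before)
    finally:
        os.unlink(path)
```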
What happens with mmap is that the data remains on disk and is copied from disk to memory as your process reads it. It can also be copied to physical memory speculatively (read-ahead). When your process gets swapped out, the pages in the mmap region do not have to be written to swap, because they are already backed by long-term storage. The exception is pages you have modified: dirty pages in a shared mapping must first be written back to the file, and dirty pages in a private mapping need swap like any other anonymous memory.
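To make the "unless you have modified them" case concrete, here is a small sketch, assuming a Unix-like system. In Python's `mmap` module, `ACCESS_WRITE` corresponds to a shared mapping (`MAP_SHARED`) and `ACCESS_COPY` to a private one (`MAP_PRIVATE`). The helper `modify_first_byte` is hypothetical. With the shared mapping, the dirty page belongs to the file and ends up back in it. With the private mapping, the modified page becomes a private copy the file never sees, which is why such pages would have to go to swap.

```python
import mmap
import os
import tempfile

def modify_first_byte(path, access):
    """Write b'X' at offset 0 through a mapping, then return the file's first byte."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            m[0] = ord("X")      # dirties the first page of the mapping
            if access == mmap.ACCESS_WRITE:
                m.flush()        # msync(): write the dirty page back to the file
    with open(path, "rb") as f:  # read the file back through a fresh descriptor
        return f.read(1)

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"abc")
    os.close(fd)
    try:
        print(modify_first_byte(path, mmap.ACCESS_COPY))   # b'a' (private copy)
        print(modify_first_byte(path, mmap.ACCESS_WRITE))  # b'X' (shared)
    finally:
        os.unlink(path)
```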
However, mmap will consume virtual address space, just like malloc and other similar functions (which mostly use mmap behind the scenes, or sbrk, which is basically a special version of mmap). The main difference between reading a file with mmap and reading it with read is that unmodified pages in an mmap region do not contribute to overall memory pressure. As long as they are not being used, they are almost "free", memory-wise, because the kernel can drop them at any time and re-read them from the file later. In contrast, data read with the read function is copied into a buffer in your process, and that buffer always contributes to memory pressure, whether it is being used or not and whether it has been modified or not.
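To see that a mapping costs address space rather than RAM until it is touched, the sketch below creates an anonymous mapping and compares two counters. An anonymous mapping is roughly what glibc's malloc requests from the kernel for large allocations. The sketch is Linux-only and assumes `/proc/self/status` exposes `VmSize` (virtual address space) and `VmRSS` (resident memory) in kB, as current Linux kernels do. The helpers `status_kb` and `map_anonymous` are hypothetical.

```python
import mmap

def status_kb(field):
    """Return a field such as 'VmSize' or 'VmRSS' from /proc/self/status, in kB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

def map_anonymous(nbytes):
    """Create an untouched anonymous mapping; return it with the VmSize and VmRSS deltas in kB."""
    size0, rss0 = status_kb("VmSize"), status_kb("VmRSS")
    m = mmap.mmap(-1, nbytes)  # fd -1: anonymous memory, no file behind it
    size1, rss1 = status_kb("VmSize"), status_kb("VmRSS")
    return m, size1 - size0, rss1 - rss0

if __name__ == "__main__":
    m, dsize, drss = map_anonymous(256 * 1024 * 1024)
    print("VmSize grew by", dsize, "kB")  # roughly 262144: address space is consumed
    print("VmRSS grew by", drss, "kB")    # small: no page has been touched yet
    m.close()
```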
Finally, mmap is faster than read only in the use cases it favors: random access and page reuse. For linearly traversing a file, especially a small one, read will generally be faster, since it does not require modifying the page tables and it takes fewer system calls.
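The two access patterns can be sketched side by side. This is illustrative only: it assumes a hypothetical file of fixed 16-byte records (`RECORD`), and all function names are made up for the example. Random lookups through one mapping are plain memory accesses after a single `mmap()` call. The `pread()` version pays one system call and one copy per lookup. For a straight linear scan, a loop of large sequential `read()` calls is the simpler fit.

```python
import mmap
import os
import tempfile

RECORD = 16  # hypothetical fixed-size record layout, for illustration only

def records_via_mmap(path, indices):
    # One mmap() up front; each lookup is then an ordinary memory access
    # (plus a page fault the first time a given page is touched).
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return [m[i * RECORD:(i + 1) * RECORD] for i in indices]

def records_via_pread(path, indices):
    # One pread() system call, and one fresh copy, per lookup.
    fd = os.open(path, os.O_RDONLY)
    try:
        return [os.pread(fd, RECORD, i * RECORD) for i in indices]
    finally:
        os.close(fd)

def count_lines_via_read(path, bufsize=1 << 16):
    # Linear scan with large sequential read() calls; no page-table changes.
    lines = 0
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(bufsize):
            lines += chunk.count(b"\n")
    return lines

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, bytes(range(256)) * 4)  # 1024 bytes = 64 records
    os.close(fd)
    try:
        picks = [40, 3, 63, 3]           # random-ish access, with page reuse
        assert records_via_mmap(path, picks) == records_via_pread(path, picks)
        print(count_lines_via_read(path))  # 4: one b"\n" per 256-byte block
    finally:
        os.unlink(path)
```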
As a recommendation, any large file that you will be scanning through should generally be mapped in its entirety with mmap on 64-bit systems. On 32-bit systems, where virtual address space is scarce, you can mmap it in chunks instead.
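A rough sketch of that recommendation, hashing a file as the example scan. The 64 MiB window size and the function name `sha256_file` are arbitrary choices for illustration. Two assumptions are built in: `sys.maxsize > 2**32` is used as the 64-bit test, and window offsets must be multiples of `mmap.ALLOCATIONGRANULARITY`, which is a requirement of Python's `mmap` module. On a 64-bit interpreter the whole file is mapped at once. Otherwise the file is mapped one aligned window at a time, so only one window's worth of address space is in use at any moment.

```python
import hashlib
import mmap
import os
import sys
import tempfile

IS_64BIT = sys.maxsize > 2**32

# Window for 32-bit systems: offsets passed to mmap must be multiples of
# mmap.ALLOCATIONGRANULARITY, so round the (arbitrary) 64 MiB down to one.
CHUNK = (64 << 20) // mmap.ALLOCATIONGRANULARITY * mmap.ALLOCATIONGRANULARITY

def sha256_file(path, whole=IS_64BIT, chunk=CHUNK):
    """Hash a file via mmap: one mapping on 64-bit, a sliding window otherwise."""
    if chunk <= 0 or chunk % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("chunk must be a positive multiple of mmap.ALLOCATIONGRANULARITY")
    h = hashlib.sha256()
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return h.hexdigest()  # mmap cannot map an empty file
        if whole:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                h.update(m)  # hashes straight from the mapped pages
        else:
            for offset in range(0, size, chunk):
                length = min(chunk, size - offset)
                with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                               offset=offset) as m:
                    h.update(m)
    return h.hexdigest()

if __name__ == "__main__":
    data = os.urandom(3 * mmap.ALLOCATIONGRANULARITY + 123)
    fd, path = tempfile.mkstemp()
    os.write(fd, data)
    os.close(fd)
    try:
        expected = hashlib.sha256(data).hexdigest()
        assert sha256_file(path, whole=True) == expected
        # Force the windowed path with a small window to exercise it.
        assert sha256_file(path, whole=False,
                           chunk=mmap.ALLOCATIONGRANULARITY) == expected
        print("ok")
    finally:
        os.unlink(path)
```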
See also: mmap() vs. reading blocks
See also (thanks to James): When should I use mmap for file access?