It is said that mmap() maps files into memory, and that it uses up virtual address space of the calling process. Does it really copy the data into memory, or does it just reserve address space?
"virtual memory" of a process is the range of addresses available to it. To make something available in memory, you need to reserve a range of addresses, so mmap()
takes up some virtual memory.
Under Linux (and many other systems probably use a similar mechanism), when a file is read, its content is first read into memory allocated by the kernel (in Linux this is the "page cache"). Then, if you use mmap(), that memory is simply made available to the process by assigning it an address in the process' address space. If you use read(), the process allocates a buffer, which needs both addresses (virtual memory) and a place to live (physical memory), and the data gets copied from the page cache into that buffer (so more physical memory is needed).
The data is only read from disk when it is actually accessed. In the mmap() case that means when you actually touch the mapped memory; in the read() case it is the copy into your buffer, so inside the read() call.
Thus mmap() is more efficient for large files, especially for random access. The disadvantages are that it can only be used for regular files and not for file-like objects (pipes, sockets, devices, /proc files, etc.), and that an I/O failure is detected during the page fault, where it is difficult to handle (the process gets a SIGBUS signal), whereas read() can return an error and the application can try to recover (most don't anyway). The latter is mainly a concern for network filesystems, where an I/O failure may simply mean the connection was lost.
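If you do want to survive I/O errors on a mapping, you have to catch SIGBUS yourself, while with read() a return-value check is enough. A rough sketch of the SIGBUS approach follows, using the same hypothetical file name as above; real code would also have to worry about async-signal safety and about which address actually faulted.

```c
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static sigjmp_buf recover;

/* SIGBUS arrives if the kernel cannot fault in a mapped page, e.g. the
 * file was truncated or a network filesystem lost its connection. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(recover, 1);
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGBUS, &sa, NULL);

    int fd = open("data.bin", O_RDONLY);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    if (sigsetjmp(recover, 1) == 0) {
        long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += p[i];                    /* may fault and raise SIGBUS */
        printf("sum = %ld\n", sum);
    } else {
        fprintf(stderr, "I/O error while reading the mapping\n");
    }

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```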