mmap() vs. reading blocks

醉酒成梦 2020-11-22 16:59

I'm working on a program that will be processing files that could potentially be 100GB or more in size. The files contain sets of variable length records. I've got a first

12 answers
  •  名媛妹妹
    2020-11-22 17:37

    The main performance cost is going to be disk I/O. mmap() is certainly quicker than istream, but the difference might not be noticeable because disk I/O will dominate your run time.

    I tried Ben Collins's code fragment (see above/below) to test his assertion that "mmap() is way faster" and found no measurable difference. See my comments on his answer.

    I would certainly not recommend separately mmap'ing each record in turn unless your "records" are huge. That would be horribly slow: it requires 2 system calls for each record (one to map, one to unmap) and may lose the page from the disk-memory cache.
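    As a rough illustration of the alternative, the sketch below maps the whole file once and walks the records in place, so the mapping cost is paid once rather than per record. The record layout (a 4-byte length in host byte order followed by the payload) and the name count_records() are assumptions made up for this example; the question does not say how the records are framed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical record layout (an assumption, not from the question):
   a 4-byte length in host byte order, followed by that many payload bytes. */

/* Maps the whole file once and walks every record in place.
   Stores the total payload size in *payload_bytes.
   Returns the number of complete records, or -1 on error. */
long count_records(const char *path, uint64_t *payload_bytes)
{
    assert(path != NULL && payload_bytes != NULL);
    *payload_bytes = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t size = (size_t)st.st_size;
    const unsigned char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (base == MAP_FAILED)
        return -1;

    long records = 0;
    size_t pos = 0;
    while (size - pos >= sizeof(uint32_t)) {
        uint32_t len;
        memcpy(&len, base + pos, sizeof len);   /* avoids unaligned loads */
        pos += sizeof len;
        if (len > size - pos)
            break;                  /* truncated trailing record: stop here */
        *payload_bytes += len;
        pos += len;
        records++;
    }

    munmap((void *)base, size);
    return records;
}
```

    Mapping a 100GB file in one call like this assumes a 64-bit address space. The kernel still pages data in on demand, so the mapping itself does not read the file.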

    In your case I think mmap(), istream and the low-level open()/read() calls will all be about the same. I would recommend mmap() in these cases:

    1. There is random access (not sequential) within the file, AND
    2. the whole file fits comfortably in memory, OR there is locality of reference within the file so that certain pages can be mapped in and other pages mapped out. That way the operating system uses the available RAM to maximum benefit.
    3. OR multiple processes are reading or working on the same file. In that case mmap() is fantastic because the processes all share the same physical pages.
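    For the "certain pages mapped in, other pages mapped out" case, one possible approach is to map only a window of the file around the region you need. The sketch below is a minimal version of that idea. The struct file_window type and the window_open()/window_close() helpers are hypothetical names invented for this example; the only real constraint they encode is that the offset passed to mmap() must be a multiple of the page size. The caller is assumed to keep the requested range inside the file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* A view onto part of a file (hypothetical helper type).
   "data" points at the requested offset; "map_base" and "map_len"
   describe the underlying page-aligned mapping. */
struct file_window {
    void *map_base;
    size_t map_len;
    const unsigned char *data;
    size_t len;
};

/* Maps bytes [offset, offset + len) of fd read-only.
   The caller must keep the range inside the file.
   Returns 0 on success, -1 on failure. */
int window_open(int fd, off_t offset, size_t len, struct file_window *w)
{
    assert(w != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset < 0 || len == 0)
        return -1;

    /* Round the offset down to a page boundary, as mmap() requires. */
    off_t aligned = offset - (offset % page);
    size_t slack = (size_t)(offset - aligned);

    void *base = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (base == MAP_FAILED)
        return -1;

    w->map_base = base;
    w->map_len = len + slack;
    w->data = (const unsigned char *)base + slack;
    w->len = len;
    return 0;
}

/* Unmaps the window so the kernel can reclaim the address range. */
void window_close(struct file_window *w)
{
    assert(w != NULL);
    munmap(w->map_base, w->map_len);
    w->map_base = NULL;
    w->data = NULL;
}
```

    A window should cover many records at once. Opening and closing one window per record would bring back the two-system-calls-per-record cost described above.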

    (btw - I love mmap()/MapViewOfFile()).
