Can't create a file larger than 2GB on a 64-bit Linux system with mmap/malloc/open etc.


Question


OK, I know questions like this have been asked in various forms before, and I have read them all and tried everything that has been suggested, but I still cannot create a file larger than 2GB on a 64-bit system using malloc, open, lseek, or any other trick under the sun.

Clearly I'm writing C here. I'm running Fedora 20. I'm actually trying to mmap the file, but that is not where it fails. My original method was to use open(), then lseek to the position where the file should end, which in this case is at 3GB (edit: and then write a byte at the file-end position to actually create the file of that size), and then mmap the file. I cannot lseek past 2GB. I cannot malloc more than 2GB either. ulimit -a etc. all show unlimited, and /etc/security/limits.conf shows nothing.

When I try to lseek past 2GB, I get EINVAL in errno and a return value of -1 from lseek. Edit: the offset parameter to lseek is of type off_t, which is defined as a long int (64-bit signed), not size_t as I said previously.

Edit: I've already tried defining _LARGEFILE64_SOURCE and _FILE_OFFSET_BITS 64, and it made no difference. I'm also compiling specifically for 64-bit, i.e. with -m64.
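Edit: for clarity, here is a minimal sketch of the approach I'm describing; this is a reconstruction for illustration, not my actual code:

/* Sketch: grow a file to 3 GiB with lseek + a one-byte write, then mmap it. */
#define _FILE_OFFSET_BITS 64   /* make off_t 64-bit even on 32-bit builds */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const off_t size = (off_t)3 * 1024 * 1024 * 1024;   /* 3 GiB, kept in off_t */

    int fd = open("big.file", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    /* Seek to one byte before the intended end, then write a single byte
     * so the file actually has that size. */
    if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1) { perror("lseek"); return EXIT_FAILURE; }
    if (write(fd, "", 1) != 1) { perror("write"); return EXIT_FAILURE; }

    void *map = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    printf("mapped %lld bytes\n", (long long)size);
    munmap(map, (size_t)size);
    close(fd);
    return EXIT_SUCCESS;
}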

I'm lost. I have no idea why I can't do this.

Any help would be greatly appreciated.

Thanks.

edit: I've removed a lot of completely incorrect babbling on my part and some other unimportant ramblings that have been dealt with later on.

My 2GB problem was caused by horribly sloppy interchanging of multiple different types, with the mixing of signed and unsigned being the problem. Essentially, the 3GB position I was passing to lseek was being converted into a position of -1GB, and clearly lseek didn't like that. So, my bad. Totally stupid.
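To illustrate the kind of mistake I made (a hypothetical reconstruction, not my exact code), storing a 3GB offset in a 32-bit signed type wraps it negative:

#include <stdio.h>

int main(void)
{
    unsigned int pos = 3221225472U;  /* 3 GiB fits in 32 bits unsigned */
    int wrapped = (int)pos;          /* implementation-defined; wraps to -1073741824
                                      * on the usual two's-complement targets */
    long long as_off_t = wrapped;    /* sign-extends when widened, so lseek sees -1GB */
    printf("%u -> %d -> %lld\n", pos, wrapped, as_off_t);
    return 0;
}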

I am going to change to using posix_fallocate() as p_l suggested. While it does remove one function call (only posix_fallocate is needed instead of an lseek and then a write), that saving isn't significant for me; what matters is that posix_fallocate does exactly what I want directly, which the lseek method doesn't. So thanks in particular to p_l for suggesting that, and a special thanks to NominalAnimal, whose persistence that he knew better indirectly led me to the realisation that I can't count, which in turn led me to accept that posix_fallocate would work, and so to change to using it.

Regardless of the end method I used, the 2GB problem was entirely my own crap coding. Thanks again to EOF, chux, p_l and Jonathan Leffler, who all contributed information and suggestions that led me to the problem I had created for myself.

I've included a shorter version of this in an answer.


Answer 1:


My 2GB problem was caused by horribly sloppy interchanging of multiple different types, with the mixing of signed and unsigned being the problem. Essentially, the 3GB position I was passing to lseek was being converted into a position of -1GB, and clearly lseek didn't like that. So, my bad. Totally stupid, crap coding.

Thanks again to EOF, chux, p_l and Jonathan Leffler, who all contributed information and suggestions that led me to the problem I'd created and its solution.

Thanks again to p_l for suggesting posix_fallocate(), and a special thanks to NominalAnimal, whose persistence that he knew better indirectly led me to the realisation that I can't count, which in turn led me to accept that posix_fallocate would work, and so to change to using it.

@p_l: although the solution to my actual problem wasn't in your answer, I'd still upvote your answer suggesting posix_fallocate, but I don't have enough points to do that.




Answer 2:


First of all, try:

//Before any includes:
#define  _LARGEFILE64_SOURCE
#define  _FILE_OFFSET_BITS 64

If that doesn't work, change lseek to lseek64, like this:

lseek64(fd, 3221225472, SEEK_SET);

A better option than lseek might be posix_fallocate():

posix_fallocate(fd, 0, 3221225472);

before the call to mmap().
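One detail worth noting: posix_fallocate() reports errors through its return value (an error number), not by setting errno, so a checked call looks roughly like this:

/* Requires <fcntl.h>, <stdio.h> and <string.h>. */
int err = posix_fallocate(fd, 0, (off_t)3221225472);
if (err != 0)
    fprintf(stderr, "posix_fallocate: %s\n", strerror(err));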

I recommend keeping the defines, though :)




Answer 3:


This is a test program I created (a2b.c):

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static void err_exit(const char *fmt, ...);

int main(void)
{
    char const filename[] = "big.file";
    int fd = open(filename, O_RDONLY);
    if (fd < 0)
        err_exit("Failed to open file %s for reading", filename);
    struct stat sb;
    if (fstat(fd, &sb) != 0)    /* check fstat() too, rather than assuming success */
        err_exit("Failed to fstat file %s", filename);
    uint64_t size = sb.st_size;
    printf("File: %s; size %" PRIu64 "\n", filename, size);
    assert(size > UINT64_C(3) * 1024 * 1024 * 1024);
    off_t offset = UINT64_C(3) * 1024 * 1024 * 1024;
    if (lseek(fd, offset, SEEK_SET) < 0)
        err_exit("lseek failed");
    close(fd);
    _Static_assert(sizeof(size_t) > 4, "sizeof(size_t) is too small");
    size = UINT64_C(3) * 1024 * 1024 * 1024;
    void *space = malloc(size);
    if (space == 0)
        err_exit("failed to malloc %zu bytes", size);
    *((char *)space + size - 1) = '\xFF';
    printf("All OK\n");
    return 0;
}

static void err_exit(const char *fmt, ...)
{
    int errnum = errno;
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    if (errnum != 0)
        fprintf(stderr, ": (%d) %s", errnum, strerror(errnum));
    putc('\n', stderr);
    exit(1);
}

When compiled and run on a Mac (Mac OS X 10.9.2 Mavericks, GCC 4.8.2, 16 GiB physical RAM), with command line:

gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
    -Wold-style-definition -Werror a2b.c -o a2b

and having created big.file with:

dd if=/dev/zero of=big.file bs=1048576 count=5000

I got the reassuring output:

File: big.file; size 5242880000
All OK

I had to use _Static_assert rather than static_assert because the Mac <assert.h> header doesn't define static_assert. When I compiled with -m32, the static assert triggered.

When I ran it on an Ubuntu 13.10 64-bit VM with 1 GiB virtual physical memory (or is that tautological?), I not very surprisingly got the output:

File: big.file; size 5242880000
failed to malloc 3221225472 bytes: (12) Cannot allocate memory

I used exactly the same command line to compile the code; it compiled OK on Linux with static_assert in place of _Static_assert. The output of ulimit -a indicated that the maximum memory size was unlimited, but that means 'no limit smaller than that imposed by the amount of virtual memory on the machine' rather than anything bigger.

Note that my compilations did not explicitly include -m64 but they were automatically 64-bit compilations.

What do you get? Can dd create the big file? Does the code compile? (If you don't have C11 support in your compiler, you'll need to replace the static assert with a normal 'dynamic' assert, removing the error message.) Does the code run? What result do you get?




Answer 4:


Here is an example program, example.c:

/* Not required on 64-bit architectures; recommended anyway. */
#define  _FILE_OFFSET_BITS 64

/* Tell the compiler we do need POSIX.1-2001 features. */
#define  _POSIX_C_SOURCE 200112L

/* Needed to get MAP_NORESERVE. */
#define  _GNU_SOURCE

#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#ifndef   FILE_NAME
#define   FILE_NAME   "data.map"
#endif

#ifndef   FILE_SIZE
#define   FILE_SIZE   3221225472UL
#endif

int main(void)
{
    const size_t      size = FILE_SIZE;
    const char *const file = FILE_NAME;

    size_t            page;
    unsigned char    *data;

    int               descriptor;
    int               result;

    /* First, obtain the normal page size.
     * Note: sysconf() returns a long (-1 on error), so validate it
     * before converting to the unsigned size_t. */
    {
        const long pagesize = sysconf(_SC_PAGESIZE);
        if (pagesize < 1) {
            fprintf(stderr, "BUG: sysconf(_SC_PAGESIZE) returned an invalid value!\n");
            return EXIT_FAILURE;
        }
        page = (size_t)pagesize;
    }

    /* Verify the map size is a multiple of page size. */
    if (size % page) {
        fprintf(stderr, "Map size (%lu) is not a multiple of page size (%lu)!\n",
                (unsigned long)size, (unsigned long)page);
        return EXIT_FAILURE;
    }

    /* Create backing file. */
    do {
        descriptor = open(file, O_RDWR | O_CREAT | O_EXCL, 0600);
    } while (descriptor == -1 && errno == EINTR);
    if (descriptor == -1) {
        fprintf(stderr, "Cannot create backing file '%s': %s.\n", file, strerror(errno));
        return EXIT_FAILURE;
    }

#ifdef FILE_ALLOCATE

    /* Allocate disk space for backing file.
     * Note: posix_fallocate() does not set errno; it returns 0 on
     * success or the error number directly on failure. */
    do {
        result = posix_fallocate(descriptor, (off_t)0, (off_t)size);
    } while (result == EINTR);
    if (result != 0) {
        fprintf(stderr, "Cannot resize and allocate %lu bytes for backing file '%s': %s.\n",
                (unsigned long)size, file, strerror(result));
        unlink(file);
        return EXIT_FAILURE;
    }

#else

    /* Backing file is sparse; disk space is not allocated. */
    do {
        result = ftruncate(descriptor, (off_t)size);
    } while (result == -1 && errno == EINTR);
    if (result == -1) {
        fprintf(stderr, "Cannot resize backing file '%s' to %lu bytes: %s.\n",
                file, (unsigned long)size, strerror(errno));
        unlink(file);
        return EXIT_FAILURE;
    }

#endif

    /* Map the file.
     * If MAP_NORESERVE is not used, then the mapping size is limited
     * to the amount of available RAM and swap combined in Linux.
     * MAP_NORESERVE means that no swap is allocated for the mapping;
     * the file itself acts as the backing store. That's why MAP_SHARED
     * is also used. */
    do {
        data = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
                    descriptor, (off_t)0);
    } while ((void *)data == MAP_FAILED && errno == EINTR);
    if ((void *)data == MAP_FAILED) {
        fprintf(stderr, "Cannot map file '%s': %s.\n", file, strerror(errno));
        unlink(file);
        return EXIT_FAILURE;
    }

    /* Notify of success. */
    fprintf(stdout, "Mapped %lu bytes of file '%s'.\n", (unsigned long)size, file);
    fflush(stdout);

#if defined(FILE_FILL)
    memset(data, ~0UL, size);
#elif defined(FILE_ZERO)
    memset(data, 0, size);
#elif defined(FILE_MIDDLE)
    data[size/2] = 1; /* One byte in the middle set to one. */
#else

    /*
     * Do something with the mapping, data[0] .. data[size-1]
    */

#endif

    /* Unmap. */
    do {
        result = munmap(data, size);
    } while (result == -1 && errno == EINTR);
    if (result == -1)
        fprintf(stderr, "munmap(): %s.\n", strerror(errno));

    /* Close the backing file. */
    result = close(descriptor);
    if (result)
        fprintf(stderr, "close(): %s.\n", strerror(errno));

#ifndef FILE_KEEP

    /* Remove the backing file. */
    result = unlink(file);
    if (result)
        fprintf(stderr, "unlink(): %s.\n", strerror(errno));

#endif

    /* All done (the backing file is kept only if FILE_KEEP is defined). */
    fprintf(stdout, "Done.\n");
    fflush(stdout);

    return EXIT_SUCCESS;
}

To compile and run, use e.g.

gcc -W -Wall -O3 -DFILE_KEEP -DFILE_MIDDLE example.c -o example
./example

The above will create a three-gigabyte (3 × 1024³ = 3221225472 bytes) sparse file data.map, and set the middle byte in it to 1 (\x01). All other bytes in the file remain zeroes. You can then run

du -h data.map

to see how much such a sparse file actually takes on-disk, and

hexdump -C data.map

if you wish to verify the file contents are what I claim they are.
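For example, with the -DFILE_MIDDLE build above on a typical ext4 filesystem, du -h would be expected to report something on the order of 4.0K (the exact figure varies by filesystem), since only the single page containing the non-zero byte actually occupies disk space, while ls -l still reports the full 3 GiB apparent size.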

There are a few compile-time flags (macros) you can use to change how the example program behaves (an example invocation combining several of them follows the list):

  • '-DFILE_NAME="filename"'

    Use file name filename instead of data.map. Note that the entire value is defined inside single quotes, so that the shell does not parse the double quotes. (The double quotes are part of the macro value.)

  • '-DFILE_SIZE=(1024*1024*1024)'

    Use a 1024³ = 1073741824 byte mapping instead of the default 3221225472. If the expression contains special characters the shell would try to evaluate, it is best to enclose it all in single or double quotes.

  • -DFILE_ALLOCATE

    Allocate actual disk space for the entire mapping. By default, a sparse file is used instead.

  • -DFILE_FILL

    Fill the entire mapping with (unsigned char)(~0UL), typically 255.

  • -DFILE_ZERO

    Clear the entire mapping to zero.

  • -DFILE_MIDDLE

    Set the middle byte in the mapping to 1. All other bytes are unchanged.

  • -DFILE_KEEP

    Do not delete the data file. This is useful to explore how much data the mapping actually requires on disk; use e.g. du -h data.map.
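For example, to build and run a one-gibibyte, fully allocated variant that writes to a hypothetical file named test.map, quoting the macro definitions that contain shell-special characters:

gcc -W -Wall -O3 '-DFILE_NAME="test.map"' '-DFILE_SIZE=(1024*1024*1024)' -DFILE_ALLOCATE -DFILE_KEEP example.c -o example
./example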


There are three key limitations to consider when using memory-mapped files in Linux:

  1. File size limits

    Older file systems like FAT (MS-DOS) do not support large files, or sparse files. Sparse files are useful if the dataset is sparse (contains large holes); in that case the unset parts are not stored on disk, and simply read as zeroes.

    Because many filesystems have problems with reads and writes larger than 2³¹ − 1 bytes (2147483647 bytes), current Linux kernels internally limit each single operation to 2³¹ − 1 bytes. The read or write call does not fail; it just returns a short count (a retry loop that handles short counts is sketched after this list). I am not aware of any filesystem similarly limiting the llseek() syscall, but since the C library is responsible for mapping the lseek()/lseek64() functions to the proper syscalls, it is quite possible the C library (and not the kernel) limits the functionality. (In the case of the GNU C library and the Embedded GNU C library, such syscall mapping depends on the compile-time flags. For example, see man 7 feature_test_macros, man 2 lseek and man 3 lseek64.)

    Finally, file position handling is not atomic in most Linux kernels. (Patches are upstream, but I'm not sure which releases contain them.) This means that if more than one thread uses the same descriptor in a way that modifies the file position, it is possible the file position gets completely garbled.

  2. Memory limits

    By default, file-backed memory maps are still subject to available memory and swap limits. That is, the default mmap() behaviour is to assume that under memory pressure, dirty pages are swapped, not flushed to disk. You'll need to use the Linux-specific MAP_NORESERVE flag to avoid those limits.

  3. Address space limits

    On 32-bit Linux systems, the address space available to a userspace process is typically less than 4 GiB; the exact limit is a kernel compile-time option.

    On 64-bit Linux systems, large mappings consume significant amounts of RAM, even if the mapping contents themselves are not yet faulted in. Typically, each single page requires 8 bytes of metadata ("page table entry") in memory, or more, depending on architecture. Using 4096-byte pages, this means a minimum overhead of 0.1953125%, and setting up e.g. a terabyte map requires two gigabytes of RAM just in page table structures!

    Many 64-bit Linux systems support huge pages to avoid that overhead. In most cases, though, huge pages are of limited use because of the configuration and tweaking they require, and their own limitations. Kernels may also restrict what a process can do with a huge-page mapping, so a robust application would need thorough fallbacks to normal page mappings.
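As mentioned in item 1 above, here is a minimal sketch of a write loop that tolerates short counts; write_all is a hypothetical helper name, not a standard function:

/* Write all of buf, retrying on short counts and EINTR.
 * (Linux limits a single write() to just under 2^31 bytes,
 * so very large writes come back short rather than failing.) */
#include <errno.h>
#include <unistd.h>

static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted; retry */
            return -1;             /* real error; errno is set */
        }
        p += (size_t)n;            /* short write: advance past what was written */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}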

The kernel may impose stricter limits than resource availability to user-space processes. Run bash -c 'ulimit -a' to see the currently-imposed limits. (Details are available in the ulimit section in man bash-builtins.)



Source: https://stackoverflow.com/questions/23037130/cant-create-a-file-larger-than-2gb-on-64-bit-linux-system-with-mmap-malloc-open
