Sometimes it\'s handy to mock up something with a little C program that uses a big chunk of static memory. I noticed after changing to Fedora 15 the program took a long
I am able to reproduce this on an Ubuntu 10.10 system (GNU ld (GNU Binutils for Ubuntu) 2.20.51-system.20100908
), and I think I have your answer. First, some methodology.
After confirming this happens to me in a small VM (512MB ram, 2GB swap), from here I decided the easiest thing to do would be to strace gcc and see what exactly was going on when everything went to hell:
~# strace -f gcc swap.c
It illuminated the following:
vfork() = 3589
[pid 3589] execve("/usr/lib/gcc/x86_64-linux-gnu/4.4.5/collect2", ["/usr/lib/gcc/x86_64-linux-gnu/4."..., "--build-id", "--eh-frame-hdr", "-m", "elf_x86_64", "--hash-style=gnu", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", "-o", "swap", "-z", "relro", "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "-L/usr/lib/gcc/x86_64-linux-gnu/"..., ...], [/* 26 vars */]) = 0
...
[pid 3589] vfork() = 3590
...
[pid 3590] execve("/usr/bin/ld", ["/usr/bin/ld", "--build-id", "--eh-frame-hdr", "-m", "elf_x86_64", "--hash-style=gnu", "-dynamic-linker", "/lib64/ld-linux-x86-64.so.2", "-o", "swap", "-z", "relro", "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "/usr/lib/gcc/x86_64-linux-gnu/4."..., "-L/usr/lib/gcc/x86_64-linux-gnu/"..., ...], [/* 27 vars */]) = 0
...
[pid 3590] lseek(13, 4096, SEEK_SET) = 4096
[pid 3590] read(13, ".\4@\0\0\0\0\0>\4@\0\0\0\0\0N\4@\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 3590] mmap(NULL, 1600004096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1771931000
It would appear that, as we might have suspected, it looks like ld
is actually trying to anonymously mmap
the entire static memory space of this array (or possibly the entire program, it's hard to tell since the rest of the program is so small, it might all fit in that extra 4096).
So that's all well and good, but why does it work when we exceed the available swap on the system? Let's turn swapoff
and run strace -f
again...
[pid 3618] lseek(13, 4096, SEEK_SET) = 4096
[pid 3618] read(13, ".\4@\0\0\0\0\0>\4@\0\0\0\0\0N\4@\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 3618] mmap(NULL, 1600004096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 3618] brk(0x60638000) = 0x1046000
[pid 3618] mmap(NULL, 1600135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 3618] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fd011864000
...
Unsurprisingly, ld seems to do the same thing it tried last time, to mmap the entire space. but the system is no longer able to do that, it fails! ld tries again, and it fails again, then ld does something unexpected... it moves on with less memory.
Weird, I guess we'd better have a look at the ld code then. Drat, it doesn't do an explicit mmap
. This must be coming from inside of a plain old malloc
. We'll have to build ld with some debug symbols to track this down. Unfortunately, when I built bin-utils 2.21.1 the problem went away. Perhap it's been fixed in newer versions of bin-utils?