Question
We are using a legacy compiler, based on gcc 2.6.0, to cross-compile for an old embedded processor we are still using (yes, it has been in use since 1994!). The engineer who did the gcc port for this chip has long since moved on. Although we might be able to recover the gcc 2.6.0 source from somewhere on the web, the change set for this chip has disappeared in the halls of corporate history. We have muddled along until recently, as the compiler still ran and produced workable executables, but as of Linux kernel 2.6.25 (and also 2.6.26) it fails with the message
gcc: virtual memory exhausted
... even when run with no parameters or with only -v. I have rebooted my development system (from 2.6.26) using the 2.6.24 kernel and the compiler works again (rebooting with 2.6.25 does not).
We have one system that we are keeping at 2.6.24 just for the purpose of doing builds for this chip, but are feeling a bit exposed in case the linux world moves on to the point that we cannot any longer rebuild a system that will run the compiler (i.e. our 2.6.24 system dies and we cannot get 2.6.24 to install and run on a new system because some of the software parts are no longer available).
Does anyone have any ideas for what we might be able to do to a more modern installation to get this legacy compiler to run?
Edit:
To answer some of the comments...
Sadly it is the source code changes that are specific to our chip that are lost. This loss occurred over two major company reorgs and several sysadmins (a couple of whom really left a mess). We now use configuration control, but that is closing the barn door too late for this problem.
The use of a VM is a good idea, and may be what we end up doing. Thank you for that idea.
Finally, I tried strace as ephemient suggested and found that the last system call was brk(), which returned an error on the new system (2.6.26 kernel) and returned success on the old system (2.6.24 kernel). This would indicate that I really am running out of virtual memory, except that tcsh "limit" returns the same values on old and new systems, and /proc/meminfo shows the new system has slightly more memory and quite a bit more swap space. Maybe it is a problem of fragmentation or where the program is being loaded?
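For spotting this kind of failure in a trace, here is a minimal sketch of a filter over strace output. It relies on the raw Linux brk system call returning the new break on success and the unchanged current break on failure, so a line whose result differs from a non-zero request is a failed call. The function name brk_failures and the file name trace.log are made up for illustration; legacy-gcc stands for the compiler binary.

```shell
# Print brk() calls from strace output (on stdin) that did not get the
# break they asked for. Query calls such as brk(0) or brk(NULL) are skipped
# because they only ask for the current break.
brk_failures() {
    sed -n 's/^.*brk(\(0x[0-9a-f]*\)) *= *\(0x[0-9a-f]*\).*$/\1 \2/p' |
    while read -r want got; do
        # Compare numerically so differences in zero padding do not matter.
        if [ $(($want)) -ne $(($got)) ]; then
            echo "brk($want) = $got"
        fi
    done
}

# Hypothetical usage:
#   strace -o trace.log legacy-gcc -v
#   brk_failures < trace.log
```

An empty result means every brk() request in the trace was granted.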
I did some further research and "brk randomization" was added in kernel 2.6.25, however CONFIG_COMPAT_BRK is supposedly enabled by default (which disables brk randomization).
Edit:
OK, more info: It really looks like brk randomization is the culprit. The legacy gcc calls brk() to change the end of the data segment, and that call now fails, causing the legacy gcc to report "virtual memory exhausted". There are a few documented ways to disable brk randomization:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
sudo sysctl -w kernel.randomize_va_space=0
starting a new shell with
setarch i386 -R tcsh
(or "-R -L")
I have tried them and they do seem to have an effect in that the brk() return value is different (and always the same) than without them (tried on both kernel 2.6.25 and 2.6.26), but the brk() still fails so the legacy gcc still fails :-(.
In addition I have set vm.legacy_va_layout=1 and vm.overcommit_memory=2 with no change, and I have rebooted with the vm.legacy_va_layout=1 and kernel.randomize_va_space=0 settings saved in /etc/sysctl.conf. Still no change.
Edit:
Using kernel.randomize_va_space=0 on kernel 2.6.26 (and 2.6.25) results in the following brk() call being reported by strace legacy-gcc:
brk(0x80556d4) = 0x8056000
This indicates the brk() failed, but it looks like it failed because the data segment already ends beyond what was requested. Using objdump, I can see the data segment should end at 0x805518c, whereas the failed brk() indicates that the data segment currently ends at 0x8056000:
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .interp       00000013  080480d4  080480d4  000000d4  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .hash         000001a0  080480e8  080480e8  000000e8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynsym       00000410  08048288  08048288  00000288  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .dynstr       0000020e  08048698  08048698  00000698  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .rel.bss      00000038  080488a8  080488a8  000008a8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .rel.plt      00000158  080488e0  080488e0  000008e0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .init         00000008  08048a40  08048a40  00000a40  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  7 .plt          000002c0  08048a48  08048a48  00000a48  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  8 .text         000086cc  08048d10  08048d10  00000d10  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .fini         00000008  080513e0  080513e0  000093e0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .rodata       000027d0  080513e8  080513e8  000093e8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 11 .data         000005d4  08054bb8  08054bb8  0000bbb8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 12 .ctors        00000008  0805518c  0805518c  0000c18c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 13 .dtors        00000008  08055194  08055194  0000c194  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 14 .got          000000b8  0805519c  0805519c  0000c19c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 15 .dynamic      00000088  08055254  08055254  0000c254  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .bss          000003b8  080552dc  080552dc  0000c2dc  2**3
                  ALLOC
 17 .note         00000064  00000000  00000000  0000c2dc  2**0
                  CONTENTS, READONLY
 18 .comment      00000062  00000000  00000000  0000c340  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
no symbols
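One way to read these numbers, as a rough sketch rather than a diagnosis: add the .bss size to its VMA to get the end of .bss, then round that up to the next page. The 4 KiB page size is an assumption (the usual i386 value, not stated in the post), and the helper name page_align_up is made up for illustration.

```shell
# Round an address up to the next page boundary.
# Assumption: 4 KiB pages (0x1000), the usual i386 page size.
page_align_up() {
    printf '0x%x\n' $(( ($1 + 0xfff) & ~0xfff ))
}

# .bss VMA and size as listed in the objdump section headers.
bss_end=$(( 0x080552dc + 0x3b8 ))
printf 'end of .bss:      0x%x\n' "$bss_end"
printf 'page-aligned end: %s\n' "$(page_align_up "$bss_end")"
```

Under that assumption the end of .bss is 0x8055694 and the page-aligned end is 0x8056000, which matches the value brk() returned. The requested break, 0x80556d4, lies between the two.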
Edit:
To echo ephemient's comment below: "So strange to treat GCC as a binary without source"!
So, using strace, objdump, gdb and my limited understanding of 386 assembler and architecture I have traced the problem to the 1st malloc call in the legacy code. The legacy gcc calls malloc, which returns NULL, which results in the "virtual memory exhausted" message on stderr. This malloc is in libc.so.5, and it calls getenv a bunch of times and ends up calling brk()... I guess to increase the heap... which fails.
From this I can only surmise that the problem is more than brk randomization, or I have not fully disabled brk randomization, despite the randomize_va_space=0 and legacy_va_layout=1 sysctl settings.
Answer 1:
Install linux + the old gcc onto a virtual machine.
Answer 2:
Do you have the sources for this custom compiler? If you can recover the 2.6.0 baseline (and that should be relatively easy), then diff and patch should recover your change set.
What I'd then recommend is using that change set to build a new version against an up-to-date gcc. AND THEN PUT IT UNDER CONFIGURATION CONTROL.
Sorry, don't mean to shout. It's just I've been saying the same thing for most of 30 years.
Answer 3:
Can you strace the gcc-2.6.0 executable? It may be doing something like reading /proc/$$/maps, and getting confused when the output changes in insignificant ways. A similar problem was recently noticed between 2.6.28 and 2.6.29.
If so, you can hack /usr/src/linux/fs/proc/task_mmu.c or thereabouts to restore the old output, or set up some $LD_PRELOAD to fake gcc into reading another file.
Edit
Since you mentioned brk...
CONFIG_COMPAT_BRK makes the default kernel.randomize_va_space=1 instead of 2, but that still randomizes everything other than the heap (brk).
See if your problem goes away if you echo 0 > /proc/sys/kernel/randomize_va_space or sysctl kernel.randomize_va_space=0 (equivalent).
If so, add kernel.randomize_va_space = 0 to /etc/sysctl.conf or add norandmaps to the kernel command line (equivalent), and be happy again.
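To check which mode a running system is actually in, a small sketch along these lines may help. The descriptions follow the kernel's documented meaning of the three values; the function name describe_randomize_va_space is made up for illustration.

```shell
# Translate a kernel.randomize_va_space value into a short description.
describe_randomize_va_space() {
    case "$1" in
        0) echo "randomization disabled" ;;
        1) echo "mmap base, stack and VDSO randomized; brk heap not randomized" ;;
        2) echo "as 1, plus brk heap randomized" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the current setting if the file is available on this system.
f=/proc/sys/kernel/randomize_va_space
if [ -r "$f" ]; then
    describe_randomize_va_space "$(cat "$f")"
fi
```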
Answer 4:
I came across this and thought about your problem. Maybe you can find a way to play with the binary to move it to ELF format? Or maybe it is irrelevant, but playing with objdump can provide you with more information.
Can you have a look at the process memory map?
Answer 5:
So I have worked something out... it is not a complete solution, but it does get past the original problem I had with the legacy gcc.
Putting breakpoints on every libc call in the .plt (procedure linkage table) I see that malloc (in libc.so.5) calls getenv() to get:
MALLOC_TRIM_THRESHOLD_
MALLOC_TOP_PAD_
MALLOC_MMAP_THRESHOLD_
MALLOC_MMAP_MAX_
MALLOC_CHECK_
So I web-searched these and found this which advised
setenv MALLOC_TOP_PAD_ 536870912
then the legacy gcc WORKS!!!!
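For Bourne-style shells or build scripts, a sketch of the same workaround scoped to a single command instead of the whole session could look like this. The wrapper name with_top_pad is made up for illustration; the value is the one from the setenv line (512 MiB).

```shell
# Run one command with MALLOC_TOP_PAD_ set only in that command's environment,
# leaving the calling shell's environment untouched.
with_top_pad() {
    env MALLOC_TOP_PAD_=536870912 "$@"
}

# Hypothetical usage (legacy-gcc stands for the old compiler binary):
#   with_top_pad legacy-gcc -v
```

Scoping the variable this way keeps modern tools in the same session from inheriting the padding.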
But not home free, it got up to the link in the build before failing, so there is something further going on with the legacy nld we have :-( It is reporting:
Virtual memory exceeded in `new'
In /etc/sysctl.conf I have:
kernel.randomize_va_space=0
vm.legacy_va_layout=1
It still works the same if
kernel.randomize_va_space=1
vm.legacy_va_layout=0
but not if
kernel.randomize_va_space=2
There was a suggestion to use "ldd" to see the shared library dependencies: the legacy gcc only needs libc5, but the legacy nld also needs libg++.so.27, libstdc++.so.27, libm.so.5 and apparently there is a libc5 version of libg++.so.27 (libg++27-altdev ??) and what about libc5-compat?
So, as I said, not yet home free... but getting closer. I'll probably post a new question about the nld problem.
Edit:
I was originally going to refrain from "Accepting" this answer since I still have a problem with the corresponding legacy linker, but in order to get some finality on this question at least, I am rethinking that position.
Thank-you's go out to:
- an0nym0usc0ward for the suggestion of using a vm (which may ultimately become the Accepted Answer)
- ephemient for suggesting using strace, and help with stackoverflow usage
- shodanex for suggesting using objdump
Edit
Below is the last stuff that I learned, and now I will accept the VM solution since I could not fully solve it any other way (at least in the time allotted for this).
The newer kernels have a CONFIG_COMPAT_BRK build flag to allow libc5 to be used, so presumably building a new kernel with this flag will fix the problem (and looking through the kernel source, it looks like it will, but I can't be sure since I did not follow all of the paths). There is also another documented way to allow libc5 use at runtime (rather than at kernel build time): sudo sysctl -w kernel.randomize_va_space=0. This, however, does not do a complete job, and some (most?) libc5 apps will still break, e.g. our legacy compiler and linker. This seems to be due to a difference in alignment assumptions between the newer and older kernels.
I have patched the linker binary to make it think it has a bigger bss section, in order to bring the end of the bss up to a page boundary, and this works on the newer kernel when the sysctl var kernel.randomize_va_space=0. This is NOT a satisfactory solution to me since I am blindly patching a critical binary executable. Even though running the patched linker on the newer kernel produced bit-identical output to the original linker run on the older kernel, that does not prove that some other linker input (i.e. we change the program being linked) will also produce identical results.
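As a rough illustration of the arithmetic behind such a patch, the sketch below computes how many bytes would have to be added to a .bss section for it to end on a page boundary. It assumes 4 KiB pages, and the function name bss_padding is made up. The linker's own section values are not given in this post, so the usage example plugs in the .bss values from the legacy gcc objdump listing in the question instead.

```shell
# Bytes to add to .bss so that it ends on a page boundary.
# Arguments: .bss start address (VMA) and .bss size.
# Assumption: 4 KiB pages (0x1000).
bss_padding() {
    end=$(( $1 + $2 ))
    aligned=$(( (end + 0xfff) & ~0xfff ))
    printf '0x%x\n' $(( aligned - end ))
}

# Illustration only: .bss of the legacy gcc binary, not of the patched linker.
bss_padding 0x080552dc 0x3b8
```

For those values the padding comes out as 0x96c, i.e. a .bss size of 0xd24 would end exactly at 0x8056000.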
Answer 6:
Could you not simply make a disc image that can be re-installed if the system dies? Or make a VM?
Source: https://stackoverflow.com/questions/779964/legacy-gcc-compiler-issues