legacy gcc compiler issues

Question

We are using a legacy compiler, based on gcc 2.6.0, to cross-compile for an old embedded processor that we still use (yes, it has been in use since 1994!). The engineer who did the gcc port for this chip has long since moved on. Although we might be able to recover the gcc 2.6.0 source from somewhere on the web, the change set for this chip has disappeared in the halls of corporate history. We muddled along until recently because the compiler still ran and produced workable executables, but as of Linux kernel 2.6.25 (and also 2.6.26) it fails with the message gcc: virtual memory exhausted... even when run with no parameters or with only -v. I rebooted my development system (from 2.6.26) using the 2.6.24 kernel and the compiler works again (rebooting with 2.6.25 does not).

We have one system that we are keeping at 2.6.24 just for doing builds for this chip, but we feel a bit exposed in case the Linux world moves on to the point that we can no longer rebuild a system that will run the compiler (i.e. our 2.6.24 system dies and we cannot get 2.6.24 to install and run on a new system because some of the software parts are no longer available).

Does anyone have any ideas for what we might be able to do to a more modern installation to get this legacy compiler to run?

Edit:

To answer some of the comments...

Sadly, it is the source code changes specific to our chip that are lost. This loss occurred over two major company reorgs and several sysadmins (a couple of whom really left a mess). We now use configuration control, but that is closing the barn door too late for this problem.

The use of a VM is a good idea, and may be what we end up doing. Thank you for that idea.

Finally, I tried strace as ephemient suggested and found that the last system call was brk(), which returned an error on the new system (2.6.26 kernel) and returned success on the old system (2.6.24 kernel). This would indicate that I really am running out of virtual memory, except that tcsh "limit" returns the same values on the old and new systems, and /proc/meminfo shows the new system has slightly more memory and quite a bit more swap space. Maybe it is a problem of fragmentation or of where the program is being loaded?

I did some further research: "brk randomization" was added in kernel 2.6.25; however, CONFIG_COMPAT_BRK is supposedly enabled by default (which disables brk randomization).

Edit:

OK, more info: it really looks like brk randomization is the culprit. The legacy gcc calls brk() to change the end of the data segment, and that call now fails, causing the legacy gcc to report "virtual memory exhausted". There are a few documented ways to disable brk randomization:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
sudo sysctl -w kernel.randomize_va_space=0
starting a new shell with setarch i386 -R tcsh (or "-R -L")

I have tried them, and they do have an effect: the brk() return value differs from the one seen without them and is now the same on every run (tried on both kernel 2.6.25 and 2.6.26). But the brk() still fails, so the legacy gcc still fails :-(.

In addition I have set vm.legacy_va_layout=1 and vm.overcommit_memory=2 with no change, and I have rebooted with the vm.legacy_va_layout=1 and kernel.randomize_va_space=0 settings saved in /etc/sysctl.conf. Still no change.

Edit:

Using kernel.randomize_va_space=0 on kernel 2.6.26 (and 2.6.25) results in the following brk() call being reported by strace legacy-gcc:

brk(0x80556d4) = 0x8056000

This indicates that the brk() failed, but it looks like it failed because the data segment already ends beyond what was requested. Using objdump, I can see the data segment should end at 0x805518c, whereas the failed brk() indicates that the data segment currently ends at 0x8056000:

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .interp       00000013  080480d4  080480d4  000000d4  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .hash         000001a0  080480e8  080480e8  000000e8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynsym       00000410  08048288  08048288  00000288  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .dynstr       0000020e  08048698  08048698  00000698  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .rel.bss      00000038  080488a8  080488a8  000008a8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .rel.plt      00000158  080488e0  080488e0  000008e0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .init         00000008  08048a40  08048a40  00000a40  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  7 .plt          000002c0  08048a48  08048a48  00000a48  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  8 .text         000086cc  08048d10  08048d10  00000d10  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .fini         00000008  080513e0  080513e0  000093e0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .rodata       000027d0  080513e8  080513e8  000093e8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 11 .data         000005d4  08054bb8  08054bb8  0000bbb8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 12 .ctors        00000008  0805518c  0805518c  0000c18c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 13 .dtors        00000008  08055194  08055194  0000c194  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 14 .got          000000b8  0805519c  0805519c  0000c19c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 15 .dynamic      00000088  08055254  08055254  0000c254  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 16 .bss          000003b8  080552dc  080552dc  0000c2dc  2**3
                  ALLOC
 17 .note         00000064  00000000  00000000  0000c2dc  2**0
                  CONTENTS, READONLY
 18 .comment      00000062  00000000  00000000  0000c340  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
no symbols
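
As a cross-check on the addresses above, here is a small sketch of the arithmetic in POSIX sh. The section addresses and the brk() values are copied from the objdump listing and the strace line; the 4 KiB page size is an assumption (the usual i386 value), and the helper names are made up for illustration.

```shell
#!/bin/sh
# Addresses copied from the objdump listing and the strace line above.
# PAGE_SIZE is an assumption: 4 KiB, the usual i386 page size.
PAGE_SIZE=4096

# Print the end address (VMA + size) of a section, in hex.
section_end() {
    printf '0x%x\n' $(( $1 + $2 ))
}

# Round an address up to the next page boundary, in hex.
page_align_up() {
    printf '0x%x\n' $(( ($1 + $PAGE_SIZE - 1) / $PAGE_SIZE * $PAGE_SIZE ))
}

data_end=$(section_end 0x08054bb8 0x5d4)   # .data: VMA and size
bss_end=$(section_end 0x080552dc 0x3b8)    # .bss:  VMA and size

echo ".data ends at:      $data_end"
echo ".bss ends at:       $bss_end"
echo "page-rounded .bss:  $(page_align_up "$bss_end")"
echo "brk() request is $(( 0x80556d4 - $bss_end )) bytes past the end of .bss"
```

By this arithmetic .data ends at 0x805518c and .bss ends at 0x8055694; rounding the .bss end up to a page gives 0x8056000, the value brk() returned. The requested break, 0x80556d4, is 64 bytes past the end of .bss but still below that page boundary, which would be consistent with the request being refused because the data segment already ends beyond it.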

Edit:

To echo ephemient's comment below: "So strange to treat GCC as a binary without source"!

So, using strace, objdump, gdb and my limited understanding of 386 assembler and architecture I have traced the problem to the 1st malloc call in the legacy code. The legacy gcc calls malloc, which returns NULL, which results in the "virtual memory exhausted" message on stderr. This malloc is in libc.so.5, and it calls getenv a bunch of times and ends up calling brk()... I guess to increase the heap... which fails.

From this I can only surmise that the problem is more than brk randomization, or I have not fully disabled brk randomization, despite the randomize_va_space=0 and legacy_va_layout=1 sysctl settings.

Answer 1:

Install Linux plus the old gcc on a virtual machine.

Answer 2:

Do you have the sources for this custom compiler? If you can recover the 2.6.0 baseline (and that should be relatively easy), then diff and patch should recover your change set.

What I'd then recommend is using that change set to build a new version against an up-to-date gcc. AND THEN PUT IT UNDER CONFIGURATION CONTROL.

Sorry, don't mean to shout. It's just I've been saying the same thing for most of 30 years.

Answer 3:

Can you strace the gcc-2.6.0 executable? It may be doing something like reading /proc/$$/maps, and getting confused when the output changes in insignificant ways. A similar problem was recently noticed between 2.6.28 and 2.6.29.

If so, you can hack /usr/src/linux/fs/proc/task_mmu.c or thereabouts to restore the old output, or set up some $LD_PRELOAD to fake gcc into reading another file.

Edit

Since you mentioned brk...

CONFIG_COMPAT_BRK makes the default kernel.randomize_va_space=1 instead of 2, but that still randomizes everything other than the heap (brk).

See if your problem goes away if you echo 0 > /proc/sys/kernel/randomize_va_space or sysctl kernel.randomize_va_space=0 (equivalent).

If so, add kernel.randomize_va_space = 0 to /etc/sysctl.conf or add norandmaps to the kernel command line (equivalent), and be happy again.
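
For reference, a minimal sketch that decodes the sysctl value. The meanings of 0, 1 and 2 follow the description above (1 randomizes everything except the brk heap, 2 adds the heap); the function name is made up for illustration.

```shell
#!/bin/sh
# Translate a kernel.randomize_va_space value into a short description.
describe_va_randomization() {
    case "$1" in
        0) echo "randomization off" ;;
        1) echo "stack, mmap base and vdso randomized; brk heap not randomized" ;;
        2) echo "stack, mmap base, vdso and brk heap randomized" ;;
        *) echo "unrecognized value: $1" ;;
    esac
}

# Report the running kernel's setting, if the sysctl file is readable.
if [ -r /proc/sys/kernel/randomize_va_space ]; then
    describe_va_randomization "$(cat /proc/sys/kernel/randomize_va_space)"
fi
```

A program that depends on the old brk behaviour would be expected to need 0 or 1 here, not 2.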

Answer 4:

I came across this and thought about your problem. Maybe you can find a way to play with the binary to move it to ELF format? Or maybe that is irrelevant, but playing with objdump can give you more information.

Can you have a look at the process memory map?

Answer 5:

So I have worked something out... it is not a complete solution, but it does get past the original problem I had with the legacy gcc.

Putting breakpoints on every libc call in the .plt (procedure linkage table) I see that malloc (in libc.so.5) calls getenv() to get:

    MALLOC_TRIM_THRESHOLD_
    MALLOC_TOP_PAD_
    MALLOC_MMAP_THRESHOLD_
    MALLOC_MMAP_MAX_
    MALLOC_CHECK_

So I web-searched these and found this which advised

    setenv MALLOC_TOP_PAD_ 536870912

then the legacy gcc WORKS!!!!
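
The `setenv` line above is tcsh syntax and affects everything started from that shell. Here is a minimal sketch of a narrower alternative in POSIX sh, which passes the variable only to the one command being run. The function name is hypothetical, and `legacy-gcc` stands for however the legacy compiler is invoked on your system.

```shell
#!/bin/sh
# Run a single command with MALLOC_TOP_PAD_ set in its environment only.
# 536870912 bytes = 512 MiB, the value quoted above.
run_with_top_pad() {
    env MALLOC_TOP_PAD_=536870912 "$@"
}

# Example invocation (not executed here):
#   run_with_top_pad legacy-gcc -v
```

Scoping the variable this way keeps the large padding away from unrelated programs that happen to be started from the same shell.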

But we are not home free: the build got as far as the link step before failing, so there is something further going on with the legacy nld we have :-( It is reporting:

    Virtual memory exceeded in `new'

In /etc/sysctl.conf I have:

    kernel.randomize_va_space=0
    vm.legacy_va_layout=1

It still works the same if

    kernel.randomize_va_space=1
    vm.legacy_va_layout=0

but not if

    kernel.randomize_va_space=2

There was a suggestion to use "ldd" to see the shared library dependencies: the legacy gcc only needs libc5, but the legacy nld also needs libg++.so.27, libstdc++.so.27, libm.so.5 and apparently there is a libc5 version of libg++.so.27 (libg++27-altdev ??) and what about libc5-compat?

So, as I said, not yet home free... but getting closer. I'll probably post a new question about the nld problem.

Edit:

I was originally going to refrain from "Accepting" this answer since I still have a problem with the corresponding legacy linker, but in order to get some finality on this question at least, I am rethinking that position.

Thank-you's go out to:

an0nym0usc0ward for the suggestion of using a vm (which may ultimately become the Accepted Answer)
ephemient for suggesting using strace, and help with stackoverflow usage
shodanex for suggesting using objdump

Edit

Below is the last of what I learned, and now I will accept the VM solution since I could not fully solve it any other way (at least in the time allotted for this).

The newer kernels have a CONFIG_COMPAT_BRK build flag to allow libc5 to be used, so presumably building a new kernel with this flag will fix the problem (and looking through the kernel source, it looks like it will, but I can't be sure since I did not follow all of the paths). There is also another documented way to allow libc5 use at runtime (rather than at kernel build time): sudo sysctl -w kernel.randomize_va_space=0. This, however, does not do a complete job, and some (most?) libc5 apps will still break, e.g. our legacy compiler and linker. This seems to be due to a difference in alignment assumptions between the newer and older kernels.

I have patched the linker binary to make it think it has a bigger bss section, in order to bring the end of the bss up to a page boundary, and this works on the newer kernel when the sysctl variable kernel.randomize_va_space=0. This is NOT a satisfactory solution to me since I am blindly patching a critical binary executable. Even though running the patched linker on the newer kernel produced bit-identical output to the original linker run on the older kernel, that does not prove that some other linker input (i.e. we change the program being linked) will also produce identical results.
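
To make the page-boundary idea concrete, here is a hedged sketch of the padding calculation in POSIX sh. The linker's own section addresses are not given here, so the example reuses the .bss values from the legacy gcc objdump listing in the question; the 4 KiB page size and the function name are assumptions.

```shell
#!/bin/sh
# PAGE_SIZE is an assumption: 4 KiB, the usual i386 page size.
PAGE_SIZE=4096

# Given a .bss start address and size, print how many bytes must be added
# to the size so that the section ends exactly on a page boundary.
bss_pad_to_page() {
    end=$(( $1 + $2 ))
    echo $(( ($PAGE_SIZE - $end % $PAGE_SIZE) % $PAGE_SIZE ))
}

# .bss of the legacy gcc binary: VMA 0x080552dc, size 0x3b8.
pad=$(bss_pad_to_page 0x080552dc 0x3b8)
printf 'pad .bss by %d bytes (new size 0x%x)\n' "$pad" $(( 0x3b8 + $pad ))
```

For those values the padding is 2412 bytes (0x96c), so a .bss size of 0xd24 instead of 0x3b8 would end exactly at 0x8056000.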

Answer 6:

Could you not simply make a disc image that can be re-installed if the system dies? Or make a VM?

Source: https://stackoverflow.com/questions/779964/legacy-gcc-compiler-issues

Tags

Linux

gcc

legacy

cross-compiling

virtual-memory