Ld magically overrides statically linked symbols

Posted by 假如想象 on 2019-12-21 05:07:22

Question


For a few days we have been dealing with a very strange problem.

I can't understand how it even happens: when a third-party program (MATLAB) uses our shared library, it somehow overrides some of our symbols (boost, to be precise) with its own. Those symbols are statically linked and (!!) local.

Here is the deal - we use boost 1.47, MATLAB has boost 1.40. Currently, library call segfaults on a call from OUR library to their boost (regex).

So, here is the magic:

  • We have no third-party library dependencies; ldd shows:
    linux-vdso.so.1 =>  (0x00007fff4abff000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00007f1a3fd65000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1a3fa51000)
    libm.so.6 => /lib/libm.so.6 (0x00007f1a3f7cd000)
    libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f1a3f5bf000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f1a3f3a8000)
    libc.so.6 => /lib/libc.so.6 (0x00007f1a3f024000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1a414f9000)
    librt.so.1 => /lib/librt.so.1 (0x00007f1a3ee1c000)
  • No C++ symbols are exported from our library (our public symbols are plain C for binary compatibility); nm shows:
nm -g --defined-only libmysharedlib.so

addr1 T OurCSymbol1
addr2 T OurCSymbol2
addr3 T OurCSymbol3
...
  • Still, it uses their boost. HOW? Stacktrace (paths cut):
[  0] 0x00007f21fddbb0a9 bin/libmwfl.so+00454825 fl::sysdep::linux::unwind_stack(void const**, unsigned long, unsigned long, fl::diag::thread_context const&)+000009
[  1] 0x00007f21fdd74111 bin/glnxa64/libmwfl.so+00164113 fl::diag::stacktrace_base::capture(fl::diag::thread_context const&, unsigned long)+000161
[  2] 0x00007f21fdd7d42d bin/glnxa64/libmwfl.so+00201773
[  3] 0x00007f21fdd7d6b4 bin/glnxa64/libmwfl.so+00202420 fl::diag::terminate_log(char const*, fl::diag::thread_context const&, bool)+000100
[  4] 0x00007f21fce525a7 bin/glnxa64/libmwmcr.so+00365991
[  5] 0x00007f21fb9eb8f0 lib/libpthread.so.0+00063728
[  6] 0x00007f21f3e939a9 libboost_regex.so.1.40.0+00342441 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_all_states()+000073
[  7] 0x00007f21f3eb6546 bin/glnxa64/libboost_regex.so.1.40.0+00484678 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_imp()+000758
[  8] 0x00007f21c04ad595 lib/libmysharedlib.so+04855189 bool boost::regex_match, std::allocator > >, char, boost::regex_traits > >(__gnu_cxx::__normal_iterator, __gnu_cxx::__normal_iterator, boost::match_results, std::allocator > > >&, boost::basic_regex > > const&, boost::regex_constants::_match_flags)+000245
[  9] 0x00007f21c04a71c7 lib/libmysharedlib.so+04829639 myfunc2()+000183
[ 10] 0x00007f21c01b41e3 lib/libmysharedlib.so+01737187 myfunc1()+000307

It is known that MATLAB calls dlopen with the RTLD_NOW flag only.
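To make the flag discussion concrete, here is a minimal sketch of how a host program might load a plugin. The helper name load_plugin and its deep_bind parameter are hypothetical and are not taken from MATLAB. RTLD_DEEPBIND is a glibc extension that places the loaded library's own definitions ahead of the global scope during symbol lookup. A host that passes RTLD_NOW alone, as described above, does not get that behavior.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: load a shared object the way a plugin host might.
 * If path is NULL, dlopen returns a handle for the main program. */
void *load_plugin(const char *path, int deep_bind)
{
    int flags = RTLD_NOW;        /* resolve all undefined symbols at load time */
#ifdef RTLD_DEEPBIND
    if (deep_bind)
        flags |= RTLD_DEEPBIND;  /* glibc extension: prefer the library's own symbols */
#endif
    return dlopen(path, flags);  /* NULL on failure; dlerror() describes why */
}
```

On older glibc versions this needs to be linked with -ldl. The loaded library cannot choose these flags itself, so this only illustrates what the host controls.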

People, please think this through with me. At this point I am not even desperate to fix this, just to understand the ld and ELF behavior.

Edit: a small additional question. As I understand it, without special linker options, symbols in Linux .so libraries are never bound by address at link time. So are even statically linked local symbols resolved at runtime?


Answer 1:


Check out the -Bsymbolic option for ld.

If -Bsymbolic is specified, then at the time of creating a shared object ld will attempt to bind references to global symbols to definitions within the shared library. The default is to defer binding to runtime.

This may be clearer with an example.

Say example.o contains a reference to a global function defined in global.o,

$ nm example.o | grep ' U'
     U _GLOBAL_OFFSET_TABLE_
     U globalfn
$ nm global.o | grep ' T'
00000000 T globalfn

and two shared objects, normal.so and symbolic.so, are built as follows:

$ cc -fPIC -c example.c
$ cc -c global.c
$ rm -f archive.a; ar cr archive.a global.o
$ ld -shared -o normal.so example.o archive.a
$ ld -Bsymbolic -shared -o symbolic.so example.o archive.a
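The answer does not show the sources themselves. As a rough sketch, assuming example.c defines a function named example that calls globalfn, the two files could look like the following. The function bodies and return values are invented for illustration, and both files are shown in one listing for brevity.

```c
/* Hypothetical sources; the original answer does not show them. */

/* --- example.c: references globalfn, which is defined elsewhere --- */
int globalfn(void);

int example(void)
{
    /* This is the call that the linker routes either through the PLT
     * (default) or directly to the local definition (-Bsymbolic). */
    return globalfn() + 1;
}

/* --- global.c: the global definition that is placed in archive.a --- */
int globalfn(void)
{
    return 42;
}
```

Compiled as two separate files, example.o would carry an undefined reference to globalfn, matching the nm output above.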

Disassembling the code for normal.so shows that the call to globalfn is actually going through the procedure linkage table, and thus the final destination of the call is determined at runtime.

$ objdump --disassemble normal.so
...snip...
00000194 <example>:
...snip...
 1a6:   e8 d9 ff ff ff          call   184 <globalfn@plt>
...snip...
$ readelf -r normal.so

Relocation section '.rel.plt' at offset 0x16c contains 1 entries:
Offset     Info    Type            Sym.Value  Sym. Name
00001244  00000207 R_386_JUMP_SLOT   000001b8   globalfn

Whereas in symbolic.so, the call always invokes the definition of globalfn within the shared object.

$ objdump --disassemble symbolic.so
...snip...
0000016c <shared>:
...snip...
 17e:   e8 0d 00 00 00          call   190 <globalfn>
...snip...
$ readelf -r symbolic.so

There are no relocations in this file.
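A related, source-level way to get the same local binding for selected functions is GCC's visibility attribute. This is a sketch of an alternative technique, not something the answer itself proposes, and the names helper_fn and public_entry are hypothetical. A hidden function is not placed in the dynamic symbol table, so references to it from within the same shared object are bound when the library is linked.

```c
/* Hypothetical names, for illustration only. */

/* Hidden visibility: not exported from the shared object, so calls to it
 * from inside the library cannot be redirected to another definition. */
__attribute__((visibility("hidden")))
int helper_fn(int x)
{
    return x * 2;
}

/* Default visibility: exported and callable by users of the library. */
__attribute__((visibility("default")))
int public_entry(int x)
{
    /* Binds to the helper_fn above at link time. */
    return helper_fn(x) + 1;
}
```

Compiling the whole library with -fvisibility=hidden and marking only the public API as default has a similar effect without annotating every internal function.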



Answer 2:


Here is the deal - we use boost 1.47, MATLAB has boost 1.40. Currently, library call segfaults on a call from OUR library to their boost (regex).

You are invoking undefined behavior, which is a "Doctor, it hurts when I do this" kind of situation. The Matlab executable already contains external functions for class boost::re_detail::perl_matcher< elided >. When Matlab loads your shared library the dynamic linker sees that your shared library defines those exact same symbols in a way that conflicts with the existing definitions. Undefined behavior.

The solution is to build a version of your library for use with Matlab that uses the same version of Boost as does Matlab.



Source: https://stackoverflow.com/questions/7201667/ld-magically-overrides-statically-linked-symbols
