Disable AVX-optimized functions in glibc (LD_HWCAP_MASK, /etc/ld.so.nohwcap) for valgrind & gdb record

闹比i 2020-11-30 07:13

Modern x86_64 Linux with glibc detects that the CPU supports the AVX extension and switches many string functions from the generic implementation to an AVX-optimized version (

5 answers
  • 2020-11-30 07:22

    Not the best or a complete solution, just the smallest bit-editing kludge that allowed valgrind and gdb record to work for my task.

    Lekensteyn asks:

    how to mask out AVX/SSE without recompiling glibc

    I did a full rebuild of unmodified glibc, which is rather easy on Debian and Ubuntu: just sudo apt-get source glibc, sudo apt-get build-dep glibc and cd glibc-*/; dpkg-buildpackage -us -uc (see the manual) to get an ld.so whose debugging information has not been stripped.

    Then I binary-patched (one bit) the resulting ld.so file, in the function used by __get_cpu_features. The target function was compiled from get_common_indeces in the source file sysdeps/x86/cpu-features.c under the name get_common_indeces.constprop.1 (it comes right after __get_cpu_features in the binary code). It contains several cpuid instructions; the first one is cpuid eax=1 "Processor Info and Feature Bits", and later there is a check against 0x6 followed by jle, which jumps around the code for cpuid eax=7 ecx=0 "Extended Features" that exists just to get the AVX2 status. This is the source code that was compiled into that logic:

    get_common_indeces (struct cpu_features *cpu_features,
                unsigned int *family, unsigned int *model,
                unsigned int *extended_model, unsigned int *stepping)
    { ...
      if (cpu_features->max_cpuid >= 7)
        __cpuid_count (7, 0,
               cpu_features->cpuid[COMMON_CPUID_INDEX_7].eax,
               cpu_features->cpuid[COMMON_CPUID_INDEX_7].ebx,
               cpu_features->cpuid[COMMON_CPUID_INDEX_7].ecx,
               cpu_features->cpuid[COMMON_CPUID_INDEX_7].edx);
    

    cpu_features->max_cpuid is filled in by init_cpu_features in the same file, in the line __cpuid (0, cpu_features->max_cpuid, ebx, ecx, edx);. The easiest way to disable the if statement was to replace the jle after cmp 0x6 with jg (byte 0x7e changed to 0x7f). (Actually, this binary patch was then reapplied manually to the __get_cpu_features function of the real system ld-linux.so.2: the first jle before mov 7 eax; xor ecx,ecx; cpuid was changed into jg.)
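    As a rough illustration of that one-byte change, the sketch below flips a short jle (0x7e) into jg (0x7f) at a known offset. The helper name flip_jle_to_jg and the sample bytes are made up for this example; the real offset has to be read from a disassembly of your own ld.so.

```python
JLE_REL8 = 0x7E  # opcode byte of "jle rel8"
JG_REL8 = 0x7F   # opcode byte of "jg rel8"


def flip_jle_to_jg(image: bytes, offset: int) -> bytes:
    """Return a copy of image with the jle at offset turned into jg.

    Hypothetical helper: offset must point at the opcode byte of the
    short jle that follows the comparison against 0x6.
    """
    if image[offset] != JLE_REL8:
        raise ValueError(f"expected 0x7e at {offset:#x}, found {image[offset]:#04x}")
    patched = bytearray(image)
    patched[offset] = JG_REL8
    return bytes(patched)


# Synthetic bytes, NOT taken from a real ld.so:
# 83 f8 06 = cmp eax,0x6 ; 7e 10 = jle +0x10
sample = bytes.fromhex("83f8067e10")
patched = flip_jle_to_jg(sample, 3)
print(sample.hex(), "->", patched.hex())
```

    Checking the existing byte before writing guards against patching the wrong offset.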

    The recompiled package and the modified ld.so were not installed into the system; I used the command-line syntax ld.so ./my_program (or mv ld.so /some/short/path.so and patchelf --set-interpreter /some/short/path.so ./my_program).

    Other possible solutions:

    • try to use more recent valgrind & gdb record tools
    • try to use older glibc
    • implement missing instruction emulation in gdb record if it is not done
    • do source code patching around if (cpu_features->max_cpuid >= 7) in glibc and recompile
    • do source code patching around avx2-enabled string functions in glibc and recompile
  • 2020-11-30 07:24

    I heard that there are /etc/ld.so.nohwcap and LD_HWCAP_MASK configurations in glibc. Can they be used to disable ifunc dispatching to AVX-optimized string functions in glibc?

    Yes: setting LD_HWCAP_MASK=0 will make GLIBC pretend that none of the CPU capabilities are available.

    Setting the mask to 0 is likely to trigger an error, so you will probably need to figure out the precise bit that controls AVX and mask just that bit.

  • 2020-11-30 07:27

    There does not seem to be a straightforward runtime method to patch feature detection. This detection happens rather early in the dynamic linker (ld.so).

    Binary patching the linker seems the easiest method at the moment. @osgx described one method where a jump is overwritten. Another approach is just to fake the cpuid result. Normally cpuid(eax=0) returns the highest supported function in eax while the manufacturer IDs are returned in registers ebx, ecx and edx. We have this snippet in glibc 2.25 sysdeps/x86/cpu-features.c:

    __cpuid (0, cpu_features->max_cpuid, ebx, ecx, edx);
    
    /* This spells out "GenuineIntel".  */
    if (ebx == 0x756e6547 && ecx == 0x6c65746e && edx == 0x49656e69)
      {
          /* feature detection for various Intel CPUs */
      }
    /* another case for AMD */
    else
      {
        kind = arch_kind_other;
        get_common_indeces (cpu_features, NULL, NULL, NULL, NULL);
      }
    

    The __cpuid line translates to these instructions in /lib/ld-linux-x86-64.so.2 (/lib/ld-2.25.so):

    172a8:       31 c0                   xor    eax,eax
    172aa:       c7 44 24 38 00 00 00    mov    DWORD PTR [rsp+0x38],0x0
    172b1:       00 
    172b2:       c7 44 24 3c 00 00 00    mov    DWORD PTR [rsp+0x3c],0x0
    172b9:       00 
    172ba:       0f a2                   cpuid  
    

    So rather than patching branches, we could as well change the cpuid into a nop instruction which would result in invocation of the last else branch (as the registers will not contain "GenuineIntel"). Since initially eax=0, cpu_features->max_cpuid will also be 0 and the if (cpu_features->max_cpuid >= 7) will also be bypassed.
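    To make the "GenuineIntel" comparison concrete, here is a small sketch that decodes the three constants from the glibc snippet. The function name cpuid_vendor is invented for this example; the register order (ebx, edx, ecx) is how cpuid(eax=0) lays out the vendor string.

```python
import struct


def cpuid_vendor(ebx: int, ecx: int, edx: int) -> str:
    """Decode the vendor string from cpuid(eax=0) register values.

    The 12 ASCII bytes are stored in the order ebx, edx, ecx,
    each register little-endian.
    """
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


# Constants from the glibc comparison quoted above
vendor = cpuid_vendor(ebx=0x756E6547, ecx=0x6C65746E, edx=0x49656E69)
print(vendor)

# With cpuid replaced by a nop the registers are not loaded with these
# values, so the comparison fails. All-zero registers are used here only
# as an example of "something else".
print(cpuid_vendor(0, 0, 0) == "GenuineIntel")
```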

    Replacing cpuid(eax=0) with a nop can be done with this utility (works for both x86 and x86-64):

    #!/usr/bin/env python
    import re
    import sys
    
    infile, outfile = sys.argv[1:]
    with open(infile, 'rb') as f:
        d = f.read()
    # Match CPUID(eax=0): "xor eax,eax" (31 c0) followed closely by "cpuid" (0f a2),
    # and replace the cpuid with a two-byte nop (66 90)
    o = re.sub(b'(\x31\xc0.{0,32}?)\x0f\xa2', b'\\1\x66\x90', d)
    # Fail loudly if nothing was patched
    assert d != o
    with open(outfile, 'wb') as f:
        f.write(o)
    

    An equivalent Perl variant follows; -0777 ensures that the whole file is read at once instead of being split into records at line feeds:

    perl -0777 -pe 's/\x31\xc0.{0,32}?\K\x0f\xa2/\x66\x90/' < /lib/ld-linux-x86-64.so.2 > ld-linux-x86-64-patched.so.2
    # Verify result, should display "Success"
    cmp -s /lib/ld-linux-x86-64.so.2 ld-linux-x86-64-patched.so.2 && echo 'Not patched' || echo Success
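    To see what the substitution does without touching a real loader, the same pattern can be tried on the bytes from the disassembly shown earlier. This is only a sketch on that 20-byte excerpt; other glibc builds may place different instructions between the xor and the cpuid.

```python
import re

# Bytes copied from the disassembly at 172a8..172ba
snippet = bytes.fromhex(
    "31c0"              # xor eax,eax
    "c744243800000000"  # mov DWORD PTR [rsp+0x38],0x0
    "c744243c00000000"  # mov DWORD PTR [rsp+0x3c],0x0
    "0fa2"              # cpuid
)

# Same substitution as the utility: keep the prefix, replace cpuid by 66 90
patched = re.sub(b"(\x31\xc0.{0,32}?)\x0f\xa2", b"\\1\x66\x90", snippet)

print(snippet[-2:].hex(), "->", patched[-2:].hex())
```

    The file length stays the same, which matters because every other offset in the loader has to remain valid.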
    

    That was the easy part. Now, I did not want to replace the system-wide dynamic linker, but execute only one particular program with this linker. Sure, that can be done with ./ld-linux-x86-64-patched.so.2 ./a, but the naive gdb invocations failed to set breakpoints:

    $ gdb -q -ex "set exec-wrapper ./ld-linux-x86-64-patched.so.2" -ex start ./a
    Reading symbols from ./a...done.
    Temporary breakpoint 1 at 0x400502: file a.c, line 5.
    Starting program: /tmp/a 
    During startup program exited normally.
    (gdb) quit
    $ gdb -q -ex start --args ./ld-linux-x86-64-patched.so.2 ./a
    Reading symbols from ./ld-linux-x86-64-patched.so.2...(no debugging symbols found)...done.
    Function "main" not defined.
    Temporary breakpoint 1 (main) pending.
    Starting program: /tmp/ld-linux-x86-64-patched.so.2 ./a
    [Inferior 1 (process 27418) exited normally]
    (gdb) quit
    

    A manual workaround is described in How to debug program with custom elf interpreter? It works, but it is unfortunately a manual action using add-symbol-file. It should be possible to automate it a bit using GDB Catchpoints though.

    An alternative approach that does not involve binary patching is LD_PRELOADing a library that defines custom routines for memcpy, memmove, etc. These will then take precedence over the glibc routines. The full list of functions is available in sysdeps/x86_64/multiarch/ifunc-impl-list.c. Current HEAD has more symbols than the glibc 2.25 release; in total (grep -Po 'IFUNC_IMPL \(i, name, \K[^,]+' sysdeps/x86_64/multiarch/ifunc-impl-list.c):

    memchr, memcmp, __memmove_chk, memmove, memrchr, __memset_chk, memset, rawmemchr, strlen, strnlen, stpncpy, stpcpy, strcasecmp, strcasecmp_l, strcat, strchr, strchrnul, strrchr, strcmp, strcpy, strcspn, strncasecmp, strncasecmp_l, strncat, strncpy, strpbrk, strspn, strstr, wcschr, wcsrchr, wcscpy, wcslen, wcsnlen, wmemchr, wmemcmp, wmemset, __memcpy_chk, memcpy, __mempcpy_chk, mempcpy, strncmp, __wmemset_chk
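    The same extraction as the grep one-liner can be sketched in Python. The SAMPLE text below is an abridged, made-up excerpt in the style of ifunc-impl-list.c, and ifunc_names is a name chosen for this example; point it at the real file contents to get the full list.

```python
import re

# Abridged excerpt in the style of ifunc-impl-list.c (illustration only)
SAMPLE = """
  IFUNC_IMPL (i, name, strcmp,
          IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2))
  IFUNC_IMPL (i, name, memmove,
          IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_sse2_unaligned))
"""


def ifunc_names(source: str) -> list:
    """Return the symbol named in each 'IFUNC_IMPL (i, name, ...' entry."""
    # IFUNC_IMPL_ADD lines do not match: the pattern requires " (" right
    # after IFUNC_IMPL.
    return re.findall(r"IFUNC_IMPL \(i, name, ([^,]+)", source)


print(ifunc_names(SAMPLE))
```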

  • 2020-11-30 07:32

    It looks like there is a nice workaround for this implemented in recent versions of glibc: a "tunables" feature that guides selection of optimized string functions. The glibc manual has a general overview of tunables, and the relevant code inside glibc is in ifunc-impl-list.c.

    Here's how I figured it out. First, I took the address being complained about by gdb:

    Process record does not support instruction 0xc5 at address 0x7ffff75c65d4.

    I then looked it up in the table of shared libraries:

    (gdb) info shared
    From                To                  Syms Read   Shared Object Library
    0x00007ffff7fd3090  0x00007ffff7ff3130  Yes         /lib64/ld-linux-x86-64.so.2
    0x00007ffff76366b0  0x00007ffff766b52e  Yes         /usr/lib/x86_64-linux-gnu/libubsan.so.1
    0x00007ffff746a320  0x00007ffff75d9cab  Yes         /lib/x86_64-linux-gnu/libc.so.6
    ...
    

    You can see that this address is within glibc. But what function, specifically?
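    As a cross-check of the table lookup, the range test can be written out explicitly. The rows are copied from the info shared output above; the function name library_for is made up for this sketch.

```python
# (start, end, path) rows copied from the "info shared" output
SHARED = [
    (0x00007FFFF7FD3090, 0x00007FFFF7FF3130, "/lib64/ld-linux-x86-64.so.2"),
    (0x00007FFFF76366B0, 0x00007FFFF766B52E, "/usr/lib/x86_64-linux-gnu/libubsan.so.1"),
    (0x00007FFFF746A320, 0x00007FFFF75D9CAB, "/lib/x86_64-linux-gnu/libc.so.6"),
]


def library_for(address, table=SHARED):
    """Return the path whose [start, end] range contains address, else None."""
    for start, end, path in table:
        if start <= address <= end:
            return path
    return None


# Address reported by "Process record does not support instruction 0xc5"
print(library_for(0x7FFFF75C65D4))
```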

    (gdb) disassemble 0x7ffff75c65d4
    Dump of assembler code for function __strcmp_avx2:
       0x00007ffff75c65d0 <+0>:     mov    %edi,%eax
       0x00007ffff75c65d2 <+2>:     xor    %edx,%edx
    => 0x00007ffff75c65d4 <+4>:     vpxor  %ymm7,%ymm7,%ymm7
    

    I can look in ifunc-impl-list.c to find the code that controls selecting the avx2 version:

      IFUNC_IMPL (i, name, strcmp,
              IFUNC_IMPL_ADD (array, i, strcmp,
                      HAS_ARCH_FEATURE (AVX2_Usable),
                      __strcmp_avx2)
              IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSE4_2),
                      __strcmp_sse42)
              IFUNC_IMPL_ADD (array, i, strcmp, HAS_CPU_FEATURE (SSSE3),
                      __strcmp_ssse3)
              IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2_unaligned)
              IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_sse2))
    

    It looks like AVX2_Usable is the feature to disable. Let's rerun gdb accordingly:

    GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX2_Usable gdb...

    On this iteration it complained about __memmove_avx_unaligned_erms, which appeared to be enabled by AVX_Usable - but I found another path in ifunc-memmove.h enabled by AVX_Fast_Unaligned_Load. Back to the drawing board:

    GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX2_Usable,-AVX_Fast_Unaligned_Load gdb ...

    On this final round I discovered an rdtscp instruction in the ASAN shared library, so I recompiled without the address sanitizer and at last, it worked.

    In summary: with some work it's possible to disable these instructions from the command line and use gdb's record feature without severe hacks.
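    If the list of features to mask keeps growing, building the variable programmatically may be less error-prone. The following is a minimal sketch: tunables_env is a hypothetical helper, and it assumes the glibc.cpu.hwcaps tunable name used in the commands above.

```python
import os


def tunables_env(disabled, base=None):
    """Return a copy of the environment with GLIBC_TUNABLES set so that
    each feature name in disabled is masked via glibc.cpu.hwcaps.

    Note: any GLIBC_TUNABLES value already present is overwritten.
    """
    env = dict(os.environ if base is None else base)
    env["GLIBC_TUNABLES"] = "glibc.cpu.hwcaps=" + ",".join("-" + f for f in disabled)
    return env


env = tunables_env(["AVX2_Usable", "AVX_Fast_Unaligned_Load"], base={})
print(env["GLIBC_TUNABLES"])

# To launch the debugger with it (./a is a placeholder program name):
#   import subprocess
#   subprocess.run(["gdb", "./a"], env=tunables_env(["AVX2_Usable"]))
```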

  • 2020-11-30 07:36

    I encountered this problem recently as well, and ended up solving it using dynamic CPUID faulting to interrupt execution of the CPUID instruction and override its result, which avoids touching glibc or the dynamic linker. This requires processor support for CPUID faulting (Ivy Bridge+) as well as Linux kernel support (4.12+) for exposing it to userspace through the ARCH_GET_CPUID and ARCH_SET_CPUID subfunctions of arch_prctl(). When this feature is enabled, a SIGSEGV signal is delivered on each execution of CPUID, allowing a signal handler to emulate execution of the instruction and override the result.

    The full solution is a bit involved since I also needed to interpose the dynamic linker, because hardware capability detection was moved there starting with glibc 2.26. I've uploaded the full solution online at https://github.com/ddcc/libcpuidoverride .
