intrinsic memcmp

一整个雨季 2020-12-03 15:59

According to the gcc docs, memcmp is not an intrinsic function of GCC. If you wanted to speed up glibc's memcmp under gcc, you would need to use the lower-level intrinsics.
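
As a rough illustration of what "lower-level intrinsics" could look like, here is a minimal sketch of an equality check built on SSE2 intrinsics. The name bytes_equal is hypothetical, the code only answers equal or not equal (it does not reproduce memcmp's ordering result), and it falls back to plain memcmp when SSE2 is not available. Whether it beats glibc's memcmp would have to be measured.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: returns 1 if the first n bytes of a and b are equal, else 0. */
int bytes_equal(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
#if defined(__SSE2__)
    while (n >= 16) {
        /* Unaligned 16-byte loads from both buffers. */
        __m128i va = _mm_loadu_si128((const __m128i *)p);
        __m128i vb = _mm_loadu_si128((const __m128i *)q);
        /* Equal byte pairs become 0xFF; movemask packs their top bits into 16 bits. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF)
            return 0;
        p += 16;
        q += 16;
        n -= 16;
    }
#endif
    /* Remaining tail (or the whole buffer when SSE2 is unavailable). */
    return memcmp(p, q, n) == 0;
}
```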

3 Answers
  • 2020-12-03 16:38

    Note that the repz cmpsb routine might not be faster than glibc's memcmp. In my tests, in fact, it's never faster, even when comparing just a few bytes.

    See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052

  • 2020-12-03 16:49

    Your link appears to be for the x86 architecture-specific built-in functions. According to this, memcmp is implemented as an architecture-independent built-in by gcc.

    Edit:

    Compiling the following code with Cygwin gcc version 3.3.1 for i686, -O2:

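    #include <string.h>   /* declares memcmp */
    
    struct foo {
        int a;
        int b;
    };
    
    /* Compare two structs bytewise; the size is a compile-time constant (8 bytes on i686). */
    int func(struct foo *x, struct foo *y)
    {
        return memcmp(x, y, sizeof (struct foo));
    }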
    

    Produces the following output (note that the call to memcmp() is converted to an 8-byte "repz cmpsb"):

       0:   55                      push   %ebp
       1:   b9 08 00 00 00          mov    $0x8,%ecx
       6:   89 e5                   mov    %esp,%ebp
       8:   fc                      cld    
       9:   83 ec 08                sub    $0x8,%esp
       c:   89 34 24                mov    %esi,(%esp)
       f:   8b 75 08                mov    0x8(%ebp),%esi
      12:   89 7c 24 04             mov    %edi,0x4(%esp)
      16:   8b 7d 0c                mov    0xc(%ebp),%edi
      19:   f3 a6                   repz cmpsb %es:(%edi),%ds:(%esi)
      1b:   0f 92 c0                setb   %al
      1e:   8b 34 24                mov    (%esp),%esi
      21:   8b 7c 24 04             mov    0x4(%esp),%edi
      25:   0f 97 c2                seta   %dl
      28:   89 ec                   mov    %ebp,%esp
      2a:   5d                      pop    %ebp
      2b:   28 c2                   sub    %al,%dl
      2d:   0f be c2                movsbl %dl,%eax
      30:   c3                      ret    
      31:   90                      nop    
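
    To ask for the built-in explicitly instead of relying on the compiler recognizing a plain memcmp call, GCC also exposes __builtin_memcmp. A minimal sketch, reusing struct foo from the listing above; foo_equal is a hypothetical name, and whether the call is expanded inline or still ends up as a library call depends on the compiler version, target, and optimization level.

    ```c
    struct foo {
        int a;
        int b;
    };

    /* Hypothetical helper: nonzero when the two structs hold identical bytes.
       struct foo has no padding on common ABIs; a struct with padding would
       have its padding bytes compared as well. */
    int foo_equal(const struct foo *x, const struct foo *y)
    {
        /* The size is a compile-time constant, which is what allows inline expansion. */
        return __builtin_memcmp(x, y, sizeof (struct foo)) == 0;
    }
    ```

    Conversely, GCC's -fno-builtin-memcmp option stops plain memcmp calls from being treated as a built-in, which is handy when comparing the two code paths.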
    
  • 2020-12-03 17:01

    Now in 2017, GCC and Clang seem to have some optimizations for buffers of sizes 1, 2, 4, and 8, and for some others, for example 3, 5, and multiples of 8.
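
    A small sketch for checking this on your own compiler: the helpers below (eq3, eq4, and eq8 are hypothetical names) call memcmp with constant sizes. Compiling with optimization enabled and inspecting the assembly (for example with -O2 -S) should show whether each call was replaced by a few loads and compares or left as a call; the outcome is an assumption to verify, not a guarantee.

    ```c
    #include <string.h>

    /* Hypothetical helpers: equality tests with compile-time-constant sizes,
       the case the optimizer can specialize. */
    int eq3(const void *a, const void *b) { return memcmp(a, b, 3) == 0; }
    int eq4(const void *a, const void *b) { return memcmp(a, b, 4) == 0; }
    int eq8(const void *a, const void *b) { return memcmp(a, b, 8) == 0; }
    ```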
