Getting GCC/Clang to use CMOV

Posted by 本小妞迷上赌 on 2021-02-19 04:38:05

Question


I have a simple tagged union of values. Each value is either an int64_t or a double. I am performing addition on these unions, with the caveat that if both arguments hold int64_t values, the result should also be an int64_t value.

Here is the code:

#include <stdint.h>
union Value {
  int64_t a;
  double b;
};

enum Type { DOUBLE, LONG };

// Value + type.
struct TaggedValue {
  Type type;
  Value value;
};

void add(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
  const Type type1 = arg1.type;
  const Type type2 = arg2.type;
  // If both args are longs then write a long to the output.
  if (type1 == LONG && type2 == LONG) {
    out->value.a = arg1.value.a + arg2.value.a;
    out->type = LONG;
  } else {
    // Convert each argument to a double if needed, then add.
    double op1 = type1 == LONG ? (double)arg1.value.a : arg1.value.b; // Why isn't CMOV used?
    double op2 = type2 == LONG ? (double)arg2.value.a : arg2.value.b; // Why isn't CMOV used? 
    out->value.b = op1 + op2;
    out->type = DOUBLE;
  }
}

The gcc output at -O2 is at http://goo.gl/uTve18 and is also reproduced below in case the link doesn't work.

add(TaggedValue const&, TaggedValue const&, TaggedValue*):
    cmp DWORD PTR [rdi], 1
    sete    al
    cmp DWORD PTR [rsi], 1
    sete    cl
    je  .L17
    test    al, al
    jne .L18
.L4:
    test    cl, cl
    movsd   xmm1, QWORD PTR [rdi+8]
    jne .L19
.L6:
    movsd   xmm0, QWORD PTR [rsi+8]
    mov DWORD PTR [rdx], 0
    addsd   xmm0, xmm1
    movsd   QWORD PTR [rdx+8], xmm0
    ret
.L17:
    test    al, al
    je  .L4
    mov rax, QWORD PTR [rdi+8]
    add rax, QWORD PTR [rsi+8]
    mov DWORD PTR [rdx], 1
    mov QWORD PTR [rdx+8], rax
    ret
.L18:
    cvtsi2sd    xmm1, QWORD PTR [rdi+8]
    jmp .L6
.L19:
    cvtsi2sd    xmm0, QWORD PTR [rsi+8]
    addsd   xmm0, xmm1
    mov DWORD PTR [rdx], 0
    movsd   QWORD PTR [rdx+8], xmm0
    ret

This produces code with a lot of branches. I know that the input data is fairly random, i.e., it has a random mix of int64_ts and doubles, so I expect the branches to be hard to predict. I'd like at least the conversion to double to be done with an equivalent of a CMOV instruction. Is there any way to coax gcc into producing that code? Ideally, I'd like to run a benchmark on real data to compare the branch-heavy code against a version with fewer branches but more expensive CMOV instructions. It might turn out that the code GCC generates by default works better, but I'd like to confirm that. I could write the inline assembly myself, but I'd prefer not to.
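
For reference, here is a minimal sketch of the kind of branch-free source I have in mind. It always performs the int64_t-to-double conversion and then picks the right bit pattern with a mask. The names select_bits, bits_of, double_of, as_double and add_branchless are made up for this sketch, and it assumes int64_t and double are both 8 bytes. I have not verified what any particular compiler emits for it, so whether the result is a CMOV, a blend, or branches again still has to be checked in the generated assembly.

```cpp
#include <stdint.h>
#include <string.h>

union Value {
  int64_t a;
  double b;
};

enum Type { DOUBLE, LONG };

struct TaggedValue {
  Type type;
  Value value;
};

// Hypothetical helper: returns if_set when cond is true, if_clear otherwise,
// using a bit mask rather than an if/else in the source.
static inline uint64_t select_bits(bool cond, uint64_t if_set, uint64_t if_clear) {
  const uint64_t mask = (uint64_t)0 - (uint64_t)cond;  // all ones or all zeros
  return (if_set & mask) | (if_clear & ~mask);
}

// Hypothetical helpers: move between a double and its 64-bit pattern.
static inline uint64_t bits_of(double d) {
  uint64_t u;
  memcpy(&u, &d, sizeof u);
  return u;
}

static inline double double_of(uint64_t u) {
  double d;
  memcpy(&d, &u, sizeof d);
  return d;
}

// Value of v as a double. The int64_t -> double conversion is always
// performed, even when v already holds a double; the mask picks the result.
static inline double as_double(const TaggedValue& v) {
  int64_t as_int;
  uint64_t raw;
  memcpy(&as_int, &v.value, sizeof as_int);
  memcpy(&raw, &v.value, sizeof raw);
  const double converted = (double)as_int;
  return double_of(select_bits(v.type == LONG, bits_of(converted), raw));
}

void add_branchless(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
  uint64_t raw1, raw2;
  memcpy(&raw1, &arg1.value, sizeof raw1);
  memcpy(&raw2, &arg2.value, sizeof raw2);
  const bool both_long = (arg1.type == LONG) & (arg2.type == LONG);
  // Both sums are computed unconditionally; only one is kept.
  const uint64_t long_sum = raw1 + raw2;  // unsigned, so it wraps on overflow
  const double double_sum = as_double(arg1) + as_double(arg2);
  const uint64_t result = select_bits(both_long, long_sum, bits_of(double_sum));
  memcpy(&out->value, &result, sizeof result);
  out->type = both_long ? LONG : DOUBLE;
}
```

The integer sum is done on uint64_t because it is also computed when the arguments are doubles, and adding arbitrary bit patterns as signed integers could overflow. One side effect is that the LONG + LONG case wraps on overflow here, which differs from the signed addition in my original code.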

The interactive compiler link is a good way to check the assembly. Any suggestions?

Source: https://stackoverflow.com/questions/30333068/getting-gcc-clang-to-use-cmov
