Question
I have a simple tagged union of values. The values can be either int64_t or double. I am performing addition on these unions, with the caveat that if both arguments hold int64_t values then the result should also hold an int64_t value.
Here is the code:
#include <stdint.h>
union Value {
    int64_t a;
    double b;
};
enum Type { DOUBLE, LONG };
// Value + type.
struct TaggedValue {
    Type type;
    Value value;
};
void add(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
    const Type type1 = arg1.type;
    const Type type2 = arg2.type;
    // If both args are longs then write a long to the output.
    if (type1 == LONG && type2 == LONG) {
        out->value.a = arg1.value.a + arg2.value.a;
        out->type = LONG;
    } else {
        // Convert argument to a double and add it.
        double op1 = type1 == LONG ? (double)arg1.value.a : arg1.value.b; // Why isn't CMOV used?
        double op2 = type2 == LONG ? (double)arg2.value.a : arg2.value.b; // Why isn't CMOV used?
        out->value.b = op1 + op2;
        out->type = DOUBLE;
    }
}
The gcc output at -O2 is available at http://goo.gl/uTve18 and is reproduced here in case the link doesn't work:
add(TaggedValue const&, TaggedValue const&, TaggedValue*):
        cmp     DWORD PTR [rdi], 1
        sete    al
        cmp     DWORD PTR [rsi], 1
        sete    cl
        je      .L17
        test    al, al
        jne     .L18
.L4:
        test    cl, cl
        movsd   xmm1, QWORD PTR [rdi+8]
        jne     .L19
.L6:
        movsd   xmm0, QWORD PTR [rsi+8]
        mov     DWORD PTR [rdx], 0
        addsd   xmm0, xmm1
        movsd   QWORD PTR [rdx+8], xmm0
        ret
.L17:
        test    al, al
        je      .L4
        mov     rax, QWORD PTR [rdi+8]
        add     rax, QWORD PTR [rsi+8]
        mov     DWORD PTR [rdx], 1
        mov     QWORD PTR [rdx+8], rax
        ret
.L18:
        cvtsi2sd        xmm1, QWORD PTR [rdi+8]
        jmp     .L6
.L19:
        cvtsi2sd        xmm0, QWORD PTR [rsi+8]
        addsd   xmm0, xmm1
        mov     DWORD PTR [rdx], 0
        movsd   QWORD PTR [rdx+8], xmm0
        ret
It produced code with a lot of branches. I know that the input data is pretty random, i.e. it has a random combination of int64_t and double values. I'd like to have at least the conversion to a double done with an equivalent of a CMOV instruction. Is there any way I can coax gcc to produce that code? Ideally I'd like to run a benchmark on real data to compare the branch-heavy code against a version with fewer branches but more expensive CMOV instructions. It might turn out that the code GCC generates by default works better, but I'd like to confirm that. I could write the inline assembly myself but I'd prefer not to.
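To make the goal concrete, here is a rough sketch of the kind of branch-free conversion I have in mind. It always performs the integer-to-double conversion and then selects between the converted bits and the original bits with a mask, so only the LONG/LONG check remains as a branch. The names as_double_branchless and add_branchless are made up for this sketch. There is no guarantee that gcc or clang will compile the select without a branch, or that the result will be faster than the original, so the generated assembly would still need to be checked.

```cpp
#include <stdint.h>
#include <string.h>

union Value {
    int64_t a;
    double b;
};
enum Type { DOUBLE, LONG };
struct TaggedValue {
    Type type;
    Value value;
};

// Hypothetical helper (not part of the original code): returns the operand
// as a double with no branch in the source.
static inline double as_double_branchless(const TaggedValue& v) {
    // Read the raw 64 bits of the payload, whichever member is active.
    int64_t raw_bits;
    memcpy(&raw_bits, &v.value, sizeof raw_bits);

    // Always do the int64 -> double conversion, even if it ends up unused.
    double converted = (double)raw_bits;
    int64_t converted_bits;
    memcpy(&converted_bits, &converted, sizeof converted_bits);

    // mask is all ones when the tag is LONG, all zeros otherwise.
    int64_t mask = -(int64_t)(v.type == LONG);
    int64_t selected = (converted_bits & mask) | (raw_bits & ~mask);

    double result;
    memcpy(&result, &selected, sizeof result);
    return result;
}

// Same contract as add(), but the double path uses the helper above.
void add_branchless(const TaggedValue& arg1, const TaggedValue& arg2,
                    TaggedValue* out) {
    if (arg1.type == LONG && arg2.type == LONG) {
        out->value.a = arg1.value.a + arg2.value.a;
        out->type = LONG;
    } else {
        out->value.b = as_double_branchless(arg1) + as_double_branchless(arg2);
        out->type = DOUBLE;
    }
}
```

The memcpy calls are there only to reinterpret bits without reading an inactive union member. With optimization enabled they are expected to compile down to plain register moves.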
The interactive compiler link is a good way to check the assembly. Any suggestions?
Source: https://stackoverflow.com/questions/30333068/getting-gcc-clang-to-use-cmov