I find myself typing
double foo=1.0/sqrt(...);
a lot, and I've heard that modern processors have built-in inverse square root opcodes.
This violates constraints 1 and 2 (and it isn't standard C++ either), but it still might help someone browsing through.
I used AsmJit to just-in-time compile the exact assembly instruction you're looking for, RSQRTSS
(single precision only: SSE and AVX have no double-precision counterpart, although AVX-512 adds VRSQRT14SD).
My code is this (cf. also my answer in a different post):
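#include <asmjit/asmjit.h>
#include <cstdint>
#include <cstring>

typedef float(*JITFunc)();  // generated function: no arguments, returns float
JITFunc func;
asmjit::JitRuntime jit_runtime;
asmjit::CodeHolder code;
code.init(jit_runtime.getCodeInfo());
asmjit::X86Compiler cc(&code);
cc.addFunc(asmjit::FuncSignature0<float>());
float value = 2.71f; // Some example value.
asmjit::X86Xmm x = cc.newXmm();
// Copy the float's bit pattern into an integer (memcpy avoids the
// strict-aliasing problem of reinterpret_cast), then move it into XMM via EAX.
uint32_t bits;
std::memcpy(&bits, &value, sizeof bits);
cc.mov(asmjit::x86::eax, bits);
cc.movd(x, asmjit::x86::eax);
cc.rsqrtss(x, x); // THE asm function.
cc.ret(x);
cc.endFunc();
cc.finalize();
jit_runtime.add(&func, &code);
// Now, func() returns the RSQRTSS approximation of 1/sqrt(value).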
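If you do the JIT compilation only once and then call the function repeatedly, this may be faster than 1.0/sqrt(...), though slightly less accurate. That loss is inherent to the built-in instructions you're talking about: Intel documents a relative error of at most 1.5 * 2^-12 for RSQRTSS, i.e. roughly 12 correct bits. Note that as written, value is baked into the generated code as an immediate, so func() always returns the result for 2.71; to reuse one compiled function for different values, it has to take the float as an argument instead.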
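For comparison, here is a minimal sketch that reaches the same RSQRTSS instruction without a JIT, assuming an x86 compiler that ships the SSE intrinsics header <xmmintrin.h> (GCC, Clang and MSVC do). This swaps in compiler intrinsics instead of AsmJit. It also applies one Newton-Raphson step, a common way to recover most of the lost accuracy. The names rsqrt_estimate, rsqrt_refined and rel_error are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // SSE intrinsics: _mm_set_ss, _mm_rsqrt_ss, _mm_cvtss_f32

// Raw hardware estimate: compiles to a single RSQRTSS instruction.
// Intel documents a relative error of at most 1.5 * 2^-12.
inline float rsqrt_estimate(float x) {
    assert(x > 0.0f);  // only meaningful for positive, finite inputs
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

// Estimate refined by one Newton-Raphson step on f(y) = 1/y^2 - x:
//   y1 = y0 * (1.5 - 0.5 * x * y0 * y0)
// A relative error e in y0 becomes roughly 1.5 * e^2 in y1,
// which is close to full float precision.
inline float rsqrt_refined(float x) {
    const float y0 = rsqrt_estimate(x);
    return y0 * (1.5f - 0.5f * x * y0 * y0);
}

// Relative error of an approximation against 1/sqrt(x), computed in double.
inline double rel_error(float approx, float x) {
    const double exact = 1.0 / std::sqrt(static_cast<double>(x));
    return std::fabs(approx - exact) / exact;
}
```

Usage: rel_error(rsqrt_estimate(2.71f), 2.71f) stays around the 12-bit level, while rsqrt_refined(2.71f) agrees with 1.0f / std::sqrt(2.71f) to within a few units in the last place. The extra multiply-adds of the refinement step are usually still cheap compared with a full division and square root, but whether that holds on a given CPU is worth measuring.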