Is it possible to write Quake's fast InvSqrt() function in Rust?

太阳男子 2021-01-29 23:03

This is just to satisfy my own curiosity.

Is there an implementation of this:

float InvSqrt (float x)
{
   float xh         


        
3 Answers
  • 2021-01-29 23:42

    This one is implemented with Rust's lesser-known union:

    union FI {
        f: f32,
        i: i32,
    }
    
    fn inv_sqrt(x: f32) -> f32 {
        let mut u = FI { f: x };
        unsafe {
            u.i = 0x5f3759df - (u.i >> 1);
            u.f * (1.5 - 0.5 * x * u.f * u.f)
        }
    }
    

    I ran some micro-benchmarks using the criterion crate on an x86-64 Linux box. Surprisingly, Rust's own sqrt().recip() is the fastest. Of course, any micro-benchmark result should be taken with a grain of salt.

    inv sqrt with transmute time:   [1.6605 ns 1.6638 ns 1.6679 ns]
    inv sqrt with union     time:   [1.6543 ns 1.6583 ns 1.6633 ns]
    inv sqrt with to and from bits
                            time:   [1.7659 ns 1.7677 ns 1.7697 ns]
    inv sqrt with powf      time:   [7.1037 ns 7.1125 ns 7.1223 ns]
    inv sqrt with sqrt then recip
                            time:   [1.5466 ns 1.5488 ns 1.5513 ns]
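The figures above come from criterion. For readers who do not have that crate at hand, a much cruder comparison can be sketched with only the standard library. This is not the benchmark that produced the numbers above: `time_it` and the function names are hypothetical, the to_bits variant stands in for the bit trick, and a single wall-clock measurement is far less trustworthy than criterion's statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Bit-trick approximation (to_bits/from_bits variant, so no unsafe is needed).
fn inv_sqrt_fast(x: f32) -> f32 {
    let i = 0x5f3759df_u32.wrapping_sub(x.to_bits() >> 1);
    let y = f32::from_bits(i);
    y * (1.5 - 0.5 * x * y * y)
}

/// Plain standard-library version.
fn inv_sqrt_std(x: f32) -> f32 {
    x.sqrt().recip()
}

/// Hypothetical helper: calls `f` on 1.0, 2.0, ..., `iters` and returns the
/// elapsed time plus a checksum, so the optimizer cannot discard the work.
fn time_it(f: fn(f32) -> f32, iters: u32) -> (Duration, f32) {
    let start = Instant::now();
    let mut sum = 0.0_f32;
    for n in 1..=iters {
        sum += f(black_box(n as f32));
    }
    (start.elapsed(), black_box(sum))
}

fn main() {
    let iters = 1_000_000;
    let (t_fast, s_fast) = time_it(inv_sqrt_fast, iters);
    let (t_std, s_std) = time_it(inv_sqrt_std, iters);
    println!("bit trick:      {:?} (checksum {})", t_fast, s_fast);
    println!("sqrt().recip(): {:?} (checksum {})", t_std, s_std);
}
```

Build with --release before reading anything into the output; in a debug build the loop overhead dominates.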
    
  • 2021-01-29 23:46

    "I don't know how to encode the float number using integer format."

    There is a function for that: f32::to_bits, which returns a u32. There is also a function for the other direction: f32::from_bits, which takes a u32 as its argument. These functions are preferred over mem::transmute, as the latter is unsafe and tricky to use.
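As a small illustration of those two functions, the sketch below reinterprets a float as its raw IEEE 754 bit pattern and back again. The helper name `round_trip` is made up for this example.

```rust
/// Hypothetical helper: returns the raw bits of `x` and the float rebuilt from them.
fn round_trip(x: f32) -> (u32, f32) {
    let bits = x.to_bits(); // same 32 bits, viewed as an integer
    (bits, f32::from_bits(bits)) // and viewed as a float again
}

fn main() {
    let (bits, back) = round_trip(1.0);
    println!("{:#010x}", bits); // prints 0x3f800000
    assert_eq!(back, 1.0);
}
```

No unsafe block is needed, which is the main practical advantage over transmute.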

    With that, here is the implementation of InvSqrt:

    fn inv_sqrt(x: f32) -> f32 {
        let i = x.to_bits();
        let i = 0x5f3759df - (i >> 1);
        let y = f32::from_bits(i);
    
        y * (1.5 - 0.5 * x * y * y)
    }
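One caveat with the u32 version: in a debug build, Rust panics on integer overflow, and the subtraction underflows for negative inputs of sufficient magnitude such as -1.0 (its bits shifted right give 0x5fc00000, which is larger than the magic constant). The result is meaningless for negative inputs anyway, but a variant using wrapping_sub at least avoids the panic. The sketch below is an assumption about how one might handle that, together with a quick accuracy check; `inv_sqrt_wrapping` and `relative_error` are names invented here.

```rust
/// Same algorithm as above, but the subtraction wraps instead of panicking
/// in debug builds when the input is negative.
fn inv_sqrt_wrapping(x: f32) -> f32 {
    let i = 0x5f3759df_u32.wrapping_sub(x.to_bits() >> 1);
    let y = f32::from_bits(i);
    y * (1.5 - 0.5 * x * y * y) // one Newton-Raphson step
}

/// Relative error of the approximation against 1.0 / x.sqrt().
fn relative_error(x: f32) -> f32 {
    let exact = 1.0 / x.sqrt();
    ((inv_sqrt_wrapping(x) - exact) / exact).abs()
}

fn main() {
    for x in [0.25_f32, 1.0, 2.0, 4.0, 100.0] {
        println!(
            "x = {}: approx = {}, relative error = {}",
            x,
            inv_sqrt_wrapping(x),
            relative_error(x)
        );
    }
    // Not a meaningful value, but no panic either.
    let _ = inv_sqrt_wrapping(-1.0);
}
```

With a single Newton step the relative error stays below roughly 0.2% for positive, normal inputs.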
    



    This function compiles to the following assembly on x86-64:

    .LCPI0_0:
            .long   3204448256        ; f32 -0.5
    .LCPI0_1:
            .long   1069547520        ; f32  1.5
    example::inv_sqrt:
            movd    eax, xmm0
            shr     eax                   ; i >> 1
            mov     ecx, 1597463007       ; 0x5f3759df
            sub     ecx, eax              ; 0x5f3759df - ...
            movd    xmm1, ecx
            mulss   xmm0, dword ptr [rip + .LCPI0_0]    ; x *= -0.5
            mulss   xmm0, xmm1                          ; x *= y
            mulss   xmm0, xmm1                          ; x *= y
            addss   xmm0, dword ptr [rip + .LCPI0_1]    ; x += 1.5
            mulss   xmm0, xmm1                          ; x *= y
            ret
    

    I have not found any reference assembly (if you have, please tell me!), but it seems fairly good to me. I am just not sure why the float was moved into eax just to do the shift and integer subtraction. Maybe SSE registers do not support those operations?

    clang 9.0 with -O3 compiles the C code to basically the same assembly. So that's a good sign.


    If you actually want to use this in practice: please don't. As benrg pointed out in the comments, modern x86 CPUs have a specialized instruction for this function that is faster and more accurate than this hack. Unfortunately, 1.0 / x.sqrt() does not seem to optimize to that instruction. So if you really need the speed, using the _mm_rsqrt_ps intrinsic is probably the way to go. This, however, again requires unsafe code. I won't go into much detail in this answer, as only a minority of programmers will actually need it.
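For the curious, here is a minimal sketch of the intrinsic route. It uses _mm_rsqrt_ss, the scalar sibling of _mm_rsqrt_ps, because only one value is computed; the wrapper name `rsqrt_hw` and the fallback for other architectures are assumptions of this example, not part of any library.

```rust
/// Approximate 1/sqrt(x) with the hardware RSQRTSS instruction.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
fn rsqrt_hw(x: f32) -> f32 {
    use std::arch::x86_64::{_mm_cvtss_f32, _mm_rsqrt_ss, _mm_set_ss};
    // SSE is part of the x86-64 baseline, so no runtime feature detection
    // is done here.
    unsafe { _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x))) }
}

/// Fallback so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn rsqrt_hw(x: f32) -> f32 {
    1.0 / x.sqrt()
}

fn main() {
    let x = 4.0_f32;
    println!("rsqrt_hw({}) = {}, exact = {}", x, rsqrt_hw(x), 1.0 / x.sqrt());
}
```

The instruction is only an approximation (Intel documents a maximum relative error of 1.5 * 2^-12), so a Newton step can still be applied afterwards if more precision is needed.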

  • 2021-01-30 00:04

    You can use std::mem::transmute to perform the needed conversion:

    fn inv_sqrt(x: f32) -> f32 {
        let xhalf = 0.5f32 * x;
        let mut i: i32 = unsafe { std::mem::transmute(x) };
        i = 0x5f3759df - (i >> 1);
        let mut res: f32 = unsafe { std::mem::transmute(i) };
        res = res * (1.5f32 - xhalf * res * res);
        res
    }
    

