Why does my code run slower when I remove bounds checks?

失恋的感觉 2021-01-03 20:42

I'm writing a linear algebra library in Rust.

I have a function to get a reference to a matrix cell at a given row and column. This function starts with a pair of a

1 answer
  • 2021-01-03 21:37

    It's not a complete answer because I haven't tested my claims, but this might explain it. Either way, the only way to know for sure is to generate the LLVM IR and the assembly output. If you need a manual for LLVM IR, you can find it here: http://llvm.org/docs/LangRef.html .
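    As a rough sketch of how one might get at that output: the two probe functions below are hypothetical (their names and the file name probe.rs are assumptions, not from the question), and the commands in the comments use rustc's --emit option. Marking the probes #[inline(never)] keeps each one visible as its own function in the generated IR and assembly.

```rust
// Save as a library file and compile with optimizations, for example:
//   rustc -O --crate-type=lib --emit=llvm-ir,asm probe.rs
// or, inside a Cargo project:
//   cargo rustc --release -- --emit=llvm-ir,asm
// The file name and function names here are hypothetical.

/// Checked load: indexing panics if `i` is out of range.
#[inline(never)]
pub fn load_checked(data: &[f64], i: usize) -> f64 {
    data[i]
}

/// Unchecked load.
///
/// # Safety
/// The caller must guarantee `i < data.len()`.
#[inline(never)]
pub unsafe fn load_unchecked(data: &[f64], i: usize) -> f64 {
    // SAFETY: the caller upholds `i < data.len()`.
    unsafe { *data.get_unchecked(i) }
}
```

    Comparing the bodies of the two functions in the emitted files should show whether a compare-and-branch to the panic path is present.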

    Anyways, enough about that. Let's say you have this code:

    /// # Safety
    /// The caller must ensure `row` and `col` are within the matrix bounds.
    #[inline(always)]
    pub unsafe fn get_unchecked(&self, row: usize, col: usize) -> &T {
        self.data.get_unchecked(self.row_col_index(row, col))
    }
    

    The compiler here changes this into an indirect load, which will probably be optimized in a tight loop. It's interesting to note that each load has a possibility to go wrong: if the index is out of range, the load reads past the end of your data, which is undefined behavior rather than a panic.
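    For reference, here is a minimal sketch of how a checked and an unchecked accessor might sit side by side. Only data, row_col_index and get_unchecked come from the code above; the Matrix name, the rows and cols fields, the from_vec constructor, the row-major layout and the exact form of the check are all assumptions made for illustration.

```rust
/// Hypothetical row-major matrix; field names are assumptions.
pub struct Matrix<T> {
    rows: usize,
    cols: usize,
    data: Vec<T>,
}

impl<T> Matrix<T> {
    /// Wraps a flat vector holding `rows * cols` elements in row-major order.
    pub fn from_vec(rows: usize, cols: usize, data: Vec<T>) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must be rows * cols");
        Matrix { rows, cols, data }
    }

    /// Maps (row, col) to a flat index, assuming row-major storage.
    #[inline(always)]
    fn row_col_index(&self, row: usize, col: usize) -> usize {
        row * self.cols + col
    }

    /// Checked access: panics if `row` or `col` is out of range.
    #[inline(always)]
    pub fn get(&self, row: usize, col: usize) -> &T {
        assert!(row < self.rows && col < self.cols, "index out of bounds");
        &self.data[self.row_col_index(row, col)]
    }

    /// Unchecked access.
    ///
    /// # Safety
    /// The caller must guarantee `row < self.rows` and `col < self.cols`;
    /// otherwise the load is undefined behavior, not a panic.
    #[inline(always)]
    pub unsafe fn get_unchecked(&self, row: usize, col: usize) -> &T {
        // SAFETY: the caller upholds the bounds contract documented above.
        unsafe { self.data.get_unchecked(self.row_col_index(row, col)) }
    }
}
```

    With this layout, a 2 x 3 matrix built from the values 1 through 6 stores cell (1, 2) at flat index 1 * 3 + 2 = 5.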

    In the case with the bounds check combined with the tight loop, LLVM does a little trick. Because the load is in a tight loop (a matrix multiplication) and because the result of the bounds check depends on the bounds of the loop, it will hoist the bounds check out of the loop and perform it once before the loop. In other words, the loop body itself will remain exactly the same, but with one extra bounds check in front of it.

    So the two versions end up as nearly the same code, with some minor differences.
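    To make the hoisting idea concrete, here is a hand-written sketch on a plain slice. The function names are hypothetical, and the second function only approximates the shape the optimizer may produce; whether LLVM actually performs this transformation for a given loop has to be confirmed in the IR.

```rust
/// Sums the first `n` elements with a bounds check on every access.
pub fn sum_checked(data: &[f64], n: usize) -> f64 {
    let mut total = 0.0;
    for i in 0..n {
        total += data[i]; // panics if i >= data.len()
    }
    total
}

/// Same loop, but the check is written once in front of the loop
/// and the body uses an unchecked load.
pub fn sum_hoisted(data: &[f64], n: usize) -> f64 {
    assert!(n <= data.len(), "n exceeds slice length");
    let mut total = 0.0;
    for i in 0..n {
        // SAFETY: i < n and n <= data.len() by the assert above.
        total += unsafe { *data.get_unchecked(i) };
    }
    total
}
```

    Both functions return the same result for any n that fits in the slice; the difference is only where the check is paid for.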

    So what changed? Two things:

    1. If we have the additional bounds check, there's no possibility anymore for an out-of-bounds load. This might trigger an optimization that wasn't possible before. Still, considering how these checks are usually implemented, this wouldn't be my guess.

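    2. Another thing to consider is whether the word 'unsafe' triggers some extra behavior, like an additional condition or pinning data. As far as I know it does not: Rust has no garbage collector, and 'unsafe' only permits operations the compiler cannot verify rather than changing how the code is compiled. I haven't verified this for your case, though; the only way to find out these details is to look at the LLVM IR.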
