@inbounds propagation rules in Julia

前端 未结 1 679
庸人自扰
庸人自扰 2021-02-07 18:35

I\'m looking for some clarification of the bounds checking rules in Julia. Is this meaning that if I put @inbounds at the beginning of the for loop,



        
相关标签:
1条回答
  • 2021-02-07 19:25

    When the manual speaks about @inbounds propagating through "one layer," it's specifically referring to function call boundaries. The fact that it's only able to affect functions that get inlined is a secondary requirement that makes this especially confusing and tough to test, so let's not worry about inlining until later.

    The @inbounds macro annotates function calls such that they're able to elide bounds checks. In fact, the macro will do this for all function calls in the expression that is passed to it, including any number of nested for loops, begin blocks, if statements, etc. And, of course, indexing and indexed assignment are simply "sugars" that lower to function calls, so it affects those the same way. All this makes sense; as the author of the code that's wrapped by @inbounds, you're able to see the macro and ensure that it's safe to do so.

    But the @inbounds macro tells Julia to do something funny. It changes the behavior of code that's written in a totally different place! For example when you annotate the call:

    julia> f() = @inbounds return getindex(4:5, 10);
           f()
    13
    

    The macro effectively reaches into the standard library and disables that @boundscheck block, allowing it to compute values outside of the range's valid region.

    This is a spooky action at a distance… and if it's not carefully constrained, it could end up removing bounds-checks from library code where it's not intended or fully safe to do so. That's why there's the "one-layer" restriction; we only want to remove bounds checks when authors are explicitly aware that it might occur and opt-in to the removal.

    Now, as a library author, there may be cases where you want to opt-in to allow @inbounds to propagate through to all functions you call within the method. That's where Base.@propagate_inbounds is used. Unlike @inbounds, which annotates function calls, @propagate_inbounds annotates method definitions to allow for the inbounds state that the method gets called with to propagate through to all function calls you make in the method's implementation. This is a bit tough to describe in the abstract, so let's look at a concrete example.

    An Example

    Let's create a toy custom vector that simply creates a shuffled view into the vector it wraps:

    julia> module M
               using Random
               struct ShuffledVector{A,T} <: AbstractVector{T}
                   data::A
                   shuffle::Vector{Int}
               end
               ShuffledVector(A::AbstractVector{T}) where {T} = ShuffledVector{typeof(A), T}(A, randperm(length(A)))
               Base.size(A::ShuffledVector) = size(A.data)
               Base.@inline function Base.getindex(A::ShuffledVector, i::Int)
                   A.data[A.shuffle[i]]
               end
           end
    

    This is pretty straight-forward — we wrap any vector type, create a random permutation, and then upon indexing we just index into the original array using the permutation. And we know that all accesses into the subparts of the array should be okay based upon the outer constructor… so even though we aren't checking bounds ourselves, we can rely upon the inner indexing expressions throwing errors if we index out of bounds.

    julia> s = M.ShuffledVector(1:4)
    4-element Main.M.ShuffledVector{UnitRange{Int64},Int64}:
     1
     3
     4
     2
    
    julia> s[5]
    ERROR: BoundsError: attempt to access 4-element Array{Int64,1} at index [5]
    Stacktrace:
     [1] getindex at ./array.jl:728 [inlined]
     [2] getindex(::Main.M.ShuffledVector{UnitRange{Int64},Int64}, ::Int64) at ./REPL[10]:10
     [3] top-level scope at REPL[15]:1
    

    Note how the bounds error is coming not from the indexing into the ShuffledVector, but rather from indexing into the permutation vector A.perm[5]. Now perhaps a user of our ShuffledVector wants its accesses to be faster, so they try turning off bounds-checking with @inbounds:

    julia> f(A, i) = @inbounds return A[i]
    f (generic function with 1 method)
    
    julia> f(s, 5)
    ERROR: BoundsError: attempt to access 4-element Array{Int64,1} at index [5]
    Stacktrace:
     [1] getindex at ./array.jl:728 [inlined]
     [2] getindex at ./REPL[10]:10 [inlined]
     [3] f(::Main.M.ShuffledVector{UnitRange{Int64},Int64}, ::Int64) at ./REPL[16]:1
     [4] top-level scope at REPL[17]:1
    

    But they're still getting bounds errors! This is because @inbounds annotation only tried to remove the @boundscheck blocks from the method we wrote above. It doesn't propagate through to the standard library to remove the bounds-checking from either the A.perm array nor the A.data range. That's quite a bit of overhead, even though they tried to remove bounds! So, we can instead write the above getindex method with a Base.@propagate_inbounds annotation which will allow for this method to "inherit" its caller's in-bounds state:

    julia> module M
               using Random
               struct ShuffledVector{A,T} <: AbstractVector{T}
                   data::A
                   shuffle::Vector{Int}
               end
               ShuffledVector(A::AbstractVector{T}) where {T} = ShuffledVector{typeof(A), T}(A, randperm(length(A)))
               Base.size(A::ShuffledVector) = size(A.data)
               Base.@propagate_inbounds function Base.getindex(A::ShuffledVector, i::Int)
                   A.data[A.shuffle[i]]
               end
           end
    WARNING: replacing module M.
    Main.M
    
    julia> s = M.ShuffledVector(1:4);
    
    julia> s[5]
    ERROR: BoundsError: attempt to access 4-element Array{Int64,1} at index [5]
    Stacktrace:
     [1] getindex at ./array.jl:728 [inlined]
     [2] getindex(::Main.M.ShuffledVector{UnitRange{Int64},Int64}, ::Int64) at ./REPL[20]:10
     [3] top-level scope at REPL[22]:1 
    
    julia> f(s, 5) # That @inbounds now affects the inner indexing calls, too!
    0
    

    You can verify that there are no branches with @code_llvm f(s, 5).

    But, really, in this case I think it'd be much better to write this getindex method implementation with a @boundscheck block of its own:

    @inline function Base.getindex(A::ShuffledVector, i::Int)
        @boundscheck checkbounds(A, i)
        @inbounds r = A.data[A.shuffle[i]]
        return r
    end
    

    It's a little more verbose, but now it'll actually throw the bounds error on the ShuffledVector type instead of leaking the implementation details in the error message.

    The effect of inlining

    You'll notice that I don't test @inbounds in the global scope above, and instead use these little helper functions. That's because bounds check removal only works when the method gets inlined and compiled. So simply trying to remove bounds at the global scope isn't going to work since it can't inline the function call into the interactive REPL:

    julia> @inbounds getindex(4:5, 10)
    ERROR: BoundsError: attempt to access 2-element UnitRange{Int64} at index [10]
    Stacktrace:
     [1] throw_boundserror(::UnitRange{Int64}, ::Int64) at ./abstractarray.jl:538
     [2] getindex(::UnitRange{Int64}, ::Int64) at ./range.jl:617
     [3] top-level scope at REPL[24]:1
    

    There's no compilation or inlining occurring here at global scope, so Julia is unable to remove these bounds. Similarly, Julia isn't able to inline methods when there's a type instability (like when accessing a non-constant global), so it can't remove these bounds checks, either:

    julia> r = 1:2;
    
    julia> g() = @inbounds return r[3]
    g (generic function with 1 method)
    
    julia> g()
    ERROR: BoundsError: attempt to access 2-element UnitRange{Int64} at index [3]
    Stacktrace:
     [1] throw_boundserror(::UnitRange{Int64}, ::Int64) at ./abstractarray.jl:538
     [2] getindex(::UnitRange{Int64}, ::Int64) at ./range.jl:617
     [3] g() at ./REPL[26]:1
     [4] top-level scope at REPL[27]:1
    

    In general, bounds-check removal should be the last optimization you make after ensuring everything else works, is well-tested, and follows the usual performance tips.

    0 讨论(0)
提交回复
热议问题