What is the difference between `UnitRange` and `Array`?

前端 未结 1 1565
遇见更好的自我
遇见更好的自我 2021-01-12 12:17

I have two versions of code that seem to do the same thing:

sum = 0
for x in 1:100
    sum += x
end
sum = 0
for x in collect(1:100)         


        
相关标签:
1条回答
  • 2021-01-12 12:49

    In Julia, 1:100 returns a particular struct called UnitRange that looks like this:

    julia> dump(1:100)
    UnitRange{Int64}
      start: Int64 1
      stop: Int64 100
    

    This is a very compact struct to represent ranges with step 1 and arbitrary (finite) size. UnitRange is subtype of AbstractRange, a type to represent ranges with arbitrary step, subtype of AbstractVector.

    The instances of UnitRange dynamically compute their elements whenever the you use getindex (or the syntactic sugar vector[index]). For example, with @less (1:100)[3] you can see this method:

    function getindex(v::UnitRange{T}, i::Integer) where {T<:OverflowSafe}
        @_inline_meta
        val = v.start + (i - 1)
        @boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
        val % T
    end
    

    This is returning the i-th element of the vector by adding i - 1 to the first element (start) of the range. Some functions have optimised methods with UnitRange, or more generally with AbstractRange. For instance, with @less sum(1:100) you can see the following

    function sum(r::AbstractRange{<:Real})
        l = length(r)
        # note that a little care is required to avoid overflow in l*(l-1)/2
        return l * first(r) + (iseven(l) ? (step(r) * (l-1)) * (l>>1)
                                         : (step(r) * l) * ((l-1)>>1))
    end
    

    This method uses the formula for the sum of an arithmetic progression, which is extremely efficient as it's evaluated in a time independent of the size of the vector.

    On the other hand, collect(1:100) returns a plain Vector with one hundred elements 1, 2, 3, ..., 100. The main difference with UnitRange (or other types of AbstractRange) is that getindex(vector::Vector, i) (or vector[i], with vector::Vector) doesn't do any computation but simply accesses the i-th element of the vector. The downside of a Vector over a UnitRange is that generally speaking there aren't efficient methods when working with them as the elements of this container are completely arbitrary, while UnitRange represents a set of numbers with peculiar properties (sorted, constant step, etc...).

    If you compare the performance of methods for which UnitRange has super-efficient implementations, this type will win hands down (note the use of interpolation of variables with $(...) when using macros from BenchmarkTools):

    julia> using BenchmarkTools
    
    julia> @btime sum($(1:1000_000))
      0.012 ns (0 allocations: 0 bytes)
    500000500000
    
    julia> @btime sum($(collect(1:1000_000)))
      229.979 μs (0 allocations: 0 bytes)
    500000500000
    

    Remember that UnitRange comes with the cost of dynamically computing the elements every time you access them with getindex. Consider for example this function:

    function test(vec)
        sum = zero(eltype(vec))
        for idx in eachindex(vec)
            sum += vec[idx]
        end
        return sum
    end
    

    Let's benchmark it with a UnitRange and a plain Vector:

    julia> @btime test($(1:1000_000))
      812.673 μs (0 allocations: 0 bytes)
    500000500000
    
    julia> @btime test($(collect(1:1000_000)))
      522.828 μs (0 allocations: 0 bytes)
    500000500000
    

    In this case the function calling the plain array is faster than the one with a UnitRange because it doesn't have to dynamically compute 1 million elements.

    Of course, in these toy examples it'd be more sensible to iterate over all elements of vec rather than its indices, but in real world cases a situation like these may be more sensible. This last example, however, shows that a UnitRange is not necessarily more efficient than a plain array, especially if you need to dynamically compute all of its elements. UnitRanges are more efficient when you can take advantage of specialised methods (like sum) for which the operation can be performed in constant time.

    As a file remark, note that if you originally have a UnitRange it's not necessarily a good idea to convert it to a plain Vector to get good performance, especially if you're going to use it only once or very few times, as the conversion to Vector involves itself the dynamic computation of all elements of the range and the allocation of the necessary memory:

    julia> @btime collect($(1:1000_000));
      422.435 μs (2 allocations: 7.63 MiB)
    
    julia> @btime test(collect($(1:1000_000)))
      882.866 μs (2 allocations: 7.63 MiB)
    500000500000
    
    0 讨论(0)
提交回复
热议问题