System.Numerics.Vector<T> Initialization Performance on .NET Framework

Submitted by 不羁岁月 on 2020-12-13 03:16:49

Question


System.Numerics.Vector<T> brings SIMD support to .NET Core and to .NET Framework 4.6 and later. Consider adding two float arrays element by element:

// Baseline
public void SimpleSumArray() 
{
    for (int i = 0; i < left.Length; i++)
        results[i] = left[i] + right[i];
}

// Using Vector<T> for SIMD support
public void SimpleSumVectors() 
{
    int ceiling = left.Length / floatSlots * floatSlots;
    
    for (int i = 0; i < ceiling; i += floatSlots)
    {
        Vector<float> v1 = new Vector<float>(left, i);
        Vector<float> v2 = new Vector<float>(right, i);
        (v1 + v2).CopyTo(results, i);
    }
    for (int i = ceiling; i < left.Length; i++)
    {
        results[i] = left[i] + right[i];
    }
}
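The snippets above rely on fields (left, right, results, floatSlots) that the question does not show. A minimal self-contained sketch follows, assuming that floatSlots is simply Vector<float>.Count; the class name VectorSetupSketch and the sample data are hypothetical and not taken from the benchmark project.

```csharp
using System;
using System.Numerics;

public static class VectorSetupSketch
{
    // Assumption: floatSlots in the question is the number of floats per Vector<float>.
    private static readonly int floatSlots = Vector<float>.Count;

    public static float[] Sum(float[] left, float[] right)
    {
        var results = new float[left.Length];
        // Largest multiple of floatSlots that fits in the input
        int ceiling = left.Length / floatSlots * floatSlots;

        for (int i = 0; i < ceiling; i += floatSlots)
        {
            var v1 = new Vector<float>(left, i);
            var v2 = new Vector<float>(right, i);
            (v1 + v2).CopyTo(results, i);
        }
        // Scalar loop for the elements that do not fill a whole vector
        for (int i = ceiling; i < left.Length; i++)
            results[i] = left[i] + right[i];

        return results;
    }

    public static void Main()
    {
        // The vector width depends on the hardware and on the JIT in use
        Console.WriteLine($"Accelerated: {Vector.IsHardwareAccelerated}, slots: {floatSlots}");

        float[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
        float[] b = { 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
        Console.WriteLine(string.Join(", ", Sum(a, b)));
    }
}
```

If Vector.IsHardwareAccelerated reports false, the vector operations are not mapped to SIMD instructions and the vectorized loop should not be expected to beat the scalar baseline.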

Unfortunately, the initialization of the Vector can be the limiting step. To work around this, several sources recommend using MemoryMarshal to transform the source array into an array of Vectors [1][2]. For example:

// Improving Vector<T> Initialization Performance
public void SimpleSumVectorsNoCopy() 
{
    int numVectors = left.Length / floatSlots;
    int ceiling = numVectors * floatSlots;
    // leftMemory is simply a ReadOnlyMemory<float> referring to the "left" array
    ReadOnlySpan<Vector<float>> leftVecArray = MemoryMarshal.Cast<float, Vector<float>>(leftMemory.Span);
    ReadOnlySpan<Vector<float>> rightVecArray = MemoryMarshal.Cast<float, Vector<float>>(rightMemory.Span);
    Span<Vector<float>> resultsVecArray = MemoryMarshal.Cast<float, Vector<float>>(resultsMemory.Span);
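    for (int i = 0; i < numVectors; i++)
        resultsVecArray[i] = leftVecArray[i] + rightVecArray[i];
    // Scalar loop for the leftover elements that do not fill a whole vector
    for (int i = ceiling; i < left.Length; i++)
        results[i] = left[i] + right[i];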
}
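To make the reinterpretation step easier to follow, here is a hedged, self-contained sketch of the same idea that takes spans as parameters. The class name VectorCastSketch is hypothetical, and the method assumes that right and results are at least as long as left. On .NET Framework, Span<T> and MemoryMarshal come from the System.Memory NuGet package.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class VectorCastSketch
{
    public static void Sum(ReadOnlySpan<float> left, ReadOnlySpan<float> right, Span<float> results)
    {
        // Reinterpret the float data as vectors without copying it.
        // Trailing floats that do not fill a whole Vector<float> are left out of the cast spans.
        ReadOnlySpan<Vector<float>> leftVecs = MemoryMarshal.Cast<float, Vector<float>>(left);
        ReadOnlySpan<Vector<float>> rightVecs = MemoryMarshal.Cast<float, Vector<float>>(right);
        Span<Vector<float>> resultVecs = MemoryMarshal.Cast<float, Vector<float>>(results);

        // Writing to resultVecs writes straight into the memory behind results
        for (int i = 0; i < leftVecs.Length; i++)
            resultVecs[i] = leftVecs[i] + rightVecs[i];

        // Scalar loop for the floats that were left out of the cast
        for (int i = leftVecs.Length * Vector<float>.Count; i < left.Length; i++)
            results[i] = left[i] + right[i];
    }

    public static void Main()
    {
        var a = new float[19];
        var b = new float[19];
        for (int i = 0; i < a.Length; i++) { a[i] = i; b[i] = 2 * i; }

        var sum = new float[19];
        Sum(a, b, sum); // arrays convert implicitly to spans
        Console.WriteLine(string.Join(", ", sum));
    }
}
```

The length of each cast span is the number of whole vectors that fit in the source, so the tail loop starts at leftVecs.Length * Vector<float>.Count.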

This brings a dramatic improvement in performance when running on .NET Core:

|                 Method |      Mean |     Error |    StdDev |
|----------------------- |----------:|----------:|----------:|
|         SimpleSumArray | 165.90 us | 0.1393 us | 0.1303 us |
|       SimpleSumVectors |  53.69 us | 0.0473 us | 0.0443 us |
| SimpleSumVectorsNoCopy |  31.65 us | 0.1242 us | 0.1162 us |

Unfortunately, on .NET Framework, this way of initializing the vector has the opposite effect: SimpleSumVectorsNoCopy is slower than SimpleSumVectors.

|                 Method |      Mean |    Error |   StdDev |
|----------------------- |----------:|---------:|---------:|
|         SimpleSumArray | 152.92 us | 0.128 us | 0.114 us |
|       SimpleSumVectors |  52.35 us | 0.041 us | 0.038 us |
| SimpleSumVectorsNoCopy |  77.50 us | 0.089 us | 0.084 us |

Is there a way to optimize the initialization of Vector<T> on .NET Framework and get performance similar to .NET Core? The measurements were taken with this sample application [1].

[1] https://github.com/CBGonzalez/SIMDPerformance

[2] https://stackoverflow.com/a/62702334/430935


Answer 1:


As far as I know, the only efficient way to load a vector in .NET Framework 4.6 or 4.7 (presumably this will all change in 5.0) is with unsafe code, for example using Unsafe.Read<Vector<float>> (or its unaligned variant, Unsafe.ReadUnaligned, if applicable):

public unsafe void SimpleSumVectors()
{
    int ceiling = left.Length / floatSlots * floatSlots;

    fixed (float* leftp = left, rightp = right, resultsp = results)
    {
        for (int i = 0; i < ceiling; i += floatSlots)
        {
            Unsafe.Write(resultsp + i, 
                Unsafe.Read<Vector<float>>(leftp + i) + Unsafe.Read<Vector<float>>(rightp + i));
        }
    }
    for (int i = ceiling; i < left.Length; i++)
    {
        results[i] = left[i] + right[i];
    }
}

This uses the System.Runtime.CompilerServices.Unsafe package which you can get via NuGet, but it could be done without that too.
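For the unaligned variant mentioned in this answer, a minimal sketch could look like the following. It swaps Unsafe.Read and Unsafe.Write for Unsafe.ReadUnaligned and Unsafe.WriteUnaligned from the same package. The class name UnalignedVectorSketch is hypothetical, the code has not been benchmarked here, and it needs unsafe blocks enabled in the project (AllowUnsafeBlocks).

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

public static class UnalignedVectorSketch
{
    public static unsafe void Sum(float[] left, float[] right, float[] results)
    {
        int slots = Vector<float>.Count;
        int ceiling = left.Length / slots * slots;

        // Pin the arrays so the raw pointers stay valid during the loop
        fixed (float* leftp = left, rightp = right, resultsp = results)
        {
            for (int i = 0; i < ceiling; i += slots)
            {
                // The *Unaligned calls make no assumption about pointer alignment
                Vector<float> sum = Unsafe.ReadUnaligned<Vector<float>>(leftp + i)
                                  + Unsafe.ReadUnaligned<Vector<float>>(rightp + i);
                Unsafe.WriteUnaligned(resultsp + i, sum);
            }
        }
        // Scalar loop for the elements that do not fill a whole vector
        for (int i = ceiling; i < left.Length; i++)
            results[i] = left[i] + right[i];
    }

    public static void Main()
    {
        float[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        float[] b = { 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
        var sum = new float[a.Length];
        Sum(a, b, sum);
        Console.WriteLine(string.Join(", ", sum));
    }
}
```

Managed arrays are not guaranteed to start at an address aligned to the vector size, which is why the unaligned calls are the more conservative choice in this sketch.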



Source: https://stackoverflow.com/questions/64729099/system-numerics-vectort-initialization-performance-on-net-framework
