Speed up Matrix Addition in C#

北荒 2021-02-05 22:48

I'd like to optimize this piece of code:

public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
{            
        for (int x = 0;         


        
15 answers
  • 2021-02-05 23:16

    Where are the images stored? If each one is on disk, part of your processing time may be going to fetching them from disk. Measure that first; if it is an issue, rewrite the code to pre-fetch the image data so that the array processing code does not have to wait for it.

    Also check whether the overall application logic allows parallelism: is each matrix addition independent, or does it depend on the output of a previous one? If they are independent, I'd look at running them on separate threads, or otherwise in parallel.
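    A minimal sketch of the pre-fetch idea, assuming the images can be loaded independently of the accumulation step. The `loadImage` and `accumulate` delegates and the queue depth of 4 are hypothetical placeholders, not part of the original code; one background task reads ahead while a single consumer does the matrix additions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class PrefetchSketch
{
    // loadImage and accumulate are hypothetical stand-ins for the real
    // disk read and the real matrix-addition step.
    public static void Run<T>(string[] paths, Func<string, T> loadImage, Action<T> accumulate)
    {
        // Bounded queue: the loader stays at most 4 images ahead of the consumer.
        using (var queue = new BlockingCollection<T>(boundedCapacity: 4))
        {
            Task loader = Task.Run(() =>
            {
                try
                {
                    foreach (string path in paths)
                        queue.Add(loadImage(path));
                }
                finally
                {
                    queue.CompleteAdding(); // always unblock the consumer
                }
            });

            // Single consumer, so the accumulation itself needs no locking.
            foreach (T image in queue.GetConsumingEnumerable())
                accumulate(image);

            loader.Wait(); // surfaces any exception thrown while loading
        }
    }
}
```

    Whether this helps depends on how much of the time is really spent on disk reads, so it is worth timing the load and the accumulation separately before adopting it.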

  • 2021-02-05 23:19

    System.Drawing.Color is a structure, which on current versions of .NET kills most optimizations. Since you're only interested in the blue component anyway, use a method that only gets the data you need.

    public byte GetPixelBlue(int x, int y)
    {
        int offsetFromOrigin = (y * this.stride) + (x * 3);
        unsafe
        {
            return this.imagePtr[offsetFromOrigin];
        }
    }
    

    Now, exchange the order of iteration of x and y:

    public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
    {            
        for (int y = 0; y < Height; y++)
        {
            for (int x = 0; x < Width; x++)
            {
                byte pixelValue = image.GetPixelBlue(x, y);
                this.sumOfPixelValues[y, x] += pixelValue;
                // byte * byte yields int; the cast is needed to add it to a ulong element
                this.sumOfPixelValuesSquared[y, x] += (ulong)(pixelValue * pixelValue);
            }
        }
    }
    

    Now you're accessing all values within a scan line sequentially, which makes much better use of the CPU cache for all three matrices involved (image.imagePtr, sumOfPixelValues, and sumOfPixelValuesSquared). [Thanks to Jon for noticing that when I fixed access to image.imagePtr, I broke the other two. The output array indexing is now swapped to keep it optimal.]
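    To see the effect of iteration order on your own machine, here is a rough timing sketch. The class and method names are made up for illustration. It relies on C# rectangular arrays being stored row by row, so the last index is the one that moves through adjacent memory.

```csharp
using System;
using System.Diagnostics;

static class LoopOrderSketch
{
    // Inner loop walks the last index, so memory is touched sequentially.
    public static ulong SumRowByRow(uint[,] m)
    {
        ulong total = 0;
        for (int y = 0; y < m.GetLength(0); y++)
            for (int x = 0; x < m.GetLength(1); x++)
                total += m[y, x];
        return total;
    }

    // Inner loop walks the first index, jumping a whole row on every step.
    public static ulong SumColumnByColumn(uint[,] m)
    {
        ulong total = 0;
        for (int x = 0; x < m.GetLength(1); x++)
            for (int y = 0; y < m.GetLength(0); y++)
                total += m[y, x];
        return total;
    }

    // Times both orders on a zero-filled array of the given size.
    public static void Compare(int height, int width)
    {
        var m = new uint[height, width];
        var sw = Stopwatch.StartNew();
        ulong a = SumRowByRow(m);
        long rowTicks = sw.ElapsedTicks;
        sw.Restart();
        ulong b = SumColumnByColumn(m);
        long columnTicks = sw.ElapsedTicks;
        Console.WriteLine($"row-by-row: {rowTicks} ticks, column-by-column: {columnTicks} ticks (sums {a}, {b})");
    }
}
```

    Run `Compare` with dimensions close to your real images and in a release build; a single run is noisy, so repeat it a few times before drawing conclusions.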

    Next, get rid of the member references. Another thread could theoretically be setting sumOfPixelValues to another array midway through, which does horrible horrible things to optimizations.

    public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
    {          
        uint [,] sums = this.sumOfPixelValues;
        ulong [,] squares = this.sumOfPixelValuesSquared;
        for (int y = 0; y < Height; y++)
        {
            for (int x = 0; x < Width; x++)
            {
                byte pixelValue = image.GetPixelBlue(x, y);
                sums[y, x] += pixelValue;
                // byte * byte yields int, which cannot be added to a ulong without a cast
                squares[y, x] += (ulong)(pixelValue * pixelValue);
            }
        }
    }
    

    Now the compiler can generate optimal code for moving through the two output arrays, and after inlining and optimization, the inner loop can step through the image.imagePtr array with a stride of 3 instead of recalculating the offset all the time. Now an unsafe version for good measure, doing the optimizations that I think .NET ought to be smart enough to do but probably isn't:

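    unsafe public void PopulatePixelValueMatrices(GenericImage image, int Width, int Height)
    {
        byte* scanline = image.imagePtr;
        // Element types match the uint[,] / ulong[,] arrays used above.
        fixed (uint* sumsBase = &this.sumOfPixelValues[0, 0])
        fixed (ulong* squaresBase = &this.sumOfPixelValuesSquared[0, 0])
        {
            // Pointers declared by fixed are read-only, so advance copies instead.
            // Stepping straight through assumes both arrays are exactly [Height, Width].
            uint* sums = sumsBase;
            ulong* squares = squaresBase;
            for (int y = 0; y < Height; y++)
            {
                byte* blue = scanline;
                for (int x = 0; x < Width; x++)
                {
                    byte pixelValue = *blue;
                    *sums += pixelValue;
                    *squares += (ulong)(pixelValue * pixelValue);
                    blue += 3;
                    sums++;
                    squares++;
                }
                scanline += image.stride;
            }
        }
    }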
    
  • 2021-02-05 23:19

    The only way I can think of to speed it up would be to do some of the additions in parallel; with images of your size, the gain might outweigh the threading overhead.
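    One way this could look, as a sketch only: split the work by row with `Parallel.For`. The `blue` array here is a hypothetical Height x Width array of already-extracted blue values, standing in for the question's `GenericImage`. Each row of the output arrays is written by exactly one iteration, so no locking is needed.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelRowsSketch
{
    // blue is a hypothetical [Height, Width] array of blue-channel values.
    public static void Accumulate(byte[,] blue, uint[,] sums, ulong[,] squares)
    {
        int height = blue.GetLength(0);
        int width = blue.GetLength(1);

        // Rows are independent, so each one can be processed on any thread.
        Parallel.For(0, height, y =>
        {
            for (int x = 0; x < width; x++)
            {
                uint v = blue[y, x];
                sums[y, x] += v;
                squares[y, x] += (ulong)v * v;
            }
        });
    }
}
```

    For small images the scheduling cost may exceed the gain, so compare against the single-threaded loop before keeping it.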
