Fastest way to convert T[,] to T[][]?

点点圈 提交于 2019-12-01 00:33:14

问题


So it turns out all arrays are not created equal. Multi-dimensional arrays can have non-zero lower bounds. See for example Excel PIA's Range.Value property object[,] rectData = myRange.Value;

I need to convert these data into a jagged array. My first try below smells of complexity. Any suggestions to optimize it? It needs to handle the general case where lower bounds may not be zero.

I have this ex method:

    public static T[][] AsJagged<T>( this T[,] rect )
    {
        int row1 = rect.GetLowerBound(0);
        int rowN = rect.GetUpperBound(0);
        int col1 = rect.GetLowerBound(1);
        int colN = rect.GetUpperBound(1);

        int height = rowN - row1 + 1;
        int width = colN - col1 + 1;
        T[][] jagged = new T[height][];

        int k = 0;
        int l;
        for ( int i = row1; i < row1 + height; i++ )
        {
            l = 0;
            T[] temp = new T[width];
            for ( int j = col1; j < col1 + width; j++ )
                temp[l++] = rect[i, j];
            jagged[k++] = temp;
        }

        return jagged;
    }

Used like this:

    public void Foo()
    {
        int[,] iRect1 = { { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 }, { 1, 1, 1, 1 } };
        int[][] iJagged1 = iRect1.AsJagged();

        int[] lengths = { 3, 5 };
        int[] lowerBounds = { 7, 8 };
        int[,] iRect2 = (int[,])Array.CreateInstance(typeof(int), lengths, lowerBounds);
        int[][] iJagged2 = iRect2.AsJagged();

    }

Curious if Buffer.BlockCopy() would work or be faster?

Edit: AsJagged needs to handle reference types.

Edit: Found bug in AsJagged(). Added int l; and added col1 + width to inner loop.


回答1:


A view caveats/assumptions up front:

  • You seem to use only int as your data type (or at least seem to be OK with using Buffer.BlockCopy which would imply you can life with primitive types in general).
  • For the test data you show, I don't think there will be much different using any somewhat sane approach.

Having that said, the following implementation (which needs to be specialized for a specific primitive type (here int) because it uses fixed) is around 10 times faster than the approach using the inner loop:

    unsafe public static int[][] AsJagged2(int[,] rect)
    {
        int row1 = rect.GetLowerBound(0);
        int rowN = rect.GetUpperBound(0);
        int col1 = rect.GetLowerBound(1);
        int colN = rect.GetUpperBound(1);

        int height = rowN - row1 + 1;
        int width = colN - col1 + 1;
        int[][] jagged = new int[height][];

        int k = 0;
        for (int i = row1; i < row1 + height; i++)
        {
            int[] temp = new int[width];

            fixed (int *dest = temp, src = &rect[i, col1])
            {
                MoveMemory(dest, src, rowN * sizeof(int));
            }

            jagged[k++] = temp;
        }

        return jagged;
    }

    [DllImport("kernel32.dll", EntryPoint = "RtlMoveMemory")]
    unsafe internal static extern void MoveMemory(void* dest, void* src, int length);

Using the following "test code":

    static void Main(string[] args)
    {
        Random rand = new Random();
        int[,] data = new int[100,1000];
        for (int i = 0; i < data.GetLength(0); i++)
        {
            for (int j = 0; j < data.GetLength(1); j++)
            {
                data[i, j] = rand.Next(0, 1000);
            }
        }

        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < 100; i++)
        {
            int[][] dataJagged = AsJagged(data);
        }

        Console.WriteLine("AsJagged:  " + sw.Elapsed);

        sw = Stopwatch.StartNew();

        for (int i = 0; i < 100; i++)
        {
            int[][] dataJagged2 = AsJagged2(data);
        }

        Console.WriteLine("AsJagged2: " + sw.Elapsed);
    }

Where AsJagged (the first case) is your original function, I get the following output:

AsJagged:  00:00:00.9504376
AsJagged2: 00:00:00.0860492

So there is indeed a faster way of doing it, however depending on the size of the test data, the number of times you actually perform this operation, and your willingness to allow unsafe and P/Invoke code, you're probably not going to need it.

Having that said, we were using large matrixes of double (say 7000x10000 elements) where it indeed did make a huge difference.

Update: about using Buffer.BlockCopy

I might overlook some Marshal or other trick, but I don't think using Buffer.BlockCopy is possible here. This is due to the fact that it requires both the source and destination array to, well, be an Array.

In our example, the destination is an array (e.g. int[] temp = ...) however the source is not. While we "know" that for two dimensional arrays of primitive types the layout is such, that each "row" (i.e. first dimension) is an array of the type in memory, there is no safe (as in unsafe) way to get that array without the overhead of copying it first. So we basically need to use a function that simply deals with memory and doesn't care about the actual content of it - like MoveMemory. BTW, the internal implementation of Buffer.BlockCopy does something similar.




回答2:


Your complexity is O(N*M) N - number of rows, M - number of columns. That's the best you can get when copying N*M values...

Buffer.BlockCopy might be faster than your inner for loop, but I wouldn't be surprised if the compiler knows how to handle this code properly and you won't gain any further speed. You should test it to make sure.

You may be able to achieve better performance by not copying the data at all (at the potential expense of slightly slower lookups). If you create an 'array row' class, that holds your rect and a row number, and provides an indexer that accesses the correct column, you can create an array of such rows, and save yourself the copying altogether.

The complexity of creating such an array of 'array rows' is O(N).

EDIT: An ArrayRow class, just because it bugs me...

The ArrayRow could look something like this:

class ArrayRow<T>
{
    private T[,] _source;
    private int _row;

    public ArrayRow(T[,] rect, int row)
    {
         _source = rect;
         _row = row;
    }

    public T this[int col] { get { return _source[_row, col]; } }
}

Now you create an array of ArrayRows, you don't copy anything at all, and the optimizer has a good chance of optimizing accessing an entire row in sequence.



来源:https://stackoverflow.com/questions/9357092/fastest-way-to-convert-t-to-t

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!