Fastest way to zero out a 2d array in C?

后悔当初 2021-01-29 18:58

I want to repeatedly zero a large 2d array in C. This is what I do at the moment:

// Array of size n * m, where n may not equal m
for(j = 0; j < n; j++)
{
            


        
12 answers
  • 2021-01-29 18:59
    memset(array, 0, sizeof(array[0][0]) * m * n);
    

    Where m and n are the width and height of the two-dimensional array. This assumes the array is one contiguous block, as it is for an array declared as T array[n][m].
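
A minimal sketch of that call wrapped in a helper, assuming the elements are int and the array is one contiguous block. The name zero_ints_2d is hypothetical and not from the question:

```c
#include <stddef.h>
#include <string.h>

/* Zero a contiguous block holding rows * cols ints.
 * zero_ints_2d is a hypothetical helper name. It assumes the whole array
 * lives in one block (e.g. int a[rows][cols]), not in an array of
 * separately allocated row pointers. */
void zero_ints_2d(void *array, size_t rows, size_t cols)
{
    memset(array, 0, rows * cols * sizeof(int));
}
```

For an array declared as int a[3][5], the call would be zero_ints_2d(a, 3, 5).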

  • 2021-01-29 19:00

    Use calloc instead of malloc. calloc initializes all the allocated bytes to 0.

    int *a = calloc(n * m, sizeof(int));

    // all n * m cells of a have been initialized to 0
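
Note that calloc only zeroes the memory at allocation time, so clearing the same array repeatedly still needs memset or a loop. A small sketch of both steps, assuming a flat rows x cols block of int indexed as grid[r * cols + c]. The names alloc_zeroed_grid and reset_grid are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Allocate a rows x cols grid of ints as one zero-filled block.
 * Returns NULL on failure; the caller frees the result with free().
 * This sketch does not guard against rows * cols overflowing size_t. */
int *alloc_zeroed_grid(size_t rows, size_t cols)
{
    return calloc(rows * cols, sizeof(int));
}

/* Clear an existing grid again without reallocating it. */
void reset_grid(int *grid, size_t rows, size_t cols)
{
    memset(grid, 0, rows * cols * sizeof *grid);
}
```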

  • 2021-01-29 19:06
    memset(array, 0, sizeof(int [n][m]));
    
  • 2021-01-29 19:08

    If array is truly an array, then you can "zero it out" with:

    memset(array, 0, sizeof array);
    

    But there are two points you should know:

    • this works only if array is really a "two-d array", i.e., was declared T array[M][N]; for some type T.
    • it works only in the scope where array was declared. If you pass it to a function, then the name array decays to a pointer, and sizeof will not give you the size of the array.

    Let's do an experiment:

    #include <stdio.h>
    
    void f(int (*arr)[5])
    {
        printf("f:    sizeof arr:       %zu\n", sizeof arr);
        printf("f:    sizeof arr[0]:    %zu\n", sizeof arr[0]);
        printf("f:    sizeof arr[0][0]: %zu\n", sizeof arr[0][0]);
    }
    
    int main(void)
    {
        int arr[10][5];
        printf("main: sizeof arr:       %zu\n", sizeof arr);
        printf("main: sizeof arr[0]:    %zu\n", sizeof arr[0]);
        printf("main: sizeof arr[0][0]: %zu\n\n", sizeof arr[0][0]);
        f(arr);
        return 0;
    }
    

    On my machine, the above prints:

    main: sizeof arr:       200
    main: sizeof arr[0]:    20
    main: sizeof arr[0][0]: 4
    
    f:    sizeof arr:       8
    f:    sizeof arr[0]:    20
    f:    sizeof arr[0][0]: 4
    

    Even though arr is an array, it decays to a pointer to its first element when passed to f(), and therefore sizeof arr in f() is the size of a pointer (8 bytes here), not the 200 bytes of the whole array. Also, in f() the size of arr[0] is the size of the array arr[0], which is an "array [5] of int". It is not the size of an int *, because the "decaying" only happens at the first level, and that is why we need to declare f() as taking a pointer to an array of the correct size.

    So, as I said, what you were doing originally will work only if the two conditions above are satisfied. If not, you will need to do what others have said:

    memset(array, 0, m*n*sizeof array[0][0]);
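
One way to sketch this inside a function is to pass the dimensions along with the array. The example below assumes a compiler that supports C99 variably modified parameter types (optional since C11, but available in GCC and Clang). The name zero_rows is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* arr is adjusted to "pointer to array of n ints", so sizeof arr[0] is
 * the size of one full row (n * sizeof(int)) even inside the function.
 * sizeof arr would still be the size of a pointer. */
void zero_rows(size_t m, size_t n, int arr[m][n])
{
    memset(arr, 0, m * sizeof arr[0]);
}
```

With int arr[10][5] as in the experiment above, the call would be zero_rows(10, 5, arr).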
    

    Finally, memset() and the for loop you posted are not equivalent in the strict sense. There could be (and have been) compilers where "all bits zero" does not equal zero for certain types, such as pointers and floating-point values. I doubt that you need to worry about that though.
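
If that distinction does matter, a plain assignment loop is the portable form. A minimal sketch for double elements, again assuming C99 variably modified parameters. The name zero_doubles is hypothetical:

```c
#include <stddef.h>

/* Assigning 0.0 produces a true zero value on every conforming
 * implementation, whatever bit pattern represents it. An optimizing
 * compiler may still turn this loop into a block fill on platforms
 * where all-bits-zero is 0.0. */
void zero_doubles(size_t rows, size_t cols, double a[rows][cols])
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            a[r][c] = 0.0;
}
```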

  • 2021-01-29 19:08

    This happens because, once array is just a pointer to the first row of your multidimensional array, sizeof(array) is the size of that pointer and does not cover the whole array. The size of one row is sizeof(*array). You allocated j rows of i elements each, so you need to multiply the size of one row by the number of rows, e.g.:

    bzero(array, sizeof(*array) * j);
    

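    Also note that sizeof(array) gives the size of the whole array only when array is declared as an array (not a pointer) in the current scope. For a dynamically allocated array you would write

    size_t arrayByteSize = sizeof(int) * i * j;
    int *array = malloc(arrayByteSize);
    // bzero is a legacy function from <strings.h>; memset(array, 0, arrayByteSize) is the standard C equivalent.
    bzero(array, arrayByteSize);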
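
The same idea can be sketched with memset and a malloc failure check. Since sizeof cannot recover the size of a malloc'd block, the helper hands the byte count back to the caller. The name alloc_zeroed_rows and its bytes_out parameter are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Allocate rows x cols ints as one block and zero it.
 * The caller can view the result as int (*grid)[cols] and must free() it.
 * *bytes_out (if not NULL) receives the block size for later re-zeroing. */
void *alloc_zeroed_rows(size_t rows, size_t cols, size_t *bytes_out)
{
    size_t bytes = rows * cols * sizeof(int);
    void *block = malloc(bytes);

    if (block != NULL)
        memset(block, 0, bytes);   /* standard equivalent of bzero(block, bytes) */
    if (bytes_out != NULL)
        *bytes_out = bytes;
    return block;
}
```

Later clears can then reuse the stored size, e.g. memset(grid, 0, bytes).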
    
  • 2021-01-29 19:12

    If you are really, really obsessed with speed (and not so much with portability) I think the absolute fastest way to do this would be to use SIMD vector intrinsics. e.g. on Intel CPUs, you could use these SSE2 instructions:

    __m128i _mm_setzero_si128 ();                   // Create a 128-bit vector with all bits set to 0.
    void _mm_storeu_si128 (__m128i *p, __m128i a);  // Write 128 bits to the specified (possibly unaligned) address.
    

    Each store instruction will set four 32-bit ints to zero in one hit.

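    _mm_storeu_si128 does not itself require p to be aligned (the aligned variant, _mm_store_si128, requires 16-byte alignment), but aligning p to 16 bytes is still good for speed because it keeps each store within a single cache line. The other restriction is that the allocation size must be a multiple of 16 bytes (64 bytes for the loop below, which is unrolled four times), but this is cool too because it allows us to unroll the loop easily.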

    Have this in a loop, and unroll the loop a few times, and you will have a crazy fast initialiser:

    // Assumes int is 32 bits. Requires #include <emmintrin.h> for the SSE2 intrinsics.
    const int mr = roundUpToNearestMultiple(m, 4);      // This isn't the optimal modification of m and n, but done this way here for clarity.
    const int nr = roundUpToNearestMultiple(n, 4);
    const int s = mr * nr;                              // Total number of ints; a multiple of 16 because mr and nr are multiples of 4.
    
    int i = 0;
    int array[mr][nr] __attribute__ ((aligned (16)));   // GCC directive.
    __m128i* px = (__m128i*)array;
    const int incr = 16;                                // Unrolled 4 times: 4 stores x 4 ints per iteration.
    const __m128i zero128 = _mm_setzero_si128();
    
    for(i = 0; i < s; i += incr)
    {
        _mm_storeu_si128(px++, zero128);
        _mm_storeu_si128(px++, zero128);
        _mm_storeu_si128(px++, zero128);
        _mm_storeu_si128(px++, zero128);
    }
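
For comparison, here is a self-contained sketch of the same technique that needs no rounding of m and n. It assumes GCC or Clang (which define __SSE2__ when SSE2 is enabled) and falls back to memset elsewhere. The name zero_bytes_sse2 is hypothetical. The C library's memset is often vectorized already, so this is worth benchmarking rather than assuming it is faster:

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Zero `bytes` bytes starting at p.
 * Uses unaligned 16-byte stores where SSE2 is available, then memset for
 * the remaining tail (or for the whole range on other targets). */
void zero_bytes_sse2(void *p, size_t bytes)
{
    unsigned char *dst = p;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    while (bytes >= 16) {
        _mm_storeu_si128((__m128i *)dst, zero);
        dst += 16;
        bytes -= 16;
    }
#endif
    memset(dst, 0, bytes);
}
```

For an array declared in scope as int array[n][m], the call would be zero_bytes_sse2(array, sizeof array).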
    

    There is also a non-temporal store, _mm_stream_si128, that bypasses the cache (i.e. zeroing the array won't pollute the cache), which could give you some secondary performance benefits in some circumstances. Unlike _mm_storeu_si128, it requires p to be 16-byte aligned.

    See here for SSE2 reference: http://msdn.microsoft.com/en-us/library/kcwz153a(v=vs.80).aspx
