What's the most efficient way to make bitwise operations in a C array

死守一世寂寞 2021-02-08 13:24

I have a C array like:

char byte_array[10];

And another one that acts as a mask:

char byte_mask[10];

I would

3 Answers
  • 2021-02-08 13:54
    #define CHAR_ARRAY_SIZE    (10)
    /* Round up so that int_array covers all CHAR_ARRAY_SIZE bytes */
    #define INT_ARRAY_SIZE     ((CHAR_ARRAY_SIZE + sizeof (unsigned int) - 1) / sizeof (unsigned int))
    
    typedef union _arr_tag_ {
    
        char          byte_array [CHAR_ARRAY_SIZE];
        unsigned int  int_array [INT_ARRAY_SIZE]; 
    
    } arr_tag;
    

    Now use int_array for the masking. This should work on both 32-bit and 64-bit processors.

    arr_tag arr_src, arr_result, arr_mask;
    
    for (int i = 0; i < INT_ARRAY_SIZE; i ++) {
        arr_result.int_array [i] = arr_src.int_array[i] & arr_mask.int_array [i];
    }
    

    Try this; the code also stays fairly clean.
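
    For reference, here is a minimal compilable sketch of the union approach above. The helper names arr_load and arr_and are hypothetical, the byte view uses unsigned char rather than char to keep the comparisons simple, and the whole union is zeroed before loading so the padding bytes past CHAR_ARRAY_SIZE hold defined values when read through int_array.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    #define CHAR_ARRAY_SIZE 10
    /* Round up so the int view covers every byte of the char view. */
    #define INT_ARRAY_SIZE ((CHAR_ARRAY_SIZE + sizeof(unsigned int) - 1) / sizeof(unsigned int))

    typedef union {
        unsigned char byte_array[CHAR_ARRAY_SIZE];
        unsigned int  int_array[INT_ARRAY_SIZE];
    } arr_tag;

    /* Hypothetical helper: zero the whole union (padding included),
       then copy CHAR_ARRAY_SIZE caller bytes into the byte view. */
    static void arr_load(arr_tag *dst, const unsigned char *bytes)
    {
        assert(dst != NULL && bytes != NULL);
        memset(dst, 0, sizeof *dst);
        memcpy(dst->byte_array, bytes, CHAR_ARRAY_SIZE);
    }

    /* Hypothetical helper: AND src with mask one unsigned int at a time. */
    static void arr_and(arr_tag *result, const arr_tag *src, const arr_tag *mask)
    {
        assert(result != NULL && src != NULL && mask != NULL);
        for (size_t i = 0; i < INT_ARRAY_SIZE; i++)
            result->int_array[i] = src->int_array[i] & mask->int_array[i];
    }
    ```

    After arr_and returns, the masked bytes can be read back through result->byte_array. Reading a union member other than the one last written is permitted in C, but not in C++.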

  • 2021-02-08 14:08
    for ( i = 10 ; i-- > 0 ; )
        result_array[i] = byte_array[i] & byte_mask[i];
    
    • Going backwards pre-loads processor cache-lines.
    • Including the decrement in the compare can save some instructions.

    This will work for all arrays and processors. However, if you know your arrays are word-aligned, a faster method is to cast to a larger type and do the same calculation.

    For example, let's say n=16 instead of n=10. Then this would be much faster:

    uint32_t* input32 = (uint32_t*)byte_array;
    uint32_t* mask32 = (uint32_t*)byte_mask;
    uint32_t* result32 = (uint32_t*)result_array;
    for ( i = 4 ; i-- > 0 ; )
        result32[i] = input32[i] & mask32[i];
    

    (Of course you need a definition of uint32_t, which <stdint.h> provides, and if n is not a multiple of 4 you need to handle the leftover bytes at the beginning and/or end separately so that the 32-bit accesses stay aligned.)

    Variation: The question specifically calls for the results to be placed in a separate array; however, it would almost certainly be faster to modify the input array in place.
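
    The cleanup for lengths that are not a multiple of 4 can be sketched as below. This is not the code from the answer above: it swaps the pointer casts for memcpy into local uint32_t variables, which sidesteps the alignment requirement (and the aliasing concerns that casting a char array to uint32_t* raises), and compilers commonly reduce such fixed-size memcpy calls to plain loads and stores. The function name and_bytes_wordwise is hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* result[i] = a[i] & b[i] for n bytes; no alignment requirement.
       result may be the same buffer as a (in-place use). */
    static void and_bytes_wordwise(unsigned char *result,
                                   const unsigned char *a,
                                   const unsigned char *b,
                                   size_t n)
    {
        size_t i = 0;

        assert(n == 0 || (result != NULL && a != NULL && b != NULL));

        /* Bulk part: 4 bytes per iteration, moved through local words. */
        for (; i + sizeof(uint32_t) <= n; i += sizeof(uint32_t)) {
            uint32_t wa, wb;
            memcpy(&wa, a + i, sizeof wa);
            memcpy(&wb, b + i, sizeof wb);
            wa &= wb;
            memcpy(result + i, &wa, sizeof wa);
        }

        /* Tail: the 0 to 3 bytes that do not fill a whole word. */
        for (; i < n; i++)
            result[i] = (unsigned char)(a[i] & b[i]);
    }
    ```

    With n = 10 this runs two word iterations and then two single-byte iterations. Whether it actually beats the plain byte loop depends on the compiler and target, so it is worth measuring.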

  • 2021-02-08 14:15

    If you want to make it faster, make sure that byte_array has a length that is a multiple of 4 (8 on 64-bit machines), and then:

    char byte_array[12];
    char byte_mask[12];
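    /* Check for proper alignment (assert is from <assert.h>, uintptr_t from <stdint.h>).
       The outer parentheses matter: == binds tighter than &, so x & 3 == 0 is x & 0. */
    assert((((uintptr_t)(void *)byte_array) & 3) == 0);
    assert((((uintptr_t)(void *)byte_mask) & 3) == 0);
    /* (10+3)/4 rounds up to 3 words, covering all 12 bytes of the padded arrays */
    for (i = 0; i < (10+3)/4; i++) {
      ((unsigned int *)(byte_array))[i] &= ((unsigned int *)(byte_mask))[i];
    }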
    

    This is much faster than doing it byte by byte.

    (Note that this is in-place mutation; if you want to keep the original byte_array also, then you obviously need to store the results in another array instead.)
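
    To illustrate the in-place versus separate-array choice mentioned in the note above, here is a small sketch with a single byte-wise function that supports both, depending on whether the destination pointer is the same as the source. The name and_bytes is hypothetical, and the loop is deliberately the simple byte version rather than the word-sized one.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* dst[i] = src[i] & mask[i] for n bytes.
       Pass dst == src to mutate in place, or a different buffer
       to keep the original bytes untouched. */
    static void and_bytes(unsigned char *dst,
                          const unsigned char *src,
                          const unsigned char *mask,
                          size_t n)
    {
        assert(n == 0 || (dst != NULL && src != NULL && mask != NULL));
        for (size_t i = 0; i < n; i++)
            dst[i] = (unsigned char)(src[i] & mask[i]);
    }
    ```

    Calling and_bytes(byte_array, byte_array, byte_mask, n) overwrites byte_array, while and_bytes(result_array, byte_array, byte_mask, n) leaves it intact.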
