Let\'s say I have an array
k = [1 2 0 0 5 4 0]
I can compute a mask as follows
m = k > 0 = [1 1 0 0 1 1 0]
Using only the mask m and
Reading the comments below the original question, in the actual problem the array contains 32-bit floating point numbers, and the mask is (one?) 32-bit integer, so I don't get it why shifts etc. should be used for compacting the array. The simple compacting algorithm (in C) would be something like this:
float array[8];
unsigned int mask = ...;
int a = 0, b = 0;
while (mask) {
if (mask & 1) { array[a++] = array[b]; }
b++;
mask >>= 1;
}
/* Size of compacted array is 'a' */
/* Optionally clear the rest: */
while (a < 8) array[a++] = 0.0;
Minor variations would be due to the bit order of the mask, but the only ALU operations that are needed are index variable updates and shifting and ANDing the mask. Because the original array is at least 256 bits wide, no usual CPU can shift the whole array bit-wise around.