What is the fastest algorithm to solve the following problem?
Given 6 arrays, D1, D2, D3, D4, D5 and D6, each containing 6 numbers like:
D1[0
If the range of the numbers is limited, it would probably be easier to make a bit array, like this:
int IsPresent(int arrays[][6], int ans[6], int ST1)
{
uint32_t bit_mask = 0;
for(int i = 0; i < 6; ++ i) {
for(int j = 0; j < ST1; ++ j) {
assert(arrays[i][j] >= 0 && arrays[i][j] < 32); // range is limited
bit_mask |= (uint32_t)1 << arrays[i][j]; // unsigned shift, so that bit 31 is also well defined
}
}
// make a "list" of numbers that we have
for(int i = 0; i < 6; ++ i) {
if(((bit_mask >> ans[i]) & 1) == 0)
return 0; // in ans, there is a number that is not present in arrays
}
return 1; // all of the numbers were found
}
This will always run in O(6 * ST1 + 6). Its disadvantage is that it first goes through all of the up to 36 numbers in the arrays and only then checks them against the six values of ans. If there is a strong precondition that the numbers will mostly be present, it is possible to reverse the test and provide an early exit:
int IsPresent(int arrays[][6], int ans[6], int ST1)
{
uint32_t bit_mask = 0;
for(int i = 0; i < 6; ++ i) {
assert(ans[i] >= 0 && ans[i] < 32); // range is limited
bit_mask |= (uint32_t)1 << ans[i];
}
// make a "list" of numbers that we need to find
for(int i = 0; i < 6; ++ i) {
for(int j = 0; j < ST1; ++ j)
bit_mask &= ~((uint32_t)1 << arrays[i][j]); // clear bits of the mask
if(!bit_mask) // check if we have them all
return 1; // all of the numbers were found
}
assert(bit_mask != 0);
return 0; // there are some numbers remaining yet to be found
}
This will run at most in O(6 * ST1 + 6), and at best in O(6 + ST1) if the first array already covers all of ans. Note that the test for the bit mask being zero can be done either after each array (as it is now) or after each element; the latter involves more checking but also an earlier cutoff once all the numbers are found, with a best case of O(6 + 1) if the first number in the first array covers all of ans (that is, ans is six times the same number). In the context of CUDA, the first version of the algorithm would likely be faster, as it involves fewer branches and most of the loops (except the one over ST1) can be automatically unrolled.
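The per-element check mentioned above could look roughly like the following. This is only a sketch: the name IsPresentEarlyExit is made up for illustration, and it keeps the assumption that all values lie in the range 0 to 31.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant of IsPresent(): tests the mask after every element
// instead of after every array. Assumes all values lie in [0, 32).
int IsPresentEarlyExit(int arrays[][6], int ans[6], int ST1)
{
    uint32_t bit_mask = 0;
    for (int i = 0; i < 6; ++i) {
        assert(ans[i] >= 0 && ans[i] < 32); // range is limited
        bit_mask |= uint32_t(1) << ans[i];  // numbers that are yet to be found
    }
    for (int i = 0; i < 6; ++i) {
        for (int j = 0; j < ST1; ++j) {
            assert(arrays[i][j] >= 0 && arrays[i][j] < 32);
            bit_mask &= ~(uint32_t(1) << arrays[i][j]); // mark as found
            if (!bit_mask)
                return 1; // all of the numbers were found, stop right here
        }
    }
    return 0; // some numbers of ans never appeared
}
```

Whether the extra branch in the inner loop pays off depends on how often the early exit actually triggers, so it is worth measuring both variants on real data.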
However, if the range of the numbers is unlimited, we could do something else. Since there are at most 7 * 6 = 42 different numbers in ans and all the arrays combined, it would be possible to map those to the indices 0 to 41 and use a 64-bit integer as the bit mask. Arguably, though, building this mapping already requires comparing the numbers against each other, which would be enough to answer the question, so the bit mask test could be skipped altogether.
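One way to sketch the mapping idea is to simplify it and map only the (at most six) distinct values of ans to bit positions, since numbers that do not occur in ans do not affect the result. The function name IsPresentMapped and this simplification are assumptions made for illustration, not something taken from the question.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: works for arbitrary int values, no range limit.
// Only the distinct values of ans get a bit, so six bits are enough here.
int IsPresentMapped(int arrays[][6], int ans[6], int ST1)
{
    assert(ST1 >= 0 && ST1 <= 6);
    int keys[6];    // distinct values of ans; the index is the bit position
    int n_keys = 0;
    for (int i = 0; i < 6; ++i) {
        int k = 0;
        while (k < n_keys && keys[k] != ans[i])
            ++k;
        if (k == n_keys)
            keys[n_keys++] = ans[i]; // first occurrence, assign a new bit
    }
    uint32_t bit_mask = (uint32_t(1) << n_keys) - 1; // one bit per distinct value
    for (int i = 0; i < 6; ++i) {
        for (int j = 0; j < ST1; ++j) {
            for (int k = 0; k < n_keys; ++k) {
                if (keys[k] == arrays[i][j])
                    bit_mask &= ~(uint32_t(1) << k); // this value was found
            }
        }
    }
    return bit_mask == 0; // 1 if every value of ans was seen
}
```

The lookup is a linear scan over at most six keys, which is likely cheap enough at this size; for larger inputs a sorted table or a hash would be the more usual choice.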
Another way to do it would be to sort the arrays and simply count coverage of the individual numbers:
int IsPresent(int arrays[][6], int ans[6], int ST1)
{
int all_numbers[36], n = ST1 * 6;
for(int i = 0; i < 6; ++ i)
memcpy(&all_numbers[i * ST1], &arrays[i], ST1 * sizeof(int));
// copy all of the numbers into a contiguous array
std::sort(all_numbers, all_numbers + n);
// or use "C" standard library qsort() or a bitonic sorting network on GPU
// alternatively, sort each array of 6 separately and then merge the sorted
// arrays (can also be done in parallel, to some level)
n = std::unique(all_numbers, all_numbers + n) - all_numbers;
// this way, we can also remove duplicate numbers, if they are
// expected to occur frequently, and make the next test faster.
// std::unique() moves the unique elements to the front of the list
// and returns an iterator (a pointer in this case) to one past
// the last unique element of the list - that gives us the number of
// unique items.
for(int i = 0; i < 6; ++ i) {
int *p = std::lower_bound(all_numbers, all_numbers + n, ans[i]);
// use binary search to find the number in question
// or use "C" standard library bsearch()
// or implement binary search yourself on GPU
if(p == all_numbers + n)
return 0; // not found
// alternately, make all_numbers array of 37 and write
// all_numbers[n] = -1; before this loop. that will act
// as a sentinel and will save this one comparison (assuming
// that there is a value that is guaranteed not to occur in ans)
if(*p != ans[i])
return 0; // another number found, not ans[i]
// std::lower_bound looks for the given number, or for one that
// is greater than it, so if the number was to be inserted there
// (before the bigger one), the sequence would remain ordered.
}
return 1; // all the numbers were found
}
This will run in O(n) for copying, O(n log n) for sorting, optionally O(n) for std::unique() (where n is 6 * ST1, at most 36) and O(6 log n) for searching (where n can be less than 6 * ST1 if std::unique() is employed). The whole algorithm therefore runs in linearithmic time. Note that this does not involve any dynamic memory allocation and as such is suitable even for GPU platforms (one would have to implement sorting and port std::unique() and std::lower_bound(), but all of those are fairly simple functions).
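To give an idea of how little code such a port takes, here is a minimal sketch of stand-ins for std::unique() and std::lower_bound() on a sorted int array. The names UniqueSorted and LowerBound are made up, the functions work with indices rather than iterators, and on CUDA they would presumably need a __device__ qualifier.

```cpp
#include <cassert>

// Hypothetical replacement for std::unique() on a sorted array: removes
// consecutive duplicates in place and returns the number of unique items.
int UniqueSorted(int *a, int n)
{
    assert(n >= 0);
    if (n == 0)
        return 0;
    int m = 1; // a[0] is always kept
    for (int i = 1; i < n; ++i) {
        if (a[i] != a[m - 1])
            a[m++] = a[i]; // keep the first element of each run
    }
    return m;
}

// Hypothetical replacement for std::lower_bound(): returns the index of the
// first element that is not less than value, or n if there is no such element.
int LowerBound(const int *a, int n, int value)
{
    assert(n >= 0);
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < value)
            lo = mid + 1; // the answer lies to the right of mid
        else
            hi = mid;     // a[mid] is a candidate, keep it in range
    }
    return lo;
}
```

With these, the search loop would compare the returned index against n and then check that the element at that index equals ans[i], the same way the pointer-based version does.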