问题
I have a large array A of size [0, 8388608] of "relatively small" integers A[i] = [0, 131072] and I want to find the most frequently occurring element of every N=32 elements.
What would be faster,
A. Create an associative array B of size 131072, iterate through 32 elements, increment B[A[i]], then iterate through B, find the largest value, reset all elements in B to 0, repeat |A|/32 times.
B. qsort every 32 elements, find the largest range where A[i] == A[i-1] (and thus the most frequent element), repeat |A|/32 times.
(EDIT) C. Something else.
回答1:
An improvement over the first approach is possible. There is no need to iterate through B. And it can be an array of size 131072
Every time you increment B[A[i]]
, look at the new value in that cell. Then, have a global highest_frequency_found_far
. This start at zero, but after every increment the new value should be compared with this global. If it's higher, then the global is replaced.
You could also have a global value_that_was_associated_with_the_highest_count
for each block of 32 members of A ... {
size_t B [131072] = {0,0,...};
size_t highest_frequency_found_so_far = 0;
int value_associated_with_that = 0;
for(a : A) { // where A just means the current 32-element sub-block
const int new_frequency = ++B[a];
if (new_frequency > highest_frequency_found_so_far) {
highest_frequency_found_so_far = new_frequency;
value_associated_with_that = a;
}
}
// now, 'value_associated_with_that' is the most frequent element
// Thanks to @AkiSuihkonen for pointing out a really simple way to reset B each time.
// B is big, instead of zeroing each element explicitly, just do this loop to undo
// the ++B[a] from earlier:
for(a : A) { --B[a]; }
}
回答2:
what about a btree? You only need a max of 32 nodes and can declare them up front.
来源:https://stackoverflow.com/questions/19934468/most-frequent-of-every-n-elements-in-c