Quickly find whether a value is present in a C array?

灰色年华 2021-01-29 17:30

I have an embedded application with a time-critical ISR that needs to iterate through an array of size 256 (preferably 1024, but 256 is the minimum) and check whether a given value is present in it.

15 answers
  • 2021-01-29 18:13

    Keep the table in sorted order, and use Bentley's unrolled binary search:

    /* a[] holds 1024 keys sorted in ascending order (int keys used for illustration). */
    int key_in_table(const int a[1024], int key)
    {
        unsigned i = 0;
        if (key >= a[i+512]) i += 512;
        if (key >= a[i+256]) i += 256;
        if (key >= a[i+128]) i += 128;
        if (key >= a[i+ 64]) i +=  64;
        if (key >= a[i+ 32]) i +=  32;
        if (key >= a[i+ 16]) i +=  16;
        if (key >= a[i+  8]) i +=   8;
        if (key >= a[i+  4]) i +=   4;
        if (key >= a[i+  2]) i +=   2;
        if (key >= a[i+  1]) i +=   1;
        return (key == a[i]);
    }
    

    The point is,

    • if you know how big the table is, then you know how many iterations there will be, so you can fully unroll it.
    • Then, there's no point testing for the == case on each iteration because, except on the last iteration, the probability of that case is too low to justify spending time testing for it.**
    • Finally, by expanding the table to a power of 2, you add at most one comparison, and at most a factor of two storage.

    ** If you're not used to thinking in terms of probabilities: every decision point has an entropy, which is the average information you learn by executing it. For the >= tests, the probability of each branch is about 0.5, and -log2(0.5) is 1, so whichever branch you take you learn 1 bit; the average you learn per test is the sum, over both branches, of the information learned on that branch times its probability, i.e. 1*0.5 + 1*0.5 = 1 bit. Since the table has 2^10 = 1024 entries, there are 10 bits to learn, so it takes 10 branches. That's why it's fast!

    On the other hand, what if your first test is if (key == a[i+512])? The probability of it being true is 1/1024, while the probability of it being false is 1023/1024. So if it's true you learn all 10 bits! But if it's false you learn only -log2(1023/1024) = .00141 bits, practically nothing! So the average amount you learn from that test is 10/1024 + .00141*1023/1024 = .0098 + .00141 = .0112 bits. About one hundredth of a bit. That test is not carrying its weight!
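
    If the table has fewer real entries than the 1024 the unrolled search above assumes (e.g. only 256), one option is to pad it with a sentinel that compares greater than every real key, so it stays sorted. A minimal sketch, assuming int keys, that INT_MAX never occurs as a real key, and the key_in_table() wrapper shown above:

    #include <limits.h>
    #include <string.h>

    #define REAL_ENTRIES 256
    #define TABLE_SIZE   1024   /* size the unrolled search is written for */

    static int table[TABLE_SIZE];

    /* Copy the sorted keys and pad the tail with INT_MAX so the padded
       table stays sorted ascending; the unrolled search works unchanged. */
    static void build_table(const int sorted_keys[REAL_ENTRIES])
    {
        memcpy(table, sorted_keys, REAL_ENTRIES * sizeof table[0]);
        for (unsigned i = REAL_ENTRIES; i < TABLE_SIZE; i++)
            table[i] = INT_MAX;
    }

    /* In the ISR:  if (key_in_table(table, key)) { ... }  */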

  • 2021-01-29 18:13

    I'm a great fan of hashing. The problem of course is to find an efficient algorithm that is both fast and uses a minimum amount of memory (especially on an embedded processor).

    If you know beforehand the values that may occur you can create a program that runs through a multitude of algorithms to find the best one - or, rather, the best parameters for your data.

    I created such a program that you can read about in this post and achieved some very fast results. 16000 entries is roughly 2^14, i.e. about 14 comparisons to find a value with a plain binary search. I explicitly aimed for very fast lookups - finding the value in <=1.5 lookups on average - which resulted in greater RAM requirements; I believe that with a more conservative target (say <=3 lookups on average) a lot of memory could be saved. By comparison, a binary search over your 256 or 1024 entries would take about 8 and 10 comparisons, respectively.

    My average lookup required around 60 cycles (on a laptop with an Intel i5) with a generic algorithm (utilizing one division by a variable) and 40-45 cycles with a specialized one (probably utilizing a multiplication). This should translate into sub-microsecond lookup times on your MCU, depending of course on the clock frequency it executes at.

    It can be tweaked further for real-life use if the entry array keeps track of how many times each entry was accessed. If the entry array is sorted from most to least accessed before the indices are computed, then the most commonly occurring values are found with a single comparison.
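
    The generator and its parameters aren't reproduced here, but the general shape of a hash-based membership test for a small fixed key set looks something like the sketch below (the multiplicative hash, table size, and names are illustrative assumptions, not the algorithm from that post):

    #include <stdbool.h>
    #include <stdint.h>

    #define HASH_SIZE  2048u        /* power of two, larger than the key count */
    #define EMPTY_SLOT 0xFFFFFFFFu  /* assumes this value is never a real key */

    static uint32_t slots[HASH_SIZE];

    /* Multiplicative hash into HASH_SIZE buckets. */
    static unsigned hash(uint32_t key)
    {
        return (unsigned)((key * 2654435761u) >> 21) & (HASH_SIZE - 1u);
    }

    /* Build once at init time: insert every key with linear probing. */
    static void hash_init(const uint32_t *keys, unsigned count)
    {
        for (unsigned i = 0; i < HASH_SIZE; i++)
            slots[i] = EMPTY_SLOT;
        for (unsigned k = 0; k < count; k++) {
            unsigned i = hash(keys[k]);
            while (slots[i] != EMPTY_SLOT)
                i = (i + 1u) & (HASH_SIZE - 1u);
            slots[i] = keys[k];
        }
    }

    /* Query from the ISR: typically one or two probes when the table is sparse. */
    static bool hash_contains(uint32_t key)
    {
        unsigned i = hash(key);
        while (slots[i] != EMPTY_SLOT) {
            if (slots[i] == key)
                return true;
            i = (i + 1u) & (HASH_SIZE - 1u);
        }
        return false;
    }

    (2654435761 is Knuth's multiplicative hashing constant; a generator like the one described would instead search for constants and parameters tuned to the specific key set.)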

  • 2021-01-29 18:15

    Other people have suggested reorganizing your table, adding a sentinel value at the end, or sorting it in order to provide a binary search.

    You state "I also use pointer arithmetic and a for loop, which does down-counting instead of up (checking if i != 0 is faster than checking if i < 256)."
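
    For reference, that style presumably looks something like the following sketch (my reconstruction of what is being described, with assumed uint32_t types - not your actual code):

    #include <stdbool.h>
    #include <stdint.h>

    /* Down-counting loop with pointer arithmetic. */
    bool present_downcount(const uint32_t the_array[256], uint32_t compareVal)
    {
        const uint32_t *p = &the_array[256];   /* one past the end */
        for (uint32_t i = 256; i != 0; i--)
        {
            if (*--p == compareVal)
                return true;
        }
        return false;
    }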

    My first advice is: get rid of the pointer arithmetic and the downcounting. Stuff like

    for (i=0; i<256; i++)
    {
        if (compareVal == the_array[i])
        {
           [...]
        }
    }
    

    tends to be idiomatic to the compiler: both the loop shape and indexing an array with the loop variable are patterns it recognizes. Juggling with pointers and pointer arithmetic tends to obfuscate those idioms and makes the compiler generate code tied to what you literally wrote rather than to what the compiler writer decided would be the best course for the general task.

    For example, the above code might be compiled into a loop running from -256 or -255 to zero, indexing off &the_array[256]. Possibly stuff that is not even expressible in valid C but matches the architecture of the machine you are generating for.

    So don't microoptimize. You are just throwing spanners into the works of your optimizer. If you want to be clever, work on the data structures and algorithms but don't microoptimize their expression. It will just come back to bite you, if not on the current compiler/architecture, then on the next.

    In particular, using pointer arithmetic instead of arrays and indexes makes it harder for the compiler to reason about alignment, storage locations, and aliasing, and to apply optimizations like strength reduction in the way best suited to the target architecture.
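
    Written in that spirit, the whole membership test stays trivially simple and leaves the optimizer free to do its job; a sketch (the function name and uint32_t types are my assumptions, and it should be built with optimization enabled, e.g. -O2 or -O3 on GCC/Clang):

    #include <stdbool.h>
    #include <stdint.h>

    /* Plain, idiomatic membership test: simple loop, array indexing,
       early return on a match. */
    bool present(const uint32_t the_array[256], uint32_t compareVal)
    {
        for (uint32_t i = 0; i < 256; i++)
        {
            if (the_array[i] == compareVal)
                return true;
        }
        return false;
    }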
