Faster than binary search for ordered list

自闭症患者 2020-12-12 13:51

Is there an algorithm that is faster than binary search for searching in the sorted values of an array?

In my case, I have sorted values (they could be values of any type) in an array.

11 Answers
  • 2020-12-12 14:17

    Yes and no. Yes, there are searches that are faster, on average, than a bisection search. But I believe that they are still O(lg N), just with a lower constant.

    You want to minimize the time taken to find your element. Generally it is desirable to use fewer steps, and one way to approach this is to maximize the expected number of elements that will be eliminated at each step. With bisection, always exactly half the elements are eliminated. You can do better than this, IF you know something about the distribution of the elements. But, the algorithm for choosing the partition element is generally more complicated than choosing the midpoint, and this extra complexity may overwhelm any time savings you expected to get from using fewer steps.

    Really, in a problem like this it's better to attack second-order effects, like cache locality, than the search algorithm itself. For example, when doing repeated binary searches, the same few elements (the first, second, and third quartiles) are used VERY frequently, so putting them in a single cache line could be far superior to random access into the list.

    Dividing each level into, say, 4 or 8 equal sections (instead of 2) and doing a linear search through those could also be quicker than the bisection search, because a linear search doesn't require calculating the partition point and also has fewer data dependencies that can cause cache stalls (a sketch follows at the end of this answer).

    But all of these are still O(lg N).
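
    To make the multi-way idea concrete, here is a minimal sketch (illustrative C++, names made up) of a 4-way split that scans the three interior partition points linearly and finishes small ranges with a plain scan. It behaves like std::lower_bound and is meant only to show the shape of the technique, not a tuned implementation:

        #include <cstddef>
        #include <vector>

        // Sketch of the "split into ~4 sections and scan the partition points
        // linearly" idea above.  Returns the index of the first element >= key,
        // or values.size() if there is none (std::lower_bound semantics).
        std::size_t four_way_lower_bound(const std::vector<int>& values, int key) {
            std::size_t lo = 0, hi = values.size();      // invariant: answer in [lo, hi]
            while (hi - lo > 8) {
                std::size_t step = (hi - lo) / 4;
                std::size_t new_lo = lo, new_hi = hi;
                for (std::size_t i = 1; i <= 3; ++i) {   // three interior pivots
                    std::size_t p = lo + i * step;
                    if (values[p] < key) {
                        new_lo = p + 1;                  // answer lies to the right of p
                    } else {
                        new_hi = p;                      // answer is at or before p
                        break;
                    }
                }
                lo = new_lo;
                hi = new_hi;
            }
            while (lo < hi && values[lo] < key) ++lo;    // finish with a linear scan
            return lo;
        }

    As the answer says, this is still O(lg N); any win would have to come from fewer mispredicted branches and better cache behaviour, which only measurement can confirm.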

  • 2020-12-12 14:17

    Although in the general case you cannot do better than O(log N), you can at least optimize that, thus significantly reducing the constant of proportionality in front of O(log N).

    If you have to perform multiple searches on the same array, these can be vectorized using SIMD extensions, further cutting down on computation cost.

    In particular, if you are dealing with arrays of floating point numbers that satisfy certain properties, then there are ways to construct a special index which allows searching the array in O(1) (a rough sketch follows at the end of this answer).

    All of the above aspects are discussed, with test results, in: Cannizzo, 2015, Fast and Vectorizable Alternative to Binary Search in O(1) Applicable to a Wide Domain of Sorted Arrays of Floating Point Numbers. The paper comes with source code on GitHub.
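
    As a rough illustration of the direct-lookup idea (not necessarily the exact construction from the paper), here is a sketch of a bucket index over a sorted array of doubles: precompute where each of m equal-width value ranges begins in the array, then answer a query with a little arithmetic plus a short scan. For roughly uniformly distributed data each lookup touches O(1) elements in expectation; the struct and its names are made up for illustration and assume a non-empty input:

        #include <algorithm>
        #include <cstddef>
        #include <vector>

        // Illustrative bucket index over a sorted, non-empty vector of doubles.
        // start[b] records where the b-th equal-width value range begins in the
        // array, so a query needs only a bucket computation and a short scan.
        struct BucketIndex {
            const std::vector<double>& a;      // sorted input; must outlive the index
            double lo, scale;
            std::vector<std::size_t> start;

            BucketIndex(const std::vector<double>& sorted, std::size_t buckets)
                : a(sorted), lo(sorted.front()),
                  scale(buckets / (sorted.back() - sorted.front() + 1e-12)),
                  start(buckets + 2, 0) {
                for (std::size_t b = 0, i = 0; b <= buckets; ++b) {
                    while (i < a.size() && bucket_of(a[i]) < b) ++i;
                    start[b] = i;              // first index whose value falls in bucket >= b
                }
                start[buckets + 1] = a.size();
            }

            std::size_t bucket_of(double x) const {
                double b = (x - lo) * scale;
                return b < 0 ? 0 : static_cast<std::size_t>(b);
            }

            // lower_bound-style query: first index i with a[i] >= x.
            std::size_t lower_bound(double x) const {
                if (x <= lo) return 0;
                if (x > a.back()) return a.size();
                std::size_t b = std::min(bucket_of(x), start.size() - 2);
                std::size_t i = start[b], end = start[b + 1];
                while (i < end && a[i] < x) ++i;   // short scan inside one bucket
                return i;
            }
        };

    Batching many such queries against the same index is also a natural fit for the SIMD vectorization mentioned above.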

  • 2020-12-12 14:19

    If the values in the list are evenly distributed, then you could try a weighted split instead of a binary split: e.g., if the desired value is a third of the way from the current lower limit to the current upper limit, then you could try the element that is also a third of the way along the index range. This could suffer badly on lists where the values are bunched up, though.
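
    This weighted split is usually called interpolation search. A minimal sketch, assuming numeric values (names are illustrative):

        #include <cstddef>
        #include <vector>

        // Weighted ("interpolation") split: probe where the key would sit if the
        // values were evenly distributed.  Returns an index holding key, or
        // values.size() if the key is absent.
        std::size_t interpolation_search(const std::vector<double>& values, double key) {
            std::size_t lo = 0, hi = values.size();
            while (lo < hi) {
                if (values[hi - 1] == values[lo])        // all equal: avoid dividing by zero
                    return values[lo] == key ? lo : values.size();
                // Estimate the position from the value, not from the midpoint.
                double frac = (key - values[lo]) / (values[hi - 1] - values[lo]);
                if (frac < 0.0 || frac > 1.0) return values.size();  // key outside this window
                std::size_t mid = lo + static_cast<std::size_t>(frac * (hi - lo - 1));
                if (values[mid] == key) return mid;
                if (values[mid] < key)  lo = mid + 1;
                else                    hi = mid;
            }
            return values.size();
        }

    On uniformly distributed data this needs O(log log n) probes on average, but, as noted above, it degrades toward O(n) when the values are bunched up.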

  • 2020-12-12 14:20

    You can do better than O(log n) if the values are integers, in which case the best worst-case running time you can achieve, in terms of n, is O(sqrt(log n)). Otherwise, there is no way to beat O(log n) unless there are patterns in the input sequence. There are two approaches used to beat O(log n) in the case of integers.

    First, you can use y-fast trees, which work by storing in a hash table all prefixes for which you are storing at least one integer with that prefix. This lets you perform a binary search over the prefix length to find the longest matching prefix, which in turn lets you find the successor of the element you are searching for in time O(log w), where w is the number of bits in a word (a sketch of the prefix search appears at the end of this answer). There are some details to work through to make this work and use only linear space, but they aren't too bad (see the link below).

    Second, you can use fusion trees, which use bit tricks to enable you to perform w^O(1) comparisons in just a constant number of instructions, yielding a running time of O(log n / log w).

    The optimum tradeoff between these two data structures occurs when log w = sqrt(log n): the y-fast tree then answers queries in O(log w) = O(sqrt(log n)) time, and the fusion tree in O(log n / log w) = O(sqrt(log n)) time, giving an overall running time of O(sqrt(log n)).

    For details on the above, see lectures 12 and 13 of Erik Demaine's course: http://courses.csail.mit.edu/6.851/spring07/lec.html
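
    As a small, self-contained taste of the first idea (not a full y-fast tree), here is a sketch of just the prefix-search core: store every prefix of every key in a hash table, then binary search over the prefix length. A real y-fast tree adds linked leaves and an extra level of indirection to reach linear space and O(log w) successor queries; the names below are illustrative:

        #include <cstdint>
        #include <unordered_set>

        // Prefix-search core of an x-/y-fast tree, for 32-bit keys: every prefix
        // of every inserted key is kept in a hash set, and a query binary-searches
        // over the prefix *length* (O(log w) hash lookups) to find the longest
        // prefix it shares with any inserted key.
        struct PrefixIndex {
            static constexpr int W = 32;                 // bits per key
            std::unordered_set<std::uint64_t> prefixes;  // packed (length, prefix) pairs

            static std::uint64_t pack(int len, std::uint32_t prefix) {
                return (static_cast<std::uint64_t>(len) << 32) | prefix;
            }

            void insert(std::uint32_t key) {
                for (int len = 1; len <= W; ++len)
                    prefixes.insert(pack(len, key >> (W - len)));
            }

            // Length of the longest prefix of q shared with some inserted key.
            int longest_prefix_length(std::uint32_t q) const {
                int lo = 0, hi = W;          // the length-lo prefix is known to match
                while (lo < hi) {
                    int mid = (lo + hi + 1) / 2;
                    if (prefixes.count(pack(mid, q >> (W - mid))))
                        lo = mid;            // this prefix exists; try a longer one
                    else
                        hi = mid - 1;        // too long; try a shorter one
                }
                return lo;
            }
        };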

  • 2020-12-12 14:27

    First of all, measure before doing optimization.

    Do you really need to optimize that search?

    If so, then secondly, think about algorithmic complexity first. E.g., can you use a tree (a std::map, say) instead of an array? If so, it depends on the relative frequency of insertions/deletions versus searches; but the premise of having a sorted array at hand indicates that searches are frequent compared to data set changes, so it would make sense to do a little additional work for insertions/deletions, making each search much faster (namely, logarithmic time).

    If you find that the search times really are a bottleneck that needs addressing, and no change of data representation is possible, and the list is short, then a linear search will generally be faster because it does less work per comparison.

    Otherwise, if the list is longer, and no particular distribution of values is known or assumed, and the values can't be treated as numerical, and memory consumption should be constant (ruling out constructing a hash table, say), then binary search produces 1 bit of information per comparison and is probably the best you can do for the first search (a small sketch follows at the end of this answer).

    Cheers & hth.
