Algorithm for autocomplete?

暖寄归人 2020-11-28 00:45

I am referring to the algorithm that is used to give query suggestions when a user types a search term in Google.

I am mainly interested in: 1. Most important results…

  • 2020-11-28 01:17

    Take a look at Firefox's Awesome Bar algorithm.

    Google Suggest is useful because it takes millions of popular queries, plus your own past related queries, into account.

    It doesn't have a good completion algorithm or UI, though:

    1. Doesn't match substrings.
    2. Seems to use a relatively simple word-boundary prefix algorithm.
      For example, try "tomcat tut": it correctly suggests "tomcat tutorial". Now try "tomcat rial": no suggestions )-:
    3. Doesn't support "did you mean?" the way Google's search results page does.
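    The difference described in points 1 and 2 can be modelled with a small sketch. This is not Google's implementation, which isn't public; `word_prefix_match` and `substring_match` are hypothetical helpers that contrast word-boundary prefix matching (each query token must start a candidate word, in order, which is an assumption about how such a matcher behaves) with plain substring matching on the "tomcat rial" example.

```python
def word_prefix_match(query: str, candidate: str) -> bool:
    """True if each query token is a prefix of a distinct candidate word, in order."""
    tokens = query.lower().split()
    i = 0
    for word in candidate.lower().split():
        # Greedily consume the next query token as soon as a word starts with it.
        if i < len(tokens) and word.startswith(tokens[i]):
            i += 1
    return i == len(tokens)


def substring_match(query: str, candidate: str) -> bool:
    """True if each query token occurs anywhere in the candidate."""
    text = candidate.lower()
    return all(token in text for token in query.lower().split())


suggestions = ["tomcat tutorial", "tomcat tutorial pdf", "tomcat download"]
print([s for s in suggestions if word_prefix_match("tomcat tut", s)])
# ['tomcat tutorial', 'tomcat tutorial pdf']
print([s for s in suggestions if word_prefix_match("tomcat rial", s)])
# []
print([s for s in suggestions if substring_match("tomcat rial", s)])
# ['tomcat tutorial', 'tomcat tutorial pdf']
```

    "rial" is not the start of any word, so the word-boundary matcher rejects "tomcat tutorial", while the substring matcher finds it inside "tutorial".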
  • 2020-11-28 01:25

    Google's exact algorithm is unknown, but it is said to work by statistical analysis of user input, an approach that is not suitable for most cases. More commonly, autocompletion is implemented using one of the following:

    • Trees. By indexing the searchable text in a tree structure (prefix tree, suffix tree, DAWG, etc.), one can execute very fast searches at the expense of memory. The tree traversal can be adapted for approximate matching.
    • Pattern Partitioning. By partitioning the text into tokens (n-grams), one can search for pattern occurrences using a simple hashing scheme.
    • Filtering. Find a set of potential matches, then apply a sequential algorithm to check each candidate.
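    The pattern-partitioning and filtering approaches combine naturally: an n-gram index narrows the search to a set of candidates, and a sequential check then verifies each one. The following is a minimal sketch, assuming character trigrams and an exact substring check as the verification step; `NgramIndex` is a hypothetical class, not part of completely or any other library.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


def ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lowercased text."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}


class NgramIndex:
    """Hypothetical index mapping each character n-gram to the ids of phrases containing it."""

    def __init__(self, phrases: List[str], n: int = 3) -> None:
        self.n = n
        self.phrases = phrases
        self.postings: DefaultDict[str, Set[int]] = defaultdict(set)
        for pid, phrase in enumerate(phrases):
            for gram in ngrams(phrase, n):
                self.postings[gram].add(pid)

    def candidates(self, pattern: str) -> Set[int]:
        """Partitioning step: phrases containing every n-gram of the pattern.

        This is a superset of the true matches, because the n-grams may occur
        in a phrase without being adjacent in the right order.
        """
        grams = ngrams(pattern, self.n)
        if not grams:
            # Pattern shorter than n: nothing to look up, so every phrase is a candidate.
            return set(range(len(self.phrases)))
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, pattern: str) -> List[str]:
        """Filtering step: verify each candidate with a sequential substring check."""
        p = pattern.lower()
        return [self.phrases[i] for i in sorted(self.candidates(pattern))
                if p in self.phrases[i].lower()]


index = NgramIndex(["tomcat tutorial", "tomcat download", "apache tutorial"])
print(index.search("rial"))    # ['tomcat tutorial', 'apache tutorial']
print(index.search("cat tu"))  # ['tomcat tutorial']
```

    The hash lookups do the cheap, coarse work; the verification step is where approximate matching could be introduced, for example by swapping the substring test for an edit-distance threshold.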

    Take a look at completely, a Java autocomplete library that implements some of the latter concepts.

  • 2020-11-28 01:26

    I think that one might be better off constructing a specialized trie, rather than pursuing a completely different data structure.

    I could see that functionality implemented as a trie in which each leaf has a field recording the search frequency of its corresponding word.

    The search query method would then display the descendant leaf nodes with the largest values, computed by multiplying the distance to each descendant leaf node by that node's search frequency.
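    A frequency-annotated trie along these lines might look like the minimal sketch below. `FrequencyTrie` and its methods are hypothetical names. Stored queries can end at internal nodes ("tomcat" lies on the path to "tomcat tutorial"), so the frequency is kept on every query-ending node rather than only on leaves. Ranking is delegated to a `score(distance, frequency)` callable: the default ranks by frequency alone, and the distance × frequency product described above can be passed in as `score=lambda d, f: d * f`.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Set only on nodes where a stored query ends.
        self.frequency: Optional[int] = None


class FrequencyTrie:
    """Hypothetical trie whose query-ending nodes store a search frequency."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, query: str, frequency: int) -> None:
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.frequency = (node.frequency or 0) + frequency

    def complete(
        self,
        prefix: str,
        k: int = 5,
        score: Callable[[int, int], float] = lambda distance, freq: freq,
    ) -> List[str]:
        """Return up to k stored queries that start with prefix, best score first."""
        node = self.root
        for ch in prefix:
            nxt = node.children.get(ch)
            if nxt is None:
                return []
            node = nxt
        scored: List[Tuple[float, str]] = []
        # Depth-first walk below the prefix node; distance counts the edges
        # between the prefix node and each query-ending node.
        stack = [(node, prefix, 0)]
        while stack:
            current, text, distance = stack.pop()
            if current.frequency is not None:
                scored.append((score(distance, current.frequency), text))
            for ch, child in current.children.items():
                stack.append((child, text + ch, distance + 1))
        # Highest score first; ties broken alphabetically.
        best = heapq.nsmallest(k, scored, key=lambda item: (-item[0], item[1]))
        return [text for _, text in best]


trie = FrequencyTrie()
for query, freq in [("tomcat", 50), ("tomcat tutorial", 120),
                    ("tomcat download", 80), ("tomato", 30)]:
    trie.add(query, freq)
print(trie.complete("tom", k=3))
# ['tomcat tutorial', 'tomcat download', 'tomcat']
```

    Walking the whole subtree is fine for a sketch; a production trie would typically cache the best completions at each node so a lookup doesn't have to visit every descendant.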

    The data structure (and consequently the algorithm) Google uses is probably vastly more complicated, potentially taking into account a large number of other factors, such as search frequencies from your own account (and time of day... and weather... season... and lunar phase... and... ). However, I believe the basic trie data structure can be extended to any kind of specialized search preference by adding fields to each node and using those fields in the search query method.
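    As one illustration of adding fields to the nodes, here is a minimal sketch in which each query node carries both a global count and a per-user count, and the query method blends them into a single score. The field names, `PersonalizedTrie`, and the linear `personal_weight` blend are all hypothetical choices; the answer doesn't say how such fields should be combined.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    is_query: bool = False
    global_count: int = 0    # searches of this exact query by all users
    personal_count: int = 0  # hypothetical extra field: this user's own searches


class PersonalizedTrie:
    """Hypothetical trie that blends global and per-user counts when ranking."""

    def __init__(self, personal_weight: float = 10.0) -> None:
        self.root = Node()
        self.personal_weight = personal_weight  # hypothetical tuning knob

    def add(self, query: str, global_count: int, personal_count: int = 0) -> None:
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, Node())
        node.is_query = True
        node.global_count += global_count
        node.personal_count += personal_count

    def complete(self, prefix: str, k: int = 5) -> List[str]:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results: List[Tuple[float, str]] = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_query:
                # The extra field feeds directly into the ranking formula.
                score = current.global_count + self.personal_weight * current.personal_count
                results.append((score, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Highest score first; ties broken alphabetically.
        results.sort(key=lambda item: (-item[0], item[1]))
        return [text for _, text in results[:k]]


trie = PersonalizedTrie(personal_weight=10.0)
trie.add("tomcat tutorial", global_count=120)
trie.add("tomcat download", global_count=80)
trie.add("tomcat logs", global_count=20, personal_count=15)
print(trie.complete("tomcat "))
# ['tomcat logs', 'tomcat tutorial', 'tomcat download']
```

    Further signals (time of day, season, and so on) would follow the same pattern: another field on the node and another term in the score.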
