Word Prediction algorithm

野趣味 2021-01-31 10:40

I'm sure there is a post on this, but I couldn't find one asking this exact question. Consider the following:

  1. We have a word dictionary available
  2. We are
2 Answers
  • 2021-01-31 11:24

    This is the problem of language modeling. For a baseline approach, the only thing you need is a hash table mapping fixed-length chains of words, say of length k, to the most probable following word. (*)

    At training time, you break the input into (k+1)-grams using a sliding window. So if you encounter

    The wrath sing, goddess, of Peleus' son, Achilles
    

    you generate, for k=2,

    START START the
    START the wrath
    the wrath sing
    wrath sing goddess
    sing goddess of
    goddess of peleus
    of peleus son
    peleus son achilles
    
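The sliding window above can be sketched in a few lines of Python. The `<s>` padding token, the lowercasing, and the punctuation-free input are assumptions for illustration, not part of the answer:

```python
def ngrams(sentence, k=2, start="<s>"):
    # Pad with k start tokens, then slide a (k+1)-word window over the text.
    words = [start] * k + sentence.lower().split()
    return [tuple(words[i:i + k + 1]) for i in range(len(words) - k)]

for gram in ngrams("The wrath sing goddess of Peleus son Achilles"):
    print(" ".join(gram))  # "<s> <s> the", "<s> the wrath", ...
```

This runs in linear time in the sentence length, since each window position is visited once.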

    This can be done in linear time. For each 3-gram, tally (in a hash table) how often the third word follows the first two.

    Finally, loop through the hash table and for each key (2-gram) keep only the most commonly occurring third word. Linear time.

    At prediction time, look only at the last k (here, 2) words and predict the next word. This takes only constant time, since it's just a hash-table lookup.
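Putting the training and prediction steps together, here is a minimal sketch of the whole baseline. The `<s>` padding token and the lowercasing are illustrative assumptions:

```python
from collections import defaultdict, Counter

K = 2          # context length: predict from the last K words
START = "<s>"  # assumed padding token for sentence starts

def train(sentences):
    """Tally, for each K-gram, how often each following word occurs,
    then keep only the most common continuation per context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = [START] * K + sentence.lower().split()
        for i in range(len(words) - K):
            context = tuple(words[i:i + K])
            counts[context][words[i + K]] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def predict(model, context_words):
    """Constant-time lookup: map the last K words to the stored word."""
    return model.get(tuple(w.lower() for w in context_words[-K:]))

model = train(["The wrath sing goddess of Peleus son Achilles"])
print(predict(model, ["the", "wrath"]))  # -> sing
```

Training is one pass over the input (linear time); prediction is a single dictionary lookup. Storing a full `Counter` per context instead of only the argmax would give the probability-distribution variant mentioned in the footnote.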

    If you're wondering why you should keep only short subchains instead of full chains, then look into the theory of Markov windows. If your model were to remember all the chains of words that it has seen in its input, then it would badly overfit its training data and only reproduce its input at prediction time. How badly depends on the training set (more data is better), but for k>4 you'd really need smoothing in your model.

    (*) Or to a probability distribution, but this is not needed for your simple example use case.

  • 2021-01-31 11:30

    Yee Whye Teh also has some recent interesting work that addresses this problem. The "Sequence Memoizer" extends the traditional prediction-by-partial-matching scheme to take into account arbitrarily long histories.

    Here is a link to the original paper: http://www.stats.ox.ac.uk/~teh/research/compling/WooGasArc2011a.pdf

    It is also worth reading some of the background work, which can be found in the paper "A Bayesian Interpretation of Interpolated Kneser-Ney".
