Word Prediction algorithm

野趣味

I'm sure there is a post on this, but I couldn't find one asking this exact question. Consider the following:

We have a word dictionary available
We are

2 Answers

面向向阳花

2021-01-31 11:24
This is the problem of language modeling. For a baseline approach, the only thing you need is a hash table mapping fixed-length chains of words, say of length k, to the most probable following word. (*)

At training time, you break the input into (k+1)-grams using a sliding window. So if you encounter
```
The wrath sing, goddess, of Peleus' son, Achilles
```
you generate, for k=2,
```
START START the
START the wrath
the wrath sing
wrath sing goddess
sing goddess of
goddess of peleus
of peleus son
peleus son achilles
```
This can be done in linear time. For each 3-gram, tally (in a hash table) how often the third word follows the first two.
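A minimal Python sketch of the sliding window and tally described above, assuming a deliberately crude tokenizer (lowercase, letters only, punctuation dropped). The names `tokenize`, `ngrams`, and `tally` are made up for illustration and are not from any library.

```python
import re
from collections import Counter, defaultdict

START = "START"  # padding token, as in the example above


def tokenize(text):
    # Crude assumption: lowercase words, punctuation dropped,
    # an apostrophe kept only inside a word (e.g. "it's").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, k=2):
    # Pad with k START tokens so the first real word also has a k-word context.
    padded = [START] * k + tokens
    for i in range(len(tokens)):
        # Yield (context of k words, the word that follows it).
        yield tuple(padded[i:i + k]), padded[i + k]


def tally(tokens, k=2, counts=None):
    # counts[context][word] = how often `word` followed `context`.
    if counts is None:
        counts = defaultdict(Counter)
    for context, nxt in ngrams(tokens, k):
        counts[context][nxt] += 1
    return counts


tokens = tokenize("The wrath sing, goddess, of Peleus' son, Achilles")
counts = tally(tokens)
```

Each token is visited once and each step copies only k words, so for a fixed k the whole pass is linear in the input length. Passing the same `counts` object back into `tally` lets you accumulate over many sentences.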

Finally, loop through the hash table and, for each key (2-gram), keep only the most commonly occurring third word. This pass is also linear time.

At prediction time, look only at the last k words (here, 2) and predict the next word. This takes constant time, since it's just a hash table lookup.
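The reduction and lookup steps might look like the sketch below. The tallies here are invented for illustration, and `build_table` and `predict` are hypothetical helper names. It assumes training padded the input with `START` tokens, so short inputs are padded the same way.

```python
from collections import Counter

# Hypothetical tallies for k=2: context -> Counter of following words.
counts = {
    ("the", "wrath"): Counter({"of": 3, "sing": 1}),
    ("wrath", "of"): Counter({"achilles": 2}),
}


def build_table(counts):
    # Keep only the most frequent follower for each context.
    # Ties fall to whichever word most_common() happens to return first.
    return {
        context: followers.most_common(1)[0][0]
        for context, followers in counts.items()
    }


def predict(table, words, k=2):
    # Use only the last k words (k >= 1 assumed), padding short inputs with START.
    padded = ["START"] * k + [w.lower() for w in words]
    context = tuple(padded[-k:])
    # None signals a context that never appeared in training.
    return table.get(context)


table = build_table(counts)
suggestion = predict(table, ["Sing", "the", "wrath"])
```

Returning `None` for unseen contexts is one possible choice; a real system would more likely back off to a shorter context.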

If you're wondering why you should keep only short subchains instead of full chains, then look into the theory of Markov windows. If your model were to remember all the chains of words that it has seen in its input, then it would badly overfit its training data and only reproduce its input at prediction time. How badly depends on the training set (more data is better), but for k>4 you'd really need smoothing in your model.

(*) Or to a probability distribution, but this is not needed for your simple example use case.
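If you did want the distribution rather than a single word, one simple option is to normalize the tallies for each context. This is only a sketch with made-up counts; `to_distribution` and `top_n` are hypothetical names, and the estimate is the plain unsmoothed relative frequency.

```python
from collections import Counter


def to_distribution(followers):
    # Relative frequency of each follower: count / total for this context.
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}


def top_n(dist, n=3):
    # The n most probable followers, e.g. for showing several suggestions.
    return sorted(dist, key=dist.get, reverse=True)[:n]


# Hypothetical tallies for a single context.
followers = Counter({"of": 3, "sing": 1})
dist = to_distribution(followers)
best = top_n(dist, n=1)
```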
北荒

2021-01-31 11:30

Yee Whye Teh also has some interesting recent work that addresses this problem. The "Sequence Memoizer" extends the traditional prediction-by-partial-matching scheme to take into account arbitrarily long histories.

Here is a link to the original paper: http://www.stats.ox.ac.uk/~teh/research/compling/WooGasArc2011a.pdf

It is also worth reading some of the background work, which can be found in the paper "A Bayesian Interpretation of Interpolated Kneser-Ney".
