Building or Finding a “relevant terms” suggestion feature

谎友^ 2021-02-02 03:12

Given a few words of input, I want to have a utility that will return a diverse set of relevant terms, phrases, or concepts. A caveat is that it would need to have a large grap

3 Answers

-上瘾入骨i (OP)

2021-02-02 03:51

Peter Norvig (director of research at Google) spoke about how they do this at Google (specifically mentioning Google Sets) in a Facebook Tech Talk. The idea is that a relatively simple algorithm on a huge dataset (e.g. the entire web) works much better than a complicated algorithm on a small dataset.

You could look at Google's n-gram collection as a starting point. You'd start to see what concepts are grouped together. Norvig hinted that internally Google has up to 7-grams for use in things like Google Translate.

If you're more ambitious, you could download all of Wikipedia's articles in the language you desire and create your own n-gram database.
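As a rough illustration of what "create your own n-gram database" could look like, here is a minimal Python sketch that counts n-grams over a handful of strings. The function names (`tokenize`, `count_ngrams`), the regex tokenizer, and the three-sentence corpus are all assumptions made for the example; a real run over a Wikipedia dump would stream article text and would almost certainly need on-disk storage rather than an in-memory counter.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_ngrams(texts, max_n=3):
    """Count every n-gram of length 1..max_n across an iterable of strings."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for n in range(1, max_n + 1):
            # Slide a window of width n across the token list.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


# Tiny stand-in corpus (hypothetical); replace with streamed article text.
corpus = [
    "New York is a large city",
    "New York City has many museums",
    "A large city has many museums",
]
ngram_counts = count_ngrams(corpus, max_n=2)

# Show the most frequent bigrams.
bigrams = {g: c for g, c in ngram_counts.items() if len(g) == 2}
for gram, count in sorted(bigrams.items(), key=lambda kv: (-kv[1], kv[0]))[:5]:
    print(" ".join(gram), count)
```

Frequent n-grams such as "new york" are the phrases that would be treated as single concepts in later steps.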

The problem is even more complicated if you just have a single word; check out this recent thesis for more details on word sense disambiguation.

It's not an easy problem, but it is useful as you mentioned. In the end, I think you'll find that a really successful implementation will have a relatively simple algorithm and a whole lot of data.
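To make "a relatively simple algorithm" concrete, one possible sketch is to rank terms by how often they co-occur with the input words. The scoring used here, pointwise mutual information (PMI) over document-level co-occurrence, is my own choice for the illustration and not something described in the talk; the function names and the four toy documents are likewise hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count in how many documents each word and each word pair appears."""
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        # Sorted unique words, so every pair is stored as (a, b) with a < b.
        words = sorted(set(re.findall(r"[a-z]+", doc.lower())))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    return word_df, pair_df, len(docs)


def related_terms(query, word_df, pair_df, n_docs, top_k=5):
    """Rank words by their best PMI with any word in the query."""
    query_words = set(query.lower().split())
    scores = {}
    for (a, b), joint in pair_df.items():
        for q, other in ((a, b), (b, a)):
            if q in query_words and other not in query_words:
                pmi = math.log(joint * n_docs / (word_df[q] * word_df[other]))
                scores[other] = max(scores.get(other, -math.inf), pmi)
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy documents (hypothetical); a useful system needs a very large corpus.
docs = [
    "jazz saxophone trumpet",
    "jazz trumpet piano",
    "rock guitar drums",
    "rock guitar piano",
]
word_df, pair_df, n_docs = build_cooccurrence(docs)
suggestions = related_terms("jazz", word_df, pair_df, n_docs)
print(suggestions)
```

PMI favors rare words, so on real data it is usually combined with a minimum-count threshold. The algorithm stays this small either way; the quality of the suggestions comes almost entirely from the size of the corpus.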
