Categorizing Words and Category Values

温柔的废话 2021-01-31 05:49

We were set an algorithm problem in class today, as an "if you figure out a solution, you don't have to do this subject" challenge. So of course, we all thought we would give it a go.

21 answers
  •  不知归路
    2021-01-31 06:34

    Well, you can't use Google, but you CAN use Yahoo, Ask, Bing, Ding, Dong, Kong... I would do a few passes. First, query the 100 words against 2-3 search engines, grab the first y resulting articles (y being a threshold to experiment with; 5 is a good start, I think) and scan the text. In particular, I'd search for the 10 categories. If a category appears more than x times (x again being a threshold you need to experiment with), it's a match. Based on how many times a category appears in the text (relative to x) and how many of the top y pages it appears in, you can assign a weight to a word-category pair. For better accuracy, you can then do another pass with those non-Google search engines, querying the word-category pair (with an AND relationship), and add the number of resulting pages to the weight of that pair. Then simply assume the word-category pair with the highest weight is the right one (assuming you even have more than one option). You can also assign a word to multiple categories if the weights are close enough (within a threshold z, maybe). With that in place, you can introduce any number of words and any number of categories, and you'll win your challenge. I also think this method is good for evaluating the weight of potential AdWords in advertising, but that's another topic...
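A minimal sketch of the passes described above, in Python. Everything named here is an assumption made for illustration: each "engine" is a hypothetical callable that takes a query string and returns a list of page texts (no real search API is used), the weight is simply the summed occurrence count over matching pages plus the number of pages returned for the AND query, and `score_word` and `assign` are made-up helper names.

```python
from collections import defaultdict


def score_word(word, categories, engines, y=5, x=2, refine=True):
    """Return {category: weight} for one word.

    engines: hypothetical callables, query string -> list of page texts.
    y: how many top pages to scan per engine.
    x: a category must appear more than x times on a page to count.
    """
    weights = defaultdict(float)

    # First pass: scan the top y pages returned for the word alone.
    for search in engines:
        for text in search(word)[:y]:
            lowered = text.lower()
            for cat in categories:
                count = lowered.count(cat.lower())
                if count > x:
                    # Assumed weighting: more occurrences and more
                    # matching pages both raise the weight.
                    weights[cat] += count

    # Second pass: query "word AND category" for each candidate pair
    # and add the number of resulting pages to that pair's weight.
    if refine:
        for cat in list(weights):
            for search in engines:
                weights[cat] += len(search(f"{word} AND {cat}"))

    return dict(weights)


def assign(weights, z=0.0):
    """Return the best category, plus any whose weight is within z of it."""
    if not weights:
        return []
    best = max(weights.values())
    return sorted(c for c, w in weights.items() if best - w <= z)


if __name__ == "__main__":
    # Stub engine backed by a dict, standing in for a real search engine.
    fake_pages = {
        "apple": ["fruit fruit fruit tree", "Fruit fruit fruit pie", "company"],
        "apple AND fruit": ["page 1", "page 2"],
    }

    def engine(query):
        return fake_pages.get(query, [])

    w = score_word("apple", ["fruit", "company"], [engine])
    print(w, assign(w, z=1))
```

The thresholds `x`, `y` and `z` are left as parameters because, as noted above, they need to be tuned by experiment. Plain substring counting is the crudest possible text scan; it would also match a category inside a longer word.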

    Good luck

    Harel
