Filter out common words for search query

允我心安, submitted on 2020-01-13 19:21:11

Question


Are there any easy ways to implement filtering a user's input (possibly a question) by extracting the meaningful data in the query?

I basically want to filter out any noise words so I can send a 'clean' query to Google's search api.


Answer 1:


Um, won't Google do this for you? Send all those dirty, filthy words to Google and let them clean them up for you.




Answer 2:


Jeff talked about "stop words" in one of the previous Stack Overflow podcasts. You might try searching for that phrase on Google. The Wikipedia page gives an overview and pointers to options.

http://en.wikipedia.org/wiki/Stop_words




Answer 3:


You can try removing the top X most common English words, but you will always run into trouble with a naive approach like this.

This is because common English words can have special significance in the realm of Computer Science (or other areas). A recent SO podcast (#32) mentions this very issue.
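To make the pitfall concrete, here is a minimal sketch, not a recommended implementation. The stop word list, the `PROTECTED` set, and both function names are hypothetical and chosen only for illustration. It shows a naive filter dropping a word that matters in a programming query, and one possible guard: a set of domain terms that are never removed.

```python
import re

# Hypothetical, deliberately tiny stop word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "how", "do", "i",
              "what", "not", "and", "or", "for", "with"}

# Hypothetical domain terms that should survive even though they are common words.
PROTECTED = {"not"}

def tokenize(query):
    """Lowercase the query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())

def naive_filter(query):
    """Drop every token that appears in the stop word list."""
    return [w for w in tokenize(query) if w not in STOP_WORDS]

def guarded_filter(query, protected=PROTECTED):
    """Drop stop words unless they are listed as protected domain terms."""
    return [w for w in tokenize(query) if w not in STOP_WORDS or w in protected]

query = "How do I use the not operator in Python"
print(naive_filter(query))    # ['use', 'operator', 'python']
print(guarded_filter(query))  # ['use', 'not', 'operator', 'python']
```

The naive version loses "not", which is the whole point of the question. A protected set only helps for terms you anticipate, so it reduces the problem rather than solving it.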




Answer 4:


I used the stop words approach when implementing a basic search engine and it worked fine. Try a sample list like the one here

Based on feedback from your users, you can modify your stop word list accordingly.
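As a rough sketch of that workflow, assuming a hypothetical `StopWordFilter` class and a made-up starter list (neither comes from the linked sample list), the stop words can live in a set that is adjusted whenever users report bad results:

```python
import string

class StopWordFilter:
    """Removes stop words from a query; the list can be adjusted at runtime."""

    def __init__(self, stop_words):
        self.stop_words = {w.lower() for w in stop_words}

    def add(self, *words):
        """Start filtering these words."""
        self.stop_words.update(w.lower() for w in words)

    def remove(self, *words):
        """Stop filtering these words, e.g. after user complaints."""
        self.stop_words.difference_update(w.lower() for w in words)

    def clean(self, query):
        """Return the query with stop words and surrounding punctuation removed."""
        tokens = (w.strip(string.punctuation) for w in query.split())
        return " ".join(t for t in tokens if t and t.lower() not in self.stop_words)

# Hypothetical starter list, for illustration only.
query_filter = StopWordFilter(["what", "is", "the", "a", "of", "in", "who"])
print(query_filter.clean("What is the capital of France?"))  # capital France
print(query_filter.clean("The Who tour dates"))              # tour dates

# Users report that searches for the band fail, so "who" is taken off the list.
query_filter.remove("who")
print(query_filter.clean("The Who tour dates"))              # Who tour dates
```

Keeping the list in a set makes lookups cheap and lets the list be loaded from a file or database instead of being hard-coded.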



Source: https://stackoverflow.com/questions/386995/filter-out-common-words-for-search-query
