Where to find an exhaustive list of stop words?

China☆狼群 submitted on 2019-12-12 15:03:05

Question


Where could I find an exhaustive list of stop words? The one I have is quite short and seems inapplicable to scientific texts. I am building lexical chains to extract key topics from scientific papers. The problem is that words like "based" and "regarding" should also be treated as stop words, since they carry little meaning.


Answer 1:


You can also easily extend an existing stop word list, e.g. the one in the NLTK toolkit:

from nltk.corpus import stopwords

and then add whatever you think is missing:

# Use a new name so the imported stopwords corpus reader is not shadowed
stop_words = stopwords.words('english') + ["based", "regarding"]

The original NLTK list is described here.
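Once extended, the list is usually turned into a set and used to filter tokens before building lexical chains. The sketch below assumes the text is already tokenized. `BASE_STOP_WORDS` is a hypothetical, deliberately short stand-in for `stopwords.words('english')` (which needs a one-time `nltk.download('stopwords')`), so the snippet runs without NLTK installed; `remove_stop_words` is likewise a hypothetical helper name.

```python
# Hypothetical stand-in for NLTK's English list; in practice use
# stopwords.words('english') after running nltk.download('stopwords') once.
BASE_STOP_WORDS = ["the", "a", "of", "on", "is", "we", "this"]

# Domain-specific additions mentioned in the question.
DOMAIN_STOP_WORDS = ["based", "regarding"]

# A set gives average constant-time membership tests on long token streams.
STOP_WORDS = set(BASE_STOP_WORDS) | set(DOMAIN_STOP_WORDS)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words (case-insensitive check)."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = "Regarding the results we propose a method based on lexical chains".split()
print(remove_stop_words(tokens))  # ['results', 'propose', 'method', 'lexical', 'chains']
```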




Answer 2:


It is difficult to find an exhaustive list of stop words, because a word that counts as a stop word in one domain can be an important term in another.
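One way to act on this domain dependence is to derive candidates from your own corpus instead of relying only on a generic list. The sketch below uses document frequency, a technique not mentioned in the answer: words that appear in a large share of the documents in a domain are flagged as candidate stop words. The `candidate_stop_words` function, its `min_doc_fraction` threshold, and the toy `papers` corpus are all hypothetical, and the threshold would need tuning on real data.

```python
from collections import Counter


def candidate_stop_words(documents, min_doc_fraction=0.5):
    """Return words that occur in at least `min_doc_fraction` of the documents.

    `documents` is a list of token lists. Words present in most documents of a
    domain help little to tell documents apart there, so they are candidates
    for a domain-specific stop word list (to be reviewed by hand).
    """
    doc_freq = Counter()
    for tokens in documents:
        # Count each word at most once per document (document frequency).
        doc_freq.update({t.lower() for t in tokens})
    threshold = min_doc_fraction * len(documents)
    return sorted(w for w, df in doc_freq.items() if df >= threshold)


papers = [
    "results based on the proposed model".split(),
    "a model based on lexical chains".split(),
    "regarding the results of the experiment".split(),
    "the method based on graph features".split(),
]
print(candidate_stop_words(papers, min_doc_fraction=0.75))  # ['based', 'on', 'the']
```

With `min_doc_fraction=0.5` the same corpus would also flag `model` and `results`, which illustrates why the output is best treated as a list of suggestions rather than applied blindly.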

You could take a look at some existing lists of stop words:

http://blog.adlegant.com/how-to-install-nltk-corporastopwords/

http://www.lextek.com/manuals/onix/stopwords1.html

http://www.ranks.nl/stopwords

http://xpo6.com/list-of-english-stop-words/
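Since no single list is exhaustive, one option is to merge several of them and then add your own domain terms. The sketch below assumes each list has been saved locally as a UTF-8 plain-text file with one word per line (the pages above may need copying or cleanup to reach that form); `parse_stop_word_list`, `load_stop_words`, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def parse_stop_word_list(text):
    """Parse a one-word-per-line list, skipping blank lines and lowercasing."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_stop_words(paths):
    """Return the union of the stop words found in several list files."""
    merged = set()
    for path in paths:
        merged |= parse_stop_word_list(Path(path).read_text(encoding="utf-8"))
    return merged


# Usage with two small temporary files standing in for downloaded lists.
with tempfile.TemporaryDirectory() as tmp:
    list_a = Path(tmp, "list_a.txt")
    list_a.write_text("the\nof\nbased\n", encoding="utf-8")
    list_b = Path(tmp, "list_b.txt")
    list_b.write_text("The\nregarding\n\n", encoding="utf-8")
    merged = load_stop_words([list_a, list_b])

print(sorted(merged))  # ['based', 'of', 'regarding', 'the']
```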



Source: https://stackoverflow.com/questions/37701305/where-to-find-an-exhaustive-list-of-stop-words
