Technique to remove common words (and their plural versions) from a string


悲哀的现实 2021-02-05 08:17

I am attempting to find tags (keywords) for a recipe by parsing a long string of text. The text contains the recipe's ingredients, directions, and a short blurb.

Wha

3 answers

小鲜肉 (OP)

2021-02-05 09:06

You ask about speed, but you should be more concerned with accuracy. Both of your suggestions will make a lot of mistakes, removing either too much or too little (for example, a lot of words contain the substring "at"). I second the suggestion to look into the nltk module. In fact, one of the early examples in the NLTK book involves removing common words until the most common remaining ones reveal something about the genre. You'll get not only tools but also instruction on how to go about it.

Anyway, you'll spend much longer writing your program than your computer will spend executing it, so focus on doing it well.
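A minimal sketch of the accuracy point, written in Python on the assumption that the asker is using Python (NLTK is a Python library). Deleting substrings mangles words like "tomatoes", whereas splitting the text into whole-word tokens first avoids that. The stopword set, the crude plural check, and the helper names (`naive_substring_removal`, `tokenize`, `is_common`, `candidate_tags`) are hypothetical illustrations, not the asker's code or NLTK's API. In practice, NLTK's stopwords corpus and a lemmatizer such as `WordNetLemmatizer` would replace them. The last step counts the remaining words and keeps the most frequent ones as candidate tags, loosely in the spirit of the NLTK book example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration. NLTK ships a
# much fuller English list via nltk.corpus.stopwords.words("english").
COMMON_WORDS = {
    "a", "an", "and", "at", "the", "of", "in", "on", "to", "with", "for",
    "cup", "add", "heat", "serve",
}


def naive_substring_removal(text, words):
    """The error-prone approach: delete every occurrence of each word as a substring."""
    for w in words:
        text = text.replace(w, "")
    return text


def tokenize(text):
    """Lowercase the text and split it into runs of letters, so matching is whole-word."""
    return re.findall(r"[a-z]+", text.lower())


def is_common(word, common_words):
    """True if the word, or a crude singular form of it, is a common word."""
    if word in common_words:
        return True
    # Crude plural handling (an assumption, not real stemming):
    # "dishes" -> "dish", "cups" -> "cup".
    if word.endswith("es") and word[:-2] in common_words:
        return True
    if word.endswith("s") and word[:-1] in common_words:
        return True
    return False


def candidate_tags(text, common_words, top_n=5):
    """Drop common words, then return the most frequent remaining words."""
    kept = [w for w in tokenize(text) if not is_common(w, common_words)]
    # most_common orders words with equal counts by first occurrence.
    return [w for w, _ in Counter(kept).most_common(top_n)]


if __name__ == "__main__":
    print(naive_substring_removal("Heat the tomatoes", ["at"]))  # He the tomoes
    recipe = (
        "Heat the oil in a pan. Add 2 cups of tomatoes and simmer the "
        "tomatoes with basil. Serve with basil."
    )
    print(candidate_tags(recipe, COMMON_WORDS, top_n=2))  # ['tomatoes', 'basil']
```

The plural check only decides whether a word is discarded; kept words such as "tomatoes" stay plural, which is one reason a proper lemmatizer tends to give cleaner tags.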
