PySpark: how to configure StopWordsRemover for French on Spark 1.6.3

Submitted by 送分小仙女 on 2019-12-11 09:49:36

Question


I would like to know how to configure StopWordsRemover for the French language in Spark 1.6.3.

I'm currently using PySpark.

Thanks for your help.

Best regards,


Answer 1:


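Take a look at the nltk package, which ships stop word lists for several languages.

I use it for Portuguese words; for French, pass 'french' instead of 'portuguese':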

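from pyspark.ml.feature import StopWordsRemover
import nltk
nltk.download("stopwords")  # fetch the NLTK stop word corpus (needed once per machine)

...

# Load the language-specific list and hand it to the remover via stopWords
stopwordList = nltk.corpus.stopwords.words('portuguese')
remover = StopWordsRemover(inputCol=tokenizer.getOutputCol(), outputCol="stopWordsRem", stopWords=stopwordList)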

Hope it helps




Answer 2:


According to the PySpark 1.6.3 docs, pyspark.ml.feature.StopWordsRemover does not have a language parameter. However, you can always provide your own list of stop words via the "stopWords" parameter.
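As a rough illustration of passing your own list, here is a minimal sketch. The short French word list, the column names "words" and "filtered", and both helper function names are assumptions made up for this example; a real list would be much longer (for instance the one from nltk). The pure-Python helper only mimics case-insensitive filtering so the idea can be tried without a Spark cluster; it is not how Spark implements it.

```python
# Hand-picked sample of French stop words; illustrative only, not an official list.
FRENCH_STOP_WORDS = ["le", "la", "les", "un", "une", "des", "de", "du", "et", "en", "est"]


def remove_stop_words(tokens, stop_words=FRENCH_STOP_WORDS):
    """Pure-Python picture of the filtering, matching case-insensitively."""
    blocked = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in blocked]


def build_french_remover(input_col="words", output_col="filtered"):
    """Build a StopWordsRemover that uses the custom list (requires pyspark)."""
    # Imported lazily so the helper above can run on a machine without Spark.
    from pyspark.ml.feature import StopWordsRemover
    return StopWordsRemover(inputCol=input_col, outputCol=output_col,
                            stopWords=FRENCH_STOP_WORDS, caseSensitive=False)


print(remove_stop_words(["Le", "chat", "est", "sur", "la", "table"]))
# prints ['chat', 'sur', 'table']
```

With Spark available, calling build_french_remover().transform(df) on a DataFrame whose "words" column holds arrays of tokens should add a "filtered" column without the listed words.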



Source: https://stackoverflow.com/questions/49012895/pyspark-how-to-configure-stopwordsremover-with-french-language-on-spark-1-6-3
