Question
I would like to know how to configure StopWordsRemover for the French language in Spark 1.6.3.
I'm currently using PySpark.
Thanks for your help.
Best regards,
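Answer 1:
Take a look at the nltk package, which ships stop word lists for several languages.
I use it for Portuguese words; passing 'french' instead of 'portuguese' returns the French list: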
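from pyspark.ml.feature import StopWordsRemover
import nltk
# Download the NLTK stop word corpus (only needed once per machine)
nltk.download("stopwords")
...
# Load the stop word list for the chosen language
stopwordList = nltk.corpus.stopwords.words('portuguese')
# `tokenizer` is a tokenizer stage defined earlier (not shown here)
remover = StopWordsRemover(inputCol=tokenizer.getOutputCol(), outputCol="stopWordsRem", stopWords=stopwordList)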
Hope it helps
Answer 2:
According to the PySpark 1.6.3 documentation, pyspark.ml.feature.StopWordsRemover does not have a language parameter. However, you can always supply your own list of stop words through the "stopWords" parameter.
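As a rough sketch of that approach: the short French list, the helper names remove_stop_words and build_french_remover, and the column names "words" and "filtered" are all assumptions made up for illustration. A real list would be much longer. The plain-Python function only mimics the case-insensitive filtering that the transformer applies when caseSensitive is False, and the Spark part needs a running SparkContext.

```python
# Small illustrative sample only (an assumption, not a complete French list).
FRENCH_STOP_WORDS = ["le", "la", "les", "un", "une", "des", "de", "du", "et", "est"]


def remove_stop_words(tokens, stop_words=FRENCH_STOP_WORDS):
    """Plain-Python illustration of case-insensitive stop word filtering."""
    lowered = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in lowered]


def build_french_remover(input_col="words", output_col="filtered"):
    """Create a StopWordsRemover with the custom list.

    Hypothetical helper; calling it requires pyspark and an active SparkContext.
    """
    # Imported lazily so the list above can be inspected without Spark installed.
    from pyspark.ml.feature import StopWordsRemover
    return StopWordsRemover(inputCol=input_col, outputCol=output_col,
                            stopWords=FRENCH_STOP_WORDS, caseSensitive=False)
```

The resulting remover can then be applied with remover.transform(df) on a DataFrame whose input column holds arrays of tokens.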
Source: https://stackoverflow.com/questions/49012895/pyspark-how-to-configure-stopwordsremover-with-french-language-on-spark-1-6-3