Python stemming (with pandas dataframe)

I ran into the following problem while programming in Python: I have a pandas DataFrame containing words that have to be stemmed (using SnowballStemmer). I want to compare the results for stemmed versus non-stemmed text, and for this I will be using a classifier. I use the following code for the stemmer:

from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("dutch")

I want to stem every separate word in each list while preserving the order and keeping every key with its value. The words are in the "stemmed" column of the pandas DataFrame, and I want every separate word in that column stemmed.

I thought of something like this:

for w in data[["stemmed"]]:
    stemmer.stem(w)

However, after running it, the separate words were not stemmed. For example, row 7 contains the word “amsterdamse”, which is supposed to be stemmed to “amsterdam”.

You have to apply the stemmer to each word and store the result back in the "stemmed" column.
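
The loop in the question does not do this, most likely for two reasons: iterating over a DataFrame yields its column labels rather than its rows, and the return value of stemmer.stem is never stored. A minimal sketch of the first point, using made-up sample rows (the data here is an assumption, not the asker's actual DataFrame):

```python
import pandas as pd

# Hypothetical sample data: one list of words per row.
data = pd.DataFrame({"stemmed": [["amsterdamse", "and", "yes"],
                                 ["marathon", "hello", "verbazing"]]})

# Iterating over a DataFrame (double brackets) yields column labels.
labels = [w for w in data[["stemmed"]]]
print(labels)

# Iterating over the column itself (a Series) yields one list per row.
rows = [w for w in data["stemmed"]]
print(rows[0])
```

So in the original loop w is the string "stemmed", not a word from the data, and even a correct loop would need to assign the stemmed result somewhere.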

EDIT

For example:

In [23]: data
Out[23]: 
                        stemmed
0       [amsterdamse, and, yes]
1  [marathon, hello, verbazing]

Then the following should work:

data['stemmed'] = data["stemmed"].apply(lambda x: [stemmer.stem(y) for y in x])

Out[25]: 
0        [amsterdam, and, yes]
1    [marathon, hello, verbaz]
Name: stemmed, dtype: object
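
Since the question mentions comparing stemmed with non-stemmed text, it may be useful to keep the original words instead of overwriting them. A minimal sketch, assuming the unstemmed lists live in a hypothetical "words" column and the results go to a separate "stemmed" column; stem_tokens is an illustrative helper, not part of NLTK:

```python
import pandas as pd
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("dutch")


def stem_tokens(tokens):
    """Stem every word in a list, preserving the original order."""
    return [stemmer.stem(token) for token in tokens]


# Hypothetical sample data; "words" is an assumed column name.
data = pd.DataFrame({"words": [["amsterdamse", "and", "yes"],
                               ["marathon", "hello", "verbazing"]]})

# Write to a new column so the unstemmed words stay available for comparison.
data["stemmed"] = data["words"].apply(stem_tokens)
print(data)
```

Each row keeps its index, and each stemmed list has the same length and order as the list it came from, so both columns can be fed to the classifier side by side.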

Source: https://stackoverflow.com/questions/37443138/python-stemming-with-pandas-dataframe

Tags: python, pandas, nlp, stemming
