I ran into the following problem while programming in Python. I have a Pandas dataframe containing words that have to be stemmed (using SnowballStemmer). I want to stem the words so that I can compare a classifier's results on stemmed versus non-stemmed text. I use the following code for the stemmer:
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("dutch")
I want to stem every separate word in each list while preserving the order and keeping every key paired with its value. The words are in the "stemmed" column of the Pandas dataframe, where each row holds a list of words.
I thought of something like this:
for w in data[["stemmed"]]:
    stemmer.stem(w)
However, after running it, the separate words were not stemmed. For example, row 7 contains the word “amsterdamse”, which is supposed to be stemmed to “amsterdam”.
You have to apply the stemmer to each word and store the result back in the "stemmed" column.
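To see why the loop in the question has no effect, here is a minimal sketch using a made-up two-row frame (the sample data is an assumption, not the asker's actual data). Iterating over a DataFrame walks its column labels, so the loop body only ever receives the string "stemmed", and the return value of stemmer.stem is never assigned anywhere.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's column.
data = pd.DataFrame({"stemmed": [["amsterdamse", "and", "yes"],
                                 ["marathon", "hello", "verbazing"]]})

# Iterating over a DataFrame yields its column labels, not its rows.
seen = [w for w in data[["stemmed"]]]
print(seen)  # ['stemmed']

# Iterating over the column itself (a Series) yields one list per row.
rows = [w for w in data["stemmed"]]
print(rows[0])  # ['amsterdamse', 'and', 'yes']
```

So the stemming has to happen per row, per word, and the result has to be written back to the frame.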
EDIT
For example:
In [23]: data
Out[23]:
                        stemmed
0       [amsterdamse, and, yes]
1  [marathon, hello, verbazing]
Then the following should work:
data['stemmed'] = data["stemmed"].apply(lambda x: [stemmer.stem(y) for y in x])
Out[25]:
0        [amsterdam, and, yes]
1    [marathon, hello, verbaz]
Name: stemmed, dtype: object
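Since the goal is to compare stemmed with non-stemmed text, it may be more convenient to keep the original words and write the stems to a separate column. The following is only a sketch under assumptions: the sample data, the column name "tokens" and the helper stem_tokens are hypothetical and not part of the question.

```python
import pandas as pd
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("dutch")

# Hypothetical sample data; "tokens" is an assumed column name.
data = pd.DataFrame({"tokens": [["amsterdamse", "and", "yes"],
                                ["marathon", "hello", "verbazing"]]})


def stem_tokens(tokens):
    """Stem each word in a list, preserving order and length (hypothetical helper)."""
    return [stemmer.stem(word) for word in tokens]


# Write to a new column so the unstemmed words stay available for comparison.
data["stemmed"] = data["tokens"].apply(stem_tokens)
print(data)
```

Both columns share the same index, so each row keeps its original and stemmed words side by side when they are fed to the classifier.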
Source: https://stackoverflow.com/questions/37443138/python-stemming-with-pandas-dataframe