Question
I want to stem the words in a file. Stemming works fine when I try it in the terminal, but when I apply it to a text file, it does not work. Terminal code:
print PorterStemmer().stem_word('complications')
Function code:
def stemming_text_1():
    with open('test.txt', 'r') as f:
        text = f.read()
        print text
        singles = []
        stemmer = PorterStemmer() #problem from HERE
        for plural in text:
            singles.append(stemmer.stem(plural))
        print singles
Input test.txt
126211 crashes bookmarks runs error logged debug core bookmarks
126262 manual change crashes bookmarks propagated ion view bookmarks
Desired/expected output
126211 crash bookmark runs error logged debug core bookmark
126262 manual change crash bookmark propagated ion view bookmark
Any suggestion will be greatly appreciated, thanks :)
Answer 1:
You need to split the text into words for the stemmer to work. Currently, the variable `text` contains the whole file as one big string. The loop `for plural in text:` assigns each character in `text` to `plural`, so the stemmer is called on single characters rather than words.
Try `for plural in text.split():` instead.
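To see the difference concretely, here is a small sketch using a shortened piece of the sample input. It involves no stemmer at all; it only shows what each loop would hand to the stemmer.

```python
text = "126211 crashes bookmarks"

# Iterating over a string yields one character at a time.
chars = [piece for piece in text]

# split() with no arguments breaks on runs of whitespace and yields words.
words = text.split()

print(chars[:3])  # ['1', '2', '6']
print(words)      # ['126211', 'crashes', 'bookmarks']
```

Because `split()` with no arguments splits on any whitespace, it also breaks on newlines, so it works on the whole file contents as well as on a single line.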
[EDIT] To get the output in the format you want, you need to read the file line by line instead of reading it all at once:
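# Assumes NLTK's PorterStemmer, which the calls in the question suggest.
from nltk.stem import PorterStemmer

def stemming_text_1():
    stemmer = PorterStemmer()  # create the stemmer once, not once per line
    with open('test.txt', 'r') as f:
        for line in f:
            print(line)  # echo the original line
            singles = []
            # split the line into words and stem each word
            for plural in line.split():
                singles.append(stemmer.stem(plural))
            # rejoin the stemmed words so each input line gives one output line
            print(' '.join(singles))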
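As a rough sketch of the same line-by-line idea, the per-line logic can be pulled into a function that takes the stemmer as a parameter. The names `stem_lines` and `naive_stem` are made up for illustration. `naive_stem` is a deliberately crude stand-in, not the Porter algorithm, so that the snippet runs without any library installed; in real use you would pass something like `PorterStemmer().stem` instead.

```python
def naive_stem(word):
    # Hypothetical placeholder, NOT the Porter algorithm: it only strips
    # a few plural endings so the example runs without a stemming library.
    if word.endswith("es") and word[:-2].endswith(("sh", "ch", "x", "ss")):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stem_lines(lines, stem):
    # Stem every whitespace-separated word, keeping one output line per input line.
    return [" ".join(stem(word) for word in line.split()) for line in lines]


sample = [
    "126211 crashes bookmarks runs error logged debug core bookmarks\n",
    "126262 manual change crashes bookmarks propagated ion view bookmarks\n",
]
for stemmed in stem_lines(sample, naive_stem):
    print(stemmed)
```

Since an open file object yields its lines when iterated, `stem_lines(f, stem)` can be called on the file directly. `split()` also discards the trailing newline, which is why the joined result can be printed as is.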
Source: https://stackoverflow.com/questions/16835372/python-stemming-words-in-a-file