I am cleaning a column in my data frame, Sumcription, and am trying to do 3 things:
Remove stop words
import spacy
import pandas as pd
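# Load spacy model (this is the spaCy 1.x call; newer releases use
# spacy.load('en_core_web_sm', disable=['parser', 'ner']) instead)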
nlp = spacy.load('en', parser=False, entity=False)
# New stop words list
customize_stop_words = [
    'attach',
]
# Mark them as stop words
for w in customize_stop_words:
    nlp.vocab[w].is_stop = True
# Test data
df = pd.DataFrame({'Sumcription': ["attach poster on the wall because it is cool",
                                   "eating and sleeping"]})
# Convert each row into a spaCy document and keep the lemma of every token
# that is not a stop word. Finally, join the lemmas into a single string
df['Sumcription_lema'] = df.Sumcription.apply(lambda text:
    " ".join(token.lemma_ for token in nlp(text)
             if not token.is_stop))
print(df)
Output:
                                    Sumcription  Sumcription_lema
0  attach poster on the wall because it is cool  poster wall cool
1                           eating and sleeping         eat sleep
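
To check the stop-word step on its own, without loading a spaCy model, here is a minimal sketch. It assumes plain whitespace tokenization and a hand-written stop list; STOP_WORDS and remove_stop_words are hypothetical names for illustration, not part of spaCy. It does no lemmatization, so "eating" stays "eating" rather than becoming "eat".

```python
# Hypothetical stop-word set for illustration only; this is NOT spaCy's
# built-in list, just the words needed for the two test rows above.
STOP_WORDS = {"attach", "on", "the", "because", "it", "is", "and"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop whitespace-separated tokens whose lowercase form is a stop word."""
    return " ".join(tok for tok in text.split() if tok.lower() not in stop_words)


# Same test data as in the question, as a plain list instead of a data frame
rows = ["attach poster on the wall because it is cool", "eating and sleeping"]
cleaned = [remove_stop_words(row) for row in rows]
print(cleaned)
```

With pandas, the same function could be applied to the column with df.Sumcription.apply(remove_stop_words).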