Question
I want to calculate the polarity and subjectivity of some headlines that I have. My code runs without errors, but for some rows it returns 0.00000 for both polarity and subjectivity. Do you know why?
You can download the data from here:
https://www.sendspace.com/file/e8w4tw
Am I doing something wrong? This is the code:
import pandas as pd
from textblob import TextBlob
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
df = pd.read_excel('coca cola news.xlsx')
df = df.dropna().reset_index(drop = True)
df = df.drop_duplicates().reset_index(drop = True)
print(df)
head_sentiment = []
head_subj = []
par_sentiment = []
par_subj = []
df['Headline Sentiment'] = df['Headline'].apply(lambda text: TextBlob(text).sentiment.polarity).round(4)
df['Headline Subjectivity'] = df['Headline'].apply(lambda text: TextBlob(text).sentiment.subjectivity).round(4)
df['Paragraph Sentiment'] = df['Paragraph'].apply(lambda text: TextBlob(text).sentiment.polarity).round(4)
df['Paragraph Subjectivity'] = df['Paragraph'].apply(lambda text: TextBlob(text).sentiment.subjectivity).round(4)
print(df)
print(df[df.columns[-4:]])
I mean, I know that 0 is a possible result, but I'm getting 0.0000 in 40%-50% of the rows. That's a lot, and not even 0.00001, which seems strange to me.
Can you help me?
Answer 1:
This sometimes happens. Try the polarity method from polyglot (https://polyglot.readthedocs.io/en/latest/Installation.html) and compare the results. You should also do some preprocessing first, such as:
- remove stopwords
- remove numbers, HTML links, special characters, etc.
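The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not a definitive pipeline: the function name `clean_headline`, the regular expressions, and the tiny `STOPWORDS` set are assumptions made for this example (a real stopword list would normally come from a library such as NLTK). Note also that TextBlob's default analyzer is lexicon-based, so a headline containing no words from its lexicon scores 0.0 for both polarity and subjectivity; cleaning the text may not change that.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for",
             "and", "is", "are", "at", "by", "with"}

# Matches http(s) links and bare www. links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Matches anything that is not a lowercase ASCII letter or whitespace
# (digits, punctuation, special characters).
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_headline(text):
    """Lowercase the text, then drop links, digits, special characters and stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)          # remove links first, before punctuation is stripped
    text = NON_LETTER_RE.sub(" ", text)   # remove numbers and special characters
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_headline("Coca-Cola to cut 2,200 jobs! https://example.com/news"))
```

With the DataFrame from the question, this could be applied as `df['Headline'].apply(clean_headline)` before passing the text to TextBlob, so the scores with and without cleaning can be compared.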
Source: https://stackoverflow.com/questions/58920075/how-to-do-sentiment-analysis-of-headlines-with-textblob-and-python