Question
I would like someone to correct my understanding of how VADER scores text. I've read an explanation of this process here, but when I recreate the process it describes, I cannot match the compound score of test sentences to VADER's output. Let's say we have the sentence:
"I like using VADER, its a fun tool to use"
The words VADER picks up are 'like' (+1.5 score), and 'fun' (+2.3). According to the documentation, these values are summed (so +3.8), and then normalized to a range between 0 and 1 using the following function:
x / (x^2 + alpha), with alpha = 15
With our numbers, this should become:
3.8 / (14.44 + 15) = 0.1290
VADER, however, outputs the returned compound score as follows:
Scores: {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.7003}
Where am I going wrong in my reasoning? Similar questions have been asked several times, but none of them has included an actual worked example of VADER scoring a sentence. Any help would be appreciated.
Answer 1:
Only your normalization is wrong. In the source code, the function is defined as:
import math

def normalize(score, alpha=15):
    """
    Normalize the score to be between -1 and 1 using an alpha that
    approximates the max expected value
    """
    norm_score = score / math.sqrt((score * score) + alpha)
    return norm_score
So you have 3.8/sqrt(3.8*3.8 + 15) = 0.7003, which matches the compound score VADER returns.
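To check the arithmetic yourself, here is a minimal sketch that puts the two calculations side by side. It reuses the normalize function quoted above and the two valences from the question; the `valences` dictionary and the variable names are only illustrative and are not part of VADER's API.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the range (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


# Valences quoted in the question for the two words VADER picks up.
valences = {"like": 1.5, "fun": 2.3}
total = sum(valences.values())  # roughly 3.8

# The calculation from the question, which leaves out the square root.
without_sqrt = total / (total * total + 15)

# The calculation with the square root, as in normalize().
compound = normalize(total)

print(f"without sqrt: {without_sqrt:.4f}")
print(f"with sqrt:    {compound:.4f}")
```

The first value comes out near 0.129, as in the question, and the second rounds to 0.7003, the compound score in VADER's output. This only covers the simple case here, where no negation, intensifiers or punctuation adjust the valences before they are summed.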
Source: https://stackoverflow.com/questions/51707282/example-of-nltks-vader-scoring-text