R sentiment analysis with phrases in dictionaries

盖世英雄少女心 2021-01-03 17:30

I am performing sentiment analysis on a set of Tweets that I have and I now want to know how to add phrases to the positive and negative dictionaries.

I've read in

1 Answer
  • 2021-01-03 18:18

    The function score.sentiment seems to work. If I try a very simple setup,

    Tweets = c("this is good", "how bad it is")
    neg = c("bad")
    pos = c("good")
    analysis=score.sentiment(Tweets, pos, neg)
    table(analysis$score)
    

    I get the expected result,

    > table(analysis$score)
    
    -1  1 
     1  1 
    
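    For readers who don't have the function at hand: score.sentiment is not defined in this thread (the widely circulated version is attributed to Jeffrey Breen's tutorial). Here is a minimal base-R sketch of what a score.sentiment-style scorer does; this simplified re-implementation is an assumption about its behavior, not the original code:

    ```r
    # Minimal sketch of a score.sentiment-style function (assumed behavior):
    # score = count of matched positive words minus count of matched negative words.
    score.sentiment <- function(sentences, pos.words, neg.words) {
      scores <- sapply(sentences, function(sentence) {
        # strip punctuation and digits, lower-case, split on whitespace
        sentence <- gsub("[[:punct:]]", "", sentence)
        sentence <- gsub("[[:digit:]]", "", sentence)
        words <- unlist(strsplit(tolower(sentence), "\\s+"))
        sum(words %in% pos.words) - sum(words %in% neg.words)
      }, USE.NAMES = FALSE)
      data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
    }

    analysis <- score.sentiment(c("this is good", "how bad it is"),
                                c("good"), c("bad"))
    table(analysis$score)
    # -1  1
    #  1  1
    ```

    Note that, like the original, this splits on single words, which is exactly why multi-word phrases in the dictionaries never match.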

    How are you feeding the 20 tweets to the method? From the result you're posting (that 0 20), I'd say the problem is that your 20 tweets don't contain any of your positive or negative words, although if that were the case you would presumably have noticed it. If you post more details about your list of tweets and your positive and negative words, it will be easier to help you.

    Anyhow, your function seems to be working just fine.

    Hope it helps.

    EDIT after clarifications via comments:

    Actually, to solve your problem you need to tokenize your sentences into n-grams, where n corresponds to the maximum number of words in your positive and negative n-grams. You can see how to do this, e.g., in this SO question. For completeness, and since I've tested it myself, here is an example of what you could do. I'll simplify it to bigrams (n=2) and use the following inputs:

    Tweets = c("rewarding hard work with raising taxes and VAT. #LabourManifesto", 
                  "Ed Miliband is offering 'wrong choice' of 'more cuts' in #LabourManifesto")
    pos = c("rewarding hard work")
    neg = c("wrong choice")
    

    You can create a bigram tokenizer like this,

    library(tm)
    library(RWeka)
    BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min=2,max=2))
    

    And test it,

    > BigramTokenizer("rewarding hard work with raising taxes and VAT. #LabourManifesto")
    [1] "rewarding hard"       "hard work"            "work with"           
    [4] "with raising"         "raising taxes"        "taxes and"           
    [7] "and VAT"              "VAT #LabourManifesto"
    

    Then in your method you simply substitute this line,

    word.list = str_split(sentence, '\\s+')
    

    with this one:

    word.list = BigramTokenizer(sentence)
    

    Of course, it would be cleaner to also rename word.list to ngram.list or something similar.

    The result is, as expected,

    > table(analysis$score)
    
    -1  0 
     1  1
    

    Just decide on your n-gram size, set min and max accordingly in Weka_control, and you should be fine.
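    If RWeka (which needs a Java installation) is not an option, an equivalent tokenizer can be sketched in plain base R. The function name ngram_tokenize below is my own choice, not part of any package:

    ```r
    # Hypothetical base-R n-gram tokenizer: split on whitespace, then
    # paste every run of n consecutive words back together.
    ngram_tokenize <- function(x, n = 2) {
      words <- unlist(strsplit(x, "\\s+"))
      if (length(words) < n) return(character(0))
      sapply(seq_len(length(words) - n + 1),
             function(i) paste(words[i:(i + n - 1)], collapse = " "))
    }

    ngram_tokenize("rewarding hard work", n = 3)
    # [1] "rewarding hard work"
    ```

    Swapping this in for BigramTokenizer would work the same way; just pass the n that matches your longest dictionary phrase.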

    Hope it helps.
