Negation handling in R: how can I replace a word following a negation?

谁说我不能喝 submitted on 2020-01-13 20:23:10

Question


I'm doing sentiment analysis for financial articles. To enhance the accuracy of my naive Bayes classifier, I'd like to implement negation handling.

Specifically, I want to add the prefix "not_" to the word following a "not" or "n't".

So if there's something like this in my corpus:

 x <- "They didn't sell the company." 

I want to get the following:

"they didn't not_sell the company."

(the stopword "didn't" will be removed later)
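Attaching the prefix before removing stopwords matters because the negation survives on the following word even after "didn't" itself is dropped. A small base-R illustration, where `stop_words` is a made-up three-word list standing in for whatever stopword list you actually use:

```r
# Tokens after negation marking (the desired output from the question)
marked <- "they didn't not_sell the company."
tokens <- strsplit(marked, split = " ", fixed = TRUE)[[1]]

# Hypothetical stopword list, for illustration only
stop_words <- c("they", "didn't", "the")

# Drop stopwords; "not_sell" still carries the negation
kept <- tokens[!tokens %in% stop_words]
paste(kept, collapse = " ")
# "not_sell company."
```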

The only function I could find is gsub(), but I can't see how to make it do this.

Any help would be appreciated! Thank you!


Answer 1:


Specifically, I want to add the prefix "not_" to the word following a "not" or "n't"

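str_negate <- function(x) {
  # Prefix "not_" to the word after "n't " (e.g. "didn't sell" -> "didn't not_sell")
  x <- gsub("n't ", "n't not_", x)
  # Then after "not "; the \b word boundary skips words ending in "not", e.g. "knot"
  gsub("\\bnot ", "not not_", x, perl = TRUE)
}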
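The same idea also fits in a single gsub() call by capturing the negation and reusing it through a backreference. This is a sketch rather than part of the original answer: the name `str_negate_re` is hypothetical, and the lookahead `(?=\\w)` assumes PCRE syntax, hence `perl = TRUE`. With `ignore.case = TRUE` it also catches a capitalized "Not" at the start of a sentence, and `\\s+` tolerates repeated spaces.

```r
# Hypothetical single-call variant of str_negate (sketch).
# (\\bnot|n't) captures a standalone "not" or an "n't" ending,
# \\s+ consumes the whitespace after it, and the lookahead (?=\\w)
# only fires when a word actually follows.
str_negate_re <- function(x) {
  gsub("(\\bnot|n't)\\s+(?=\\w)", "\\1 not_", x,
       ignore.case = TRUE, perl = TRUE)
}

str_negate_re("They didn't sell the company.")
# "They didn't not_sell the company."
str_negate_re("Not  sold yet")
# "Not not_sold yet"
```

Because the captured text is written back unchanged, the original capitalization of "Not" is preserved.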

Or I suppose you could use strsplit:

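str_negate <- function(x) {
  words <- unlist(strsplit(x = x, split = " "))
  # A token is a negation if it is exactly "not" or ends in "n't";
  # anchoring avoids false hits such as "another" or "nothing"
  is_negative <- grepl("^not$|n't$", words, ignore.case = TRUE)
  # Shift the flags right by one: negate a word if the word before it is a negation
  negate_me <- c(FALSE, is_negative)[seq_along(words)]
  words[negate_me] <- paste0("not_", words[negate_me])
  paste(words, collapse = " ")
}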
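Because unlist() merges everything strsplit() returns, the strsplit version above turns a whole character vector into one long string. For a corpus stored one document per element, a sketch like the following keeps documents separate. `negate_tokens` and `str_negate_all` are hypothetical helper names, and splitting on single spaces is still assumed.

```r
# Hypothetical helper: negate the token after each negation in one document.
negate_tokens <- function(words) {
  # a token counts as a negation if it is exactly "not" or ends in "n't"
  is_negative <- grepl("^not$|n't$", words, ignore.case = TRUE)
  # shift flags right by one so the *following* token is marked
  negate_me <- c(FALSE, is_negative)[seq_along(words)]
  words[negate_me] <- paste0("not_", words[negate_me])
  paste(words, collapse = " ")
}

# Hypothetical vectorized wrapper: one output string per input document.
str_negate_all <- function(x) {
  tokens <- strsplit(x, split = " ", fixed = TRUE)
  vapply(tokens, negate_tokens, character(1))
}

str_negate_all(c("They didn't sell the company",
                 "They did not sell the company"))
```

vapply() with `character(1)` guarantees exactly one string back per document, so the result lines up with the input vector.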

Either way, you get:

> str_negate("They didn't sell the company")
[1] "They didn't not_sell the company"
> str_negate("They did not sell the company")
[1] "They did not not_sell the company"


Source: https://stackoverflow.com/questions/21811580/negation-handling-in-r-how-can-i-replace-a-word-following-a-negation-in-r
