Question
I have a dataset with two columns and multiple rows. The first column is an ID; the second column holds the text that belongs to it.
I want to add more columns that count how many times a certain string appears in the text of each row. The strings are "\n Positive\n", "\n Neutral\n" and "\n Negative\n".
Example of the Dataset:
Id, Content
2356, I like cheese.\n Positive\nI don't want to be here.\n Negative\n
3456, I am alone.\n Neutral\n
In the end it should look like this:
Id, Content, Positive, Neutral, Negative
2356, I like cheese.\n Positive\nI don't want to be here.\n Negative\n, 1, 0, 1
3456, I am alone.\n Neutral\n, 0, 1, 0
Right now I have tried it like this, but it isn't giving the right answers:
getCount1 <- function(data, keyword)
{
Positive <- str_count(Dataset$CONTENT, keyword)
return(data.frame(data,Positive))
}
Stufe1 <-getCount1(Dataset,'\n Positive\n')
################################################################
getCount2 <- function(data, keyword)
{
Neutral <- str_count(Stufe1$CONTENT, keyword)
return(data.frame(data,Neutral))
}
Stufe2 <-getCount2(Stufe1,'\n Neutral\n')
#####################################################
getCount3 <- function(data, keyword)
{
Negative <- str_count(Stufe2$CONTENT, keyword)
return(data.frame(data,Negative))
}
Stufe3 <-getCount3(Stufe2,'\n Negative\n')
Answer 1:
I assume this is what you require.
Sample data
id <- c(1:4)
text <- c('I have a Dataset with 2 columns a',
'nd multiple rows. first column ID', 'second column the text which',
'n the text which belongs to it.')
dataset <- data.frame(id,text)
Function to find count
library(stringr)
getCount <- function(data, keyword)
{
  # count matches in the data frame passed in, not in the global `dataset`
  wcount <- str_count(data$text, keyword)
  return(data.frame(data, wcount))
}
Calling getCount should give the updated dataset
> getCount(dataset,'second')
id text wcount
1 I have a Dataset with 2 columns a 0
2 nd multiple rows. first column ID 0
3 second column the text which 1
4 n the text which belongs to it. 0
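Applied to the data from the question, the same idea could look roughly like the sketch below. The toy data frame and its column names (Id, Content) are assumptions taken from the question's example, and stringr's fixed() is used so that the label, including its newlines, is matched literally rather than as a regular expression.

```r
library(stringr)

# Toy data modelled on the question's example; column names are assumed
Dataset <- data.frame(
  Id = c(2356, 3456),
  Content = c("I like cheese.\n Positive\nI don't want to be here.\n Negative\n",
              "I am alone.\n Neutral\n"),
  stringsAsFactors = FALSE
)

# One new column per label, each holding the per-row match count
for (lab in c("Positive", "Neutral", "Negative")) {
  # fixed() treats the pattern as a literal string, newlines included
  Dataset[[lab]] <- str_count(Dataset$Content, fixed(paste0("\n ", lab, "\n")))
}

Dataset[, c("Id", "Positive", "Neutral", "Negative")]
```

Looping over the labels avoids writing one near-identical function per keyword, which is where the attempt in the question picked up its references to the wrong data frame.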
Answer 2:
To offer some alternatives, let's start with a slightly modified version of @on_the_shores_of_linux_sea's dataset.
id <- c(1:4)
text <- c('I have a Dataset with 2 columns a',
'nd multiple rows. first column ID rows',
'second column the text which',
'n the text which belongs to it.')
dataset <- data.frame(id,text)
Sticking with base R functions, you could come up with a function like this one.
wordCounter <- function(invec, word, ...) {
vapply(regmatches(invec, gregexpr(word, invec, ...)), length, 1L)
}
You would use it like this:
## allows other arguments to gregexpr
wordCounter(dataset$text, "id", ignore.case = TRUE)
# [1] 0 1 0 0
wordCounter(dataset$text, "id")
# [1] 0 0 0 0
wordCounter(dataset$text, "rows")
# [1] 0 2 0 0
wordCounter(dataset$text, "second", ignore.case = TRUE)
# [1] 0 0 1 0
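Because the extra arguments are forwarded to gregexpr, you can also pass fixed = TRUE when the search string should be taken literally. The sketch below repeats the helper so that it runs on its own and uses "." as a deliberately awkward pattern; the variable names regex_counts and literal_counts are only illustrative.

```r
# Same helper as above, repeated so the snippet is self-contained
wordCounter <- function(invec, word, ...) {
  vapply(regmatches(invec, gregexpr(word, invec, ...)), length, 1L)
}

text <- c("I have a Dataset with 2 columns a",
          "nd multiple rows. first column ID rows",
          "second column the text which",
          "n the text which belongs to it.")

# As a regular expression, "." matches any character,
# so the count is simply the number of characters in each string
regex_counts <- wordCounter(text, ".")
identical(regex_counts, nchar(text))
# [1] TRUE

# With fixed = TRUE only literal periods are counted
literal_counts <- wordCounter(text, ".", fixed = TRUE)
literal_counts
# [1] 0 1 0 1
```

This matters whenever the keyword contains characters that have a special meaning in regular expressions.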
Another alternative, if you want to go with some ready-made solutions, would be to use the "stringi" package, which has a nifty stri_count* set of functions. Here, I've used stri_count_fixed:
library(stringi)
stri_count_fixed(dataset$text, "rows")
# [1] 0 2 0 0
Answer 3:
This can also be done without loading any additional library, as pointed out by Ananda. My solution would be, provided that the 2-column table is called dataset and the string to look for is mystring:
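countOccurr = function(text, motif) {
  # gregexpr returns -1 when there is no match; otherwise one position per match
  res = gregexpr(motif, text, fixed = TRUE)[[1]]
  if (res[1] == -1) 0 else length(res)
}
# USE.NAMES = FALSE keeps the text strings from being attached as names
dataset = cbind(dataset, count = vapply(dataset[,2], countOccurr, 1, motif = mystring, USE.NAMES = FALSE))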
Beware that the second column of your data frame has to be of type character if you want to avoid problems (the data frame given as sample data by @on-the-shores-of-linux-sea stores the text as a factor, which is fine with his solution but not with mine). Otherwise, use as.character(dataset[,2]) to convert it.
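A minimal sketch of that check and conversion follows. The small data frame is made up for illustration, and stringsAsFactors = TRUE is set explicitly to mimic the data.frame() default before R 4.0.0, which is when text columns silently became factors.

```r
# Illustrative data; stringsAsFactors = TRUE mimics the pre-4.0.0 default
dataset <- data.frame(id = 1:2,
                      text = c("first column ID", "second column"),
                      stringsAsFactors = TRUE)

is.factor(dataset[, 2])      # TRUE: the text column is stored as a factor

# Convert once, up front, so later string functions see plain character data
dataset[, 2] <- as.character(dataset[, 2])
is.character(dataset[, 2])   # TRUE
```

Since R 4.0.0, data.frame() keeps strings as character by default, so the conversion is mostly relevant for older code or for data read in with factors enabled.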
Source: https://stackoverflow.com/questions/24550066/count-occurrences-of-specific-words-from-a-dataframe-row-in-r