Edited df and dict
I have a data frame containing sentences:
df <- data_frame(text = c("I love pandas
Update: Here's the easiest dplyr method I've found so far, with a stringi function added to speed things up. Provided there are no identical sentences in df$text, we can group by that column and then apply mutate().
Note: Package versions are dplyr 0.4.1 and stringi 0.4.1.
library(dplyr)
library(stringi)
group_by(df, text) %>%
mutate(score = sum(dict$score[stri_detect_fixed(text, dict$word)]))
# Source: local data frame [2 x 2]
# Groups: text
#
# text score
# 1 I love pandas 2
# 2 I hate monkeys -2
I removed the do() method I posted last night, but you can find it in the edit history. To me it seems unnecessary, since the above method works as well and is the more dplyr way to do it.
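The grouping trick above relies on every sentence being unique. A minimal sketch of a variant that avoids that assumption uses rowwise(), which treats each row as its own group. The dict below is a hypothetical lookup table (the original one is not shown here), and the duplicated sentence is added only for illustration.

```r
suppressPackageStartupMessages(library(dplyr))
library(stringi)

# Hypothetical data: note the repeated first sentence
df <- data.frame(text = c("I love pandas", "I hate monkeys", "I love pandas"),
                 stringsAsFactors = FALSE)
# Hypothetical dictionary; the original dict is not shown in this thread excerpt
dict <- data.frame(word = c("love", "hate", "pandas", "monkeys"),
                   score = c(1, -1, 1, -1),
                   stringsAsFactors = FALSE)

scored <- df %>%
  rowwise() %>%   # each row is scored on its own, duplicates included
  mutate(score = sum(dict$score[stri_detect_fixed(text, dict$word)])) %>%
  ungroup()

scored$score
```

Inside rowwise(), text is a single string, so stri_detect_fixed() returns one logical value per dictionary word and the sum is taken over the matching scores only.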
Additionally, if you're open to a non-dplyr answer, here are two using base functions.
total <- with(dict, {
vapply(df$text, function(X) {
sum(score[vapply(word, grepl, logical(1L), x = X, fixed = TRUE)])
}, 1)
})
cbind(df, total)
# text total
# 1 I love pandas 2
# 2 I hate monkeys -2
Alternatively, strsplit() produces the same result here:
s <- strsplit(df$text, " ")
total <- vapply(s, function(x) sum(with(dict, score[match(x, word, 0L)])), 1)
cbind(df, total)
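The two base approaches agree on the example data, but they are not equivalent in general: grepl() with fixed = TRUE matches substrings and counts each dictionary word at most once per sentence, while strsplit() plus match() scores every whole token. A small sketch of the difference, assuming a hypothetical dict and two made-up sentences:

```r
# Hypothetical dictionary; the original dict is not shown in this thread excerpt
dict <- data.frame(word = c("love", "hate", "pandas", "monkeys"),
                   score = c(1, -1, 1, -1),
                   stringsAsFactors = FALSE)
# Made-up sentences: a repeated word, and a word that only contains "hate"
text <- c("I love love pandas", "I hated monkeys")

# Substring approach: one hit per dictionary word, anywhere in the sentence
by_substring <- vapply(text, function(s) {
  hits <- vapply(dict$word, grepl, logical(1L), x = s, fixed = TRUE)
  sum(dict$score[hits])
}, numeric(1), USE.NAMES = FALSE)

# Token approach: split on spaces and look up each whole token
tokens <- strsplit(text, " ", fixed = TRUE)
by_token <- vapply(tokens, function(tok) {
  # nomatch = 0L drops tokens that are not in the dictionary
  sum(dict$score[match(tok, dict$word, nomatch = 0L)])
}, numeric(1))

by_substring  # 2 -2: "love" counted once, "hate" found inside "hated"
by_token      # 3 -1: "love" counted twice, "hated" is not a dictionary token
```

Which behaviour is preferable depends on whether repeated words should add up and whether partial-word matches are acceptable.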
A bit of double looping via sapply and gregexpr:
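res <- sapply(dict$word, function(x) {
  # count the matches of word x in each sentence (gregexpr returns -1 for none)
  sapply(gregexpr(x, df$text), function(y) length(y[y != -1]))
})
# rows are sentences, columns are words: weight each column by its word's score
drop(res %*% dict$score)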
#[1] 2 -2
This also accounts for multiple matches in a single string:
df <- data.frame(text = c("I love love pandas", "I hate monkeys"))
# run same code as above
#[1] 3 -2
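To make the intermediate step visible, here is a sketch of the count matrix this approach builds, assuming a hypothetical dict (the original is not shown here). It uses regmatches() and lengths() to count matches, which is a swapped-in shortcut for the inner sapply, and a matrix product to weight each word's column by its score.

```r
# Hypothetical dictionary; the original dict is not shown in this thread excerpt
dict <- data.frame(word = c("love", "hate", "pandas", "monkeys"),
                   score = c(1, -1, 1, -1),
                   stringsAsFactors = FALSE)
text <- c("I love love pandas", "I hate monkeys")

# One column per dictionary word, one row per sentence;
# each cell holds how many times the word occurs in the sentence
counts <- sapply(dict$word, function(w) {
  lengths(regmatches(text, gregexpr(w, text, fixed = TRUE)))
})
counts
#      love hate pandas monkeys
# [1,]    2    0      1       0
# [2,]    0    1      0       1

# Multiply every count by its word's score and sum across each row
scores <- drop(counts %*% dict$score)
scores
# [1]  3 -2
```

The matrix product pairs column j of counts with dict$score[j], so the result does not depend on how many sentences or dictionary words there are.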