Similar to this case, I would like to count the number of occurrences of multiple words and numbers in a vector of sentences, using str_count from the stringr package.
Using sprintf you can add word boundaries:
number_of_keywords_df <- str_count(df, paste(sprintf("\\b%s\\b", keywords), collapse = '|'))
number_of_keywords_df
Which yields:
[1] 3 2 2
Try putting word boundaries around your keywords:
keywords <- c("honda","civic","toyota","auris","nissan","skyline","1988","1400","159")
keywords <- paste0("\\b", keywords, "\\b")
In regex lingo, \bhonda\b says to match the isolated word honda. Hence hondas would not match, because it has an extra letter at the end.
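To see the difference the boundaries make, here is a minimal sketch comparing counts with and without them. The sentences vector is made up for illustration (the original data is not shown here), and plain_counts and bounded_counts are just names chosen for this example.

```r
library(stringr)

# Hypothetical sample sentences, not the data from the question
sentences <- c("honda civic 1988 and hondas",
               "toyota auris 1400",
               "nissan skyline 159 1590")

keywords <- c("honda", "civic", "toyota", "auris", "nissan", "skyline",
              "1988", "1400", "159")

# Without boundaries: "honda" also matches inside "hondas",
# and "159" also matches inside "1590"
plain_pattern <- paste(keywords, collapse = "|")
plain_counts <- str_count(sentences, plain_pattern)
plain_counts
# [1] 4 3 4

# With boundaries: only whole-word matches are counted
bounded_pattern <- paste0("\\b", keywords, "\\b", collapse = "|")
bounded_counts <- str_count(sentences, bounded_pattern)
bounded_counts
# [1] 3 3 3
```

Note that \b treats digits as word characters too, which is why 159 is no longer counted inside 1590.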