I have a data frame with two columns:
df = data.frame(animals = c("cat; dog; bird", "dog; bird", "bird"), sentences = c("the cat is brown; the dog is bar
Here's a base R solution:
First remove all the ; characters with gsub, then split the sentences column on spaces and unlist the result into a single vector of words:
split_sentence_column = unlist(strsplit(gsub(';','',df$sentences),' '))
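To see what this step produces, here is a minimal sketch on a single string. The variable names x, no_semicolons and words are made up for illustration and are not part of the solution itself; fixed = TRUE is an optional choice here that treats the patterns as literal text rather than regular expressions.

```r
# One example string in the same format as the sentences column
x <- "the cat is brown; the dog is barking"

# Drop the semicolons, leaving words separated by single spaces
no_semicolons <- gsub(";", "", x, fixed = TRUE)

# strsplit() returns a list with one element per input string;
# unlist() flattens it into a plain character vector of words
words <- unlist(strsplit(no_semicolons, " ", fixed = TRUE))

print(words)
```

Applied to the whole df$sentences column, the same calls pool the words from every row into one vector.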
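Then set up a for loop. For each row, get a vector of the animals, check which words from the sentences column are in that animal list with %in%, then sum all the TRUE cases. Note that split_sentence_column pools the words from every row, so each count is taken over the whole sentences column, not just the sentence in the same row. We can then assign this to a new df column directly: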
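for(i in seq_len(nrow(df))){
  # animals listed in row i, as a character vector (as.character guards against factor columns)
  animals = unlist(strsplit(as.character(df$animals[i]), '; '))
  # count the pooled sentence words that match any of those animals
  df$sum_occurrences_sentences_column[i] = sum(split_sentence_column %in% animals)
}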
> df
         animals                                                        sentences sum_occurrences_sentences_column
1 cat; dog; bird the cat is brown; the dog is barking; the bird is green and blue                                6
2      dog; bird                    the dog is black; the bird is yellow and blue                                5
3           bird                                                 the bird is blue                                3
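The same counts can probably be computed without an explicit loop. The sketch below is one possible variant, not the only way to do it. The data frame is rebuilt from the printed output above, and the names all_words, animal_lists, row_words and per_row_count are hypothetical. The last part is an assumption about a different requirement: it counts matches only within each row's own sentence, in case that is what is actually wanted.

```r
# Data frame rebuilt from the printed output above
df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c(
    "the cat is brown; the dog is barking; the bird is green and blue",
    "the dog is black; the bird is yellow and blue",
    "the bird is blue"
  ),
  stringsAsFactors = FALSE
)

# Words from every row, pooled into one character vector
all_words <- unlist(strsplit(gsub(";", "", df$sentences, fixed = TRUE), " ", fixed = TRUE))

# One character vector of animals per row
animal_lists <- strsplit(df$animals, "; ", fixed = TRUE)

# Same result as the for loop: counts over the whole sentences column
df$sum_occurrences_sentences_column <- vapply(
  animal_lists,
  function(animals) sum(all_words %in% animals),
  integer(1)
)

# Assumed alternative: count only within the sentence of the same row
row_words <- strsplit(gsub(";", "", df$sentences, fixed = TRUE), " ", fixed = TRUE)
per_row_count <- unname(mapply(function(a, w) sum(w %in% a), animal_lists, row_words))

print(df$sum_occurrences_sentences_column)
print(per_row_count)
```

vapply is used instead of sapply so that the result is guaranteed to be an integer vector with one value per row.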