How to find matching words in a DF from list of words and returning the matched words in new column [duplicate]

Posted by 雨燕双飞 on 2020-01-30 13:14:27

Question


I have a data frame (DF) with two columns, and I have a list of words.

list_of_words <- c("tiger","elephant","rabbit", "hen", "dog", "Lion", "camel", "horse")

df <- tibble::tibble(page=c(12,6,9,18,2,15,81,65),
               text=c("I have two pets: a dog and a hen",
                      "lion and Tiger are dangerous animals",
                      "I have tried to ride a horse",
                      "Why elephants are so big in size",
                      "dogs are very loyal pets",
                      "I saw a tiger in the zoo",
                      "the lion was eating a buffalo",
                      "parrot and crow are very clever birds"))

animals <- c("dog,hen", "lion,tiger", "horse", FALSE, "dog", "tiger", "lion", FALSE)

cbind(df, animals)
#>   page                                  text    animals
#> 1   12      I have two pets: a dog and a hen    dog,hen
#> 2    6  lion and Tiger are dangerous animals lion,tiger
#> 3    9          I have tried to ride a horse      horse
#> 4   18      Why elephants are so big in size      FALSE
#> 5    2              dogs are very loyal pets        dog
#> 6   15              I saw a tiger in the zoo      tiger
#> 7   81         the lion was eating a buffalo       lion
#> 8   65 parrot and crow are very clever birds      FALSE

I need to check whether any of the words in the list occur in the text column of the DF. If they do, the matching word or words should be returned in a new column of the DF. The code above shows the list of words, what my DF looks like, and the kind of output I want (the animals column).


Answer 1:


Hope this helps!

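library(dplyr)

df %>% 
  rowwise() %>%  # evaluate the mutate() expression one row at a time
  # keep every word that occurs in this row's text (case-insensitive) and join with ","
  mutate(animals = paste(list_of_words[unlist(
    lapply(list_of_words, function(x) grepl(x, text, ignore.case = TRUE)))], collapse = ",")) %>%
  data.frame()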

Output is:

  page                                  text    animals
1   12                       pets: dog & hen    hen,dog
2    6 Lions and tigers are dangerous animal tiger,Lion
3    9          I have tried to ride a horse      horse
4   65   parrot & crow are very clever birds           
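
Rows without a match come back as an empty string here, whereas the question showed FALSE in those rows. A minimal sketch of one way to post-process that, assuming the column looks like the output above (the animals vector below is a hypothetical stand-in for df$animals):

```r
# Hypothetical stand-in for the `animals` column produced above
animals <- c("hen,dog", "tiger,Lion", "horse", "")

# Option 1: mark "no match" as a missing value
animals_na <- ifelse(animals == "", NA_character_, animals)

# Option 2: mimic the layout in the question. Note this is the string
# "FALSE", because a character column cannot hold a logical value.
animals_false <- ifelse(animals == "", "FALSE", animals)

print(animals_na)
print(animals_false)
```

NA is usually easier to work with afterwards (is.na() can filter on it), but which marker to use is a matter of taste.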

Sample data:

df <- structure(list(page = c(12, 6, 9, 65), text = structure(c(4L, 
2L, 1L, 3L), .Label = c("I have tried to ride a horse", "Lions and tigers are dangerous animal", 
"parrot & crow are very clever birds", "pets: dog & hen"), class = "factor")), .Names = c("page", 
"text"), row.names = c(NA, -4L), class = "data.frame")

list_of_words <- c("tiger", "elephant", "rabbit", "hen", "dog", "Lion", "camel", "horse")


Another approach:

library(data.table)

# Same matching logic; by = 1:nrow(df) evaluates the expression once per row
setDT(df)[, animals := paste(
  list_of_words[unlist(lapply(list_of_words, function(x) grepl(x, text, ignore.case = TRUE)))],
  collapse = ","), by = 1:nrow(df)]

#> df
#   page                                  text    animals
#1:   12                       pets: dog & hen    hen,dog
#2:    6 Lions and tigers are dangerous animal tiger,Lion
#3:    9          I have tried to ride a horse      horse
#4:   65   parrot & crow are very clever birds           
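
One caveat with both approaches: grepl() matches substrings, so "hen" would also be found inside "then", and "elephant" inside "elephants" (a row the question marked as FALSE). If whole-word matches are wanted instead, a possible sketch in base R is to wrap each word in \\b word boundaries. The helper name find_words is hypothetical and not part of the original answer, and note the trade-off: with word boundaries, "dogs" no longer matches "dog", which the question did expect.

```r
# Hypothetical helper (not from the original answer): for each element of
# `text`, return the entries of `words` that occur as whole words,
# joined with ",". Rows without a match yield "".
find_words <- function(text, words) {
  text <- as.character(text)  # the sample data stores text as a factor
  # One logical column per word; \\b restricts matches to whole words
  hits <- vapply(
    words,
    function(w) grepl(paste0("\\b", w, "\\b"), text,
                      ignore.case = TRUE, perl = TRUE),
    logical(length(text))
  )
  # Force a rows-by-words matrix, also when `text` has a single element
  hits <- matrix(hits, nrow = length(text))
  apply(hits, 1, function(row) paste(words[row], collapse = ","))
}

text <- c("I have two pets: a dog and a hen",
          "and then it rained",
          "dogs are very loyal pets")
res <- find_words(text, c("tiger", "hen", "dog"))
print(res)
```

The result can be assigned directly, e.g. df$animals <- find_words(df$text, list_of_words), which also avoids the row-by-row evaluation.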


Source: https://stackoverflow.com/questions/50150811/how-to-find-matching-words-in-a-df-from-list-of-words-and-returning-the-matched
