How do I extract appearances of a vector of strings in another vector of strings using R?

Submitted by 左心房为你撑大大i on 2019-12-08 11:22:29

Here is one option. First we split each row of the string column into separate strings (right now "apple, orange, plum, tomato" is a single string). Then, row by row, we intersect the resulting vector with the contents of the fruits$fruit column and store the matching values as a list in the new fruits column.

library("tidyverse")
strings <- tibble(
  string = c(
    "apple, orange, plum, tomato",
    "plum, beat, pear, cactus",
    "centipede, toothpick, pear, fruit"
  )
)

fruits <- tibble(fruit =c("apple", "orange", "plum", "pear"))

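strings %>%
  # split each comma-separated string into a character vector (list column)
  mutate(str2 = str_split(string, ", ")) %>%
  # work one row at a time so str2 is treated as a single character vector
  rowwise() %>%
  # keep only the pieces that also appear in fruits$fruit
  mutate(fruits = list(intersect(str2, fruits$fruit)))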
#> Source: local data frame [3 x 3]
#> Groups: <by row>
#> 
#> # A tibble: 3 x 3
#>   string                            str2      fruits   
#>   <chr>                             <list>    <list>   
#> 1 apple, orange, plum, tomato       <chr [4]> <chr [3]>
#> 2 plum, beat, pear, cactus          <chr [4]> <chr [2]>
#> 3 centipede, toothpick, pear, fruit <chr [4]> <chr [1]>

Created on 2018-08-07 by the reprex package (v0.2.0).
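The rowwise() step is not strictly required. A minimal sketch of the same split-and-intersect idea, assuming dplyr, purrr, stringr and tibble are installed, maps intersect() over the list returned by str_split() instead. The output column name matched is an illustrative choice that avoids reusing the name fruits. Note that intersect() also drops duplicate matches within a row.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

strings <- tibble(
  string = c(
    "apple, orange, plum, tomato",
    "plum, beat, pear, cactus",
    "centipede, toothpick, pear, fruit"
  )
)
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))

# str_split() returns one character vector per row; map() then calls
# intersect(<row vector>, fruits$fruit) on each, so no rowwise() is needed.
# `matched` is a hypothetical column name used only for this sketch.
result <- strings %>%
  mutate(matched = map(str_split(string, ", "), intersect, fruits$fruit))

result$matched
```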

Here's an example using purrr and stringi:

strings <- tibble(string = c("apple, orange, plum, tomato",
                         "plum, beat, pear, cactus",
                         "centipede, toothpick, pear, fruit"))

fruits <- tibble(fruit =c("apple", "orange", "plum", "pear"))

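# Return every element of `pattern` (each treated as a regex) found in
# `string_to_parse`. stri_extract_all_regex() recycles the single string
# against each pattern and yields NA for patterns with no match.
extract_if_exists <- function(string_to_parse, pattern){
  extraction <- stringi::stri_extract_all_regex(string_to_parse, pattern)
  # drop the NA results, then flatten to a character vector
  extraction <- unlist(extraction[!(is.na(extraction))])
  return(extraction)
}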

strings %>%
  mutate(matches = map(string, extract_if_exists, fruits$fruit)) %>%
  # collapse each vector of matches (not the original string) into one string
  mutate(matches = map_chr(matches, str_c, collapse = ", "))
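Because extract_if_exists() treats each fruit name as an unanchored regex, a name can also match inside a longer word (for example "pear" inside "spear"). A minimal sketch of one way to guard against that, assuming stringr is installed and that the fruit names contain no regex metacharacters: combine the names into a single alternation anchored with `\\b` word boundaries. The fourth input string and the names texts, fruit_names, pattern and collapsed are hypothetical, chosen only to show the boundary behaviour.

```r
library(stringr)

texts <- c(
  "apple, orange, plum, tomato",
  "plum, beat, pear, cactus",
  "centipede, toothpick, pear, fruit",
  "spear, pineapple"  # hypothetical row: "pear"/"apple" occur only inside other words
)
fruit_names <- c("apple", "orange", "plum", "pear")

# One alternation pattern: \b(?:apple|orange|plum|pear)\b
# \b requires a word boundary on both sides, so "spear" does not yield "pear".
pattern <- str_c("\\b(?:", str_c(fruit_names, collapse = "|"), ")\\b")

# One character vector of matches per input string (character(0) if none).
matches <- str_extract_all(texts, pattern)

# vapply() guarantees one string per row; rows with no match become "".
collapsed <- vapply(matches, paste, character(1), collapse = ", ")
collapsed
```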

Here is a base-R solution:

strings[["match"]] <- 
  sapply(
    strsplit(strings[["string"]], ", "), 
    function(x) {
      paste(x[x %in% fruits[["fruit"]]], collapse = ", ")
    }
  )

Resulting in:

  string                            match              
  <chr>                             <chr>              
1 apple, orange, plum, tomato       apple, orange, plum
2 plum, beat, pear, cactus          plum, pear         
3 centipede, toothpick, pear, fruit pear               
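A minimal base-R sketch of the same approach using vapply(), which always returns a character vector (sapply() returns an empty list() for zero-length input), plus lengths() for a per-row count. Splitting on "," followed by trimws() assumes the separators may have inconsistent spacing; the helper name kept and the n_match column are hypothetical additions for illustration.

```r
strings <- data.frame(
  string = c(
    "apple, orange, plum, tomato",
    "plum, beat, pear, cactus",
    "centipede, toothpick, pear, fruit"
  ),
  stringsAsFactors = FALSE
)
fruits <- data.frame(fruit = c("apple", "orange", "plum", "pear"),
                     stringsAsFactors = FALSE)

# Split on commas and strip surrounding whitespace from every piece.
pieces <- lapply(strsplit(strings$string, ","), trimws)

# Keep only the pieces that are listed in fruits$fruit.
kept <- lapply(pieces, function(x) x[x %in% fruits$fruit])

# vapply() enforces exactly one string per row.
strings$match <- vapply(kept, paste, character(1), collapse = ", ")

# Number of matching fruits per row.
strings$n_match <- lengths(kept)
strings
```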