How do I extract appearances of a vector of strings in another vector of strings using R?

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-23 03:25:16

Question


I have a vector of strings like this:

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))

And I have a vector of fruit:

fruits <- tibble(fruit =c("apple", "orange", "plum", "pear"))

What I'd like is the original strings data.frame/tibble with a second column (list or character) holding all the fruit found in each original string. Something like this:

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"),
                   match = c("apple, orange, plum",
                             "plum, pear",
                             "pear")
                  )

I've tried str_extract(strings, fruits) and I get a list where everything is blank along with the warning:

Warning message:
In stri_detect_regex(string, pattern, opts_regex = opts(pattern)):
longer object length is not a multiple of shorter object length

I've tried str_extract_all(strings, paste0(fruits, collapse = "|")) and I get the same warning message.

I've looked at this Find matches of a vector of strings in another vector of strings, but that doesn't seem to help here.

Any help would be greatly appreciated.


Answer 1:


Here is one option. First we split each row of the string column into separate strings (right now "apple, orange, plum, tomato" is all one string). Then we compare the list of strings to the contents of the fruits$fruit column and store a list of the matching values in the new fruits column.

library("tidyverse")
strings <- tibble(
  string = c(
    "apple, orange, plum, tomato",
    "plum, beat, pear, cactus",
    "centipede, toothpick, pear, fruit"
  )
)

fruits <- tibble(fruit =c("apple", "orange", "plum", "pear"))

strings %>%
  mutate(str2 = str_split(string, ", ")) %>%
  rowwise() %>%
  mutate(fruits = list(intersect(str2, fruits$fruit)))
#> Source: local data frame [3 x 3]
#> Groups: <by row>
#> 
#> # A tibble: 3 x 3
#>   string                            str2      fruits   
#>   <chr>                             <list>    <list>   
#> 1 apple, orange, plum, tomato       <chr [4]> <chr [3]>
#> 2 plum, beat, pear, cactus          <chr [4]> <chr [2]>
#> 3 centipede, toothpick, pear, fruit <chr [4]> <chr [1]>

Created on 2018-08-07 by the reprex package (v0.2.0).
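
The question also allows a plain character column rather than a list column. A minimal sketch of that variant follows, assuming the same sample data; the names fruit_names and result are illustrative and not part of the original answer. It splits each string, keeps the pieces that are in the fruit vector, and pastes them back together.

```r
library(dplyr)
library(stringr)
library(purrr)

strings <- tibble(
  string = c(
    "apple, orange, plum, tomato",
    "plum, beat, pear, cactus",
    "centipede, toothpick, pear, fruit"
  )
)

# Illustrative name: the fruit lookup as a plain character vector
fruit_names <- c("apple", "orange", "plum", "pear")

result <- strings %>%
  mutate(
    # Split each string on ", ", keep only the fruit, and collapse
    # the survivors into a single comma-separated string per row
    match = map_chr(
      str_split(string, ", "),
      function(x) paste(intersect(x, fruit_names), collapse = ", ")
    )
  )

result$match
```

Note that intersect() drops repeated items, so a fruit that appears twice in one string is reported once.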




Answer 2:


Here's an example using purrr

library(tidyverse)

strings <- tibble(string = c("apple, orange, plum, tomato",
                         "plum, beat, pear, cactus",
                         "centipede, toothpick, pear, fruit"))

fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))

# Return every pattern found in string_to_parse; non-matches (NA) are dropped
extract_if_exists <- function(string_to_parse, pattern) {
  extraction <- stringi::stri_extract_all_regex(string_to_parse, pattern)
  extraction <- unlist(extraction[!is.na(extraction)])
  return(extraction)
}

strings %>%
  mutate(matches = map(string, extract_if_exists, fruits$fruit)) %>%
  # Collapse each row's matches (not the original string) into one string
  mutate(matches = map_chr(matches, paste, collapse = ", "))
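
The str_extract_all() attempt in the question can probably be made to work by passing the columns rather than the whole tibbles. The sketch below is one way to do that, not the original poster's code: it builds a single alternation pattern with word boundaries so that "pear" would not match inside a longer word. The names fruit_names, pattern, found, and matched are illustrative, and it assumes the fruit names contain no regex metacharacters.

```r
library(stringr)

string <- c(
  "apple, orange, plum, tomato",
  "plum, beat, pear, cactus",
  "centipede, toothpick, pear, fruit"
)
fruit_names <- c("apple", "orange", "plum", "pear")

# One pattern of the form \b(apple|orange|plum|pear)\b
pattern <- str_c("\\b(", str_c(fruit_names, collapse = "|"), ")\\b")

# A list with one character vector of matches per input string
found <- str_extract_all(string, pattern)

# Collapse each vector of matches into a single string
matched <- vapply(found, paste, character(1), collapse = ", ")

matched
```

Unlike the split-and-intersect approach, this returns matches in the order they occur in each string and repeats a fruit if it occurs more than once.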



Answer 3:


Here is a base-R solution:

strings[["match"]] <- 
  sapply(
    strsplit(strings[["string"]], ", "), 
    function(x) {
      paste(x[x %in% fruits[["fruit"]]], collapse = ", ")
    }
  )

Resulting in:

  string                            match              
  <chr>                             <chr>              
1 apple, orange, plum, tomato       apple, orange, plum
2 plum, beat, pear, cactus          plum, pear         
3 centipede, toothpick, pear, fruit pear               


Source: https://stackoverflow.com/questions/51733851/how-do-i-extract-appearances-of-a-vector-of-strings-in-another-vector-of-strings
