I have a vector of strings like this:
strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))
And I have a vector of fruit:
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))
What I'd like is a data.frame/tibble with the original strings and a second list or character column containing all the fruit found in each original string. Something like this:
strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"),
                  match = c("apple, orange, plum",
                            "plum, pear",
                            "pear")
)
I've tried str_extract(strings, fruits)
and I get a list where everything is blank along with the warning:
Warning message:
In stri_detect_regex(string, pattern, opts_regex = opts(pattern)):
longer object length is not a multiple of shorter object length
I've tried str_extract_all(strings, paste0(fruits, collapse = "|"))
and I get the same warning message.
I've looked at "Find matches of a vector of strings in another vector of strings", but that doesn't seem to help here.
Any help would be greatly appreciated.
Here is one option. First, split each row of the string column into separate strings (right now, "apple, orange, plum, tomato" is a single string). Then intersect each row's pieces with the contents of the fruits$fruit column and store the matching values as a list in the new fruits column.
library("tidyverse")
strings <- tibble(
  string = c(
    "apple, orange, plum, tomato",
    "plum, beat, pear, cactus",
    "centipede, toothpick, pear, fruit"
  )
)
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))
strings %>%
  mutate(str2 = str_split(string, ", ")) %>%
  rowwise() %>%
  mutate(fruits = list(intersect(str2, fruits$fruit)))
#> Source: local data frame [3 x 3]
#> Groups: <by row>
#>
#> # A tibble: 3 x 3
#>   string                            str2      fruits
#>   <chr>                             <list>    <list>
#> 1 apple, orange, plum, tomato       <chr [4]> <chr [3]>
#> 2 plum, beat, pear, cactus          <chr [4]> <chr [2]>
#> 3 centipede, toothpick, pear, fruit <chr [4]> <chr [1]>
Created on 2018-08-07 by the reprex package (v0.2.0).
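If a character column (as in the question's desired output) is preferred over a list column, the same split-and-intersect idea can be collapsed into one string per row. This is only a sketch: the intermediate column name parts is made up for illustration, and it assumes the items are always separated by exactly ", ".

```r
library(dplyr)
library(stringr)
library(purrr)

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))

result <- strings %>%
  mutate(
    # split every row into its individual items (a list column)
    parts = str_split(string, ", "),
    # keep only the items that are known fruit, then glue them back together
    match = map_chr(parts, ~ paste(intersect(.x, fruits$fruit), collapse = ", "))
  ) %>%
  select(-parts)

result
```

Rows without any known fruit end up with an empty string in match, because paste() on a zero-length vector with collapse set returns "".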
Here's an example using purrr:
library("tidyverse")
strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))
# Return every pattern found in string_to_parse (NULL if nothing matches)
extract_if_exists <- function(string_to_parse, pattern) {
  extraction <- stringi::stri_extract_all_regex(string_to_parse, pattern)
  extraction <- unlist(extraction[!(is.na(extraction))])
  return(extraction)
}
strings %>%
  mutate(matches = map(string, extract_if_exists, fruits$fruit)) %>%
  # collapse each row's matches (not the original string) into one string;
  # paste() is used because it returns "" when a row has no matches
  mutate(matches = map_chr(matches, paste, collapse = ", "))
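For comparison, the str_extract_all() attempt from the question appears to go wrong mainly because whole tibbles are passed where character vectors are expected. A sketch of the same idea on plain vectors is below; the names string_vec, fruit_vec and match_chr are just illustrative, and the \\b word boundaries are an addition of mine so that "pear" would not match inside a longer word such as "pearl".

```r
library(stringr)

string_vec <- c("apple, orange, plum, tomato",
                "plum, beat, pear, cactus",
                "centipede, toothpick, pear, fruit")
fruit_vec <- c("apple", "orange", "plum", "pear")

# one alternation pattern: \b(?:apple|orange|plum|pear)\b
pattern <- paste0("\\b(?:", paste(fruit_vec, collapse = "|"), ")\\b")

# list with one character vector of matches per input string
found <- str_extract_all(string_vec, pattern)

# collapse each vector of matches into a single comma-separated string
match_chr <- vapply(found, paste, character(1), collapse = ", ")
match_chr
```

This assumes the fruit names contain no regex metacharacters; if they might, they would need escaping before being pasted into the pattern.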
Here is a base-R solution:
strings[["match"]] <-
  sapply(
    strsplit(strings[["string"]], ", "),
    function(x) {
      paste(x[x %in% fruits[["fruit"]]], collapse = ", ")
    }
  )
Resulting in:
  string                            match
  <chr>                             <chr>
1 apple, orange, plum, tomato       apple, orange, plum
2 plum, beat, pear, cactus          plum, pear
3 centipede, toothpick, pear, fruit pear
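The base-R approach above relies on the separator being exactly ", ". If the real data has inconsistent spacing around the commas, a slightly more forgiving variant might look like the sketch below. The helper name extract_known is hypothetical, not something from base R.

```r
# Hypothetical helper: split on commas, trim whitespace, keep known values
extract_known <- function(x, known) {
  parts <- strsplit(x, ",", fixed = TRUE)
  vapply(parts, function(p) {
    p <- trimws(p)                             # drop stray spaces around items
    paste(p[p %in% known], collapse = ", ")    # "" when nothing matches
  }, character(1))
}

known_fruit <- c("apple", "orange", "plum", "pear")
messy <- c("apple,orange ,  plum, tomato", " pear ,cactus", "cactus")

cleaned <- extract_known(messy, known_fruit)
cleaned
```

Using vapply() with character(1) instead of sapply() guarantees a character vector back, even when the input is empty.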
Source: https://stackoverflow.com/questions/51733851/how-do-i-extract-appearances-of-a-vector-of-strings-in-another-vector-of-strings