Question
I have a vector of strings like this:
strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))
And I have a vector of fruit:
fruits <- tibble(fruit =c("apple", "orange", "plum", "pear"))
What I'd like is a data.frame/tibble with the original strings and a second list or character column holding all the fruit contained in each original string. Something like this:
strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"),
                  match = c("apple, orange, plum",
                            "plum, pear",
                            "pear")
)
I've tried str_extract(strings, fruits)
and I get a list where everything is blank along with the warning:
Warning message:
In stri_detect_regex(string, pattern, opts_regex = opts(pattern)):
longer object length is not a multiple of shorter object length
I've also tried str_extract_all(strings, paste0(fruits, collapse = "|"))
and I get the same warning message.
I've looked at the question "Find matches of a vector of strings in another vector of strings", but that doesn't seem to help here.
Any help would be greatly appreciated.
Answer 1:
Here is one option. First we split each row of the string column into separate strings (right now "apple, orange, plum, tomato" is all one string). Then we compare each row's vector of strings to the contents of the fruits$fruit column and store the matching values as a list in a new fruits column.
library("tidyverse")
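strings <- tibble(
  string = c(
    "apple, orange, plum, tomato",
    "plum, beat, pear, cactus",
    "centipede, toothpick, pear, fruit"
  )
)
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))
strings %>%
  # split each comma-separated string into a character vector (list column)
  mutate(str2 = str_split(string, ", ")) %>%
  # work one row at a time so intersect() sees a single vector
  rowwise() %>%
  # keep only the pieces that also appear in fruits$fruit
  mutate(fruits = list(intersect(str2, fruits$fruit)))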
#> Source: local data frame [3 x 3]
#> Groups: <by row>
#>
#> # A tibble: 3 x 3
#>   string                            str2      fruits
#>   <chr>                             <list>    <list>
#> 1 apple, orange, plum, tomato       <chr [4]> <chr [3]>
#> 2 plum, beat, pear, cactus          <chr [4]> <chr [2]>
#> 3 centipede, toothpick, pear, fruit <chr [4]> <chr [1]>
Created on 2018-08-07 by the reprex package (v0.2.0).
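If the comma-separated character column from the question is preferred over a list column, the same split-and-intersect idea can be collapsed with paste(). This is only a sketch: it swaps rowwise() for purrr::map_chr(), and the names result and match are chosen here for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)

strings <- tibble(
  string = c(
    "apple, orange, plum, tomato",
    "plum, beat, pear, cactus",
    "centipede, toothpick, pear, fruit"
  )
)
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))

result <- strings %>%
  mutate(
    # split each string into its comma-separated pieces
    str2 = str_split(string, ", "),
    # keep the pieces that are fruits and join them back into one string;
    # paste() returns "" when a row has no matching fruit
    match = map_chr(str2, ~ paste(intersect(.x, fruits$fruit), collapse = ", "))
  ) %>%
  select(string, match)

result
```

Because intersect() keeps the order of its first argument, the fruits should come out in the order they appear in each original string.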
Answer 2:
Here's an example using purrr:
library(tidyverse)
strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))
# Return every pattern found in one string, dropping the NA results
# that stringi yields for patterns with no match.
extract_if_exists <- function(string_to_parse, pattern) {
  extraction <- stringi::stri_extract_all_regex(string_to_parse, pattern)
  extraction <- unlist(extraction[!(is.na(extraction))])
  return(extraction)
}
strings %>%
  mutate(matches = map(string, extract_if_exists, fruits$fruit)) %>%
  # collapse the matches column (not string) into one comma-separated value
  mutate(matches = map(matches, str_c, collapse = ", ")) %>%
  unnest(matches)
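For comparison, the str_extract_all() attempt from the question can be made to work by passing the columns (strings$string, fruits$fruit) instead of the tibbles. Calling paste0() on the tibble itself deparses the whole column into one string rather than joining the fruit names. The sketch below makes two assumptions that are not in the original post: the fruit names are wrapped in \b word boundaries so that "apple" does not match inside a longer word, and pattern and found are illustrative names.

```r
library(tibble)
library(stringr)

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))

# Build one alternation pattern from the fruit column, not the tibble
pattern <- str_c("\\b(?:", str_c(fruits$fruit, collapse = "|"), ")\\b")

# One character vector of matches per input string
found <- str_extract_all(strings$string, pattern)

# Collapse each vector into a single comma-separated string
strings$match <- vapply(found, paste, character(1), collapse = ", ")

strings
```

Unlike the set-based approaches, a regex match reports a fruit once for every time it occurs in a string, so repeated fruits would show up repeatedly.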
Answer 3:
Here is a base-R solution:
strings[["match"]] <-
  sapply(
    strsplit(strings[["string"]], ", "),
    function(x) {
      paste(x[x %in% fruits[["fruit"]]], collapse = ", ")
    }
  )
Resulting in:
  string                            match
  <chr>                             <chr>
1 apple, orange, plum, tomato       apple, orange, plum
2 plum, beat, pear, cactus          plum, pear
3 centipede, toothpick, pear, fruit pear
Source: https://stackoverflow.com/questions/51733851/how-do-i-extract-appearances-of-a-vector-of-strings-in-another-vector-of-strings