问题
I am relatively new in R.
I have a dataframe locs
that has 1 variable V1
and looks like:
V1
edmonton general hospital
cardiovascular institute, hospital san carlos, madrid spain
hospital of santa maria, lisbon, portugal
and another dataframe cities
that has two variables that look like this:
city country
edmonton canada
san carlos spain
los angeles united states
santa maria united states
tokyo japan
madrid spain
santa maria portugal
lisbon portugal
I want to create two new variables in locs
that relates any string match of V1
within city
so that locs
looks like this:
V1 city country
edmonton general hospital edmonton canada
hospital san carlos, madrid spain san carlos, madrid spain
hospital of santa maria, lisbon, portugal santa maria, lisbon portugal, united states
A few things to note: V1
may have multiple country names. Also, if there is a repeat country (for instance, both san carlos and madrid are in spain), then I only want one instance of the country.
Please advise.
Thanks.
回答1:
A solution using tidyverse
and stringr
. locs2
is the final output.
library(tidyverse)
library(stringr)
locs2 <- locs %>%
rowwise() %>%
mutate(city = list(str_match(V1, cities$city))) %>%
unnest() %>%
drop_na(city) %>%
left_join(cities, by = "city") %>%
group_by(V1) %>%
summarise_all(funs(toString(sort(unique(.)))))
Result
locs2 %>% as.data.frame()
V1 city country
1 cardiovascular institute, hospital san carlos, madrid spain madrid, san carlos spain
2 edmonton general hospital edmonton canada
3 hospital of santa maria, lisbon, portugal lisbon, santa maria portugal, united states
DATA
library(tidyverse)
locs <- data_frame(V1 = c("edmonton general hospital",
"cardiovascular institute, hospital san carlos, madrid spain",
"hospital of santa maria, lisbon, portugal"))
cities <- read.table(text = "city country
edmonton canada
'san carlos' spain
'los angeles' 'united states'
'santa maria' 'united states'
tokyo japan
madrid spain
'santa maria' portugal
lisbon portugal",
header = TRUE, stringsAsFactors = FALSE)
来源:https://stackoverflow.com/questions/46592850/finding-all-string-matches-from-another-dataframe-in-r