Question
I have a vector of words in R:
words = c("Awesome","Loss","Good","Bad")
And I have the following dataframe in R:
df <- data.frame(ID = c(1,2,3),
                 Response = c("Today is an awesome day",
                              "Yesterday was a bad day,but today it is good",
                              "I have losses today"))
I want to extract the words that match exactly (as whole words) in the Response column and insert them into a new column of the dataframe. The final output should look like this:
ID  Response                                      Match
1   Today is an awesome day                       Awesome
2   Yesterday was a bad day,but today it is good  Bad,Good
3   I have losses today                           NA
I used the following code:
# extract the list of matching words
x <- sapply(words, function(x) grepl(tolower(x), tolower(df$Response)))
# paste the matching words together
df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))
This finds matches, but not exact ones: "Loss" also matches inside "losses". Please help.
Answer 1:
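If you use anchors in your words vector, you can exclude partial matches. Note that ^ and $ anchor the start and end of the whole string, not of a word, so "^Loss$" matches only when the entire response is "loss". That rules out "losses" here, but it would also miss "loss" inside a longer sentence. So: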
words = c("Awesome","^Loss$","Good","Bad")
Then use your code:
x <- sapply(words, function(x) grepl(tolower(x), tolower(df$Response)))
df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))
which gives:
> df
ID Response Words
1 1 Today is an awesome day Awesome
2 2 Yesterday was a bad day,but today it is good Good,Bad
3 3 I have losses today
To turn blanks into NA:
df$Words[df$Words == ""] <- NA
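The difference between string anchors and word boundaries can be checked directly. The following is a minimal sketch in base R; the `responses` vector is made up for illustration and is not part of the original data.

```r
# Compare string anchors (^, $) with word boundaries (\\b).
# `responses` is an illustrative vector, not the data from the question.
responses <- c("loss", "a loss today", "I have losses today")

anchored <- grepl("^loss$", responses)      # whole string must be "loss"
bounded  <- grepl("\\bloss\\b", responses)  # "loss" as a standalone word

print(data.frame(responses, anchored, bounded))
```

Only the `\\b` version finds "loss" inside a longer sentence while still rejecting "losses".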
Answer 2:
We can use str_extract_all from stringr:
library(stringr)
library(dplyr)
library(purrr)
df %>%
  mutate(Words = map_chr(str_extract_all(Response,
    str_c("(?i)\\b(", str_c(words, collapse="|"), ")\\b")), toString))
# ID Response Words
#1 1 Today is an awesome day awesome
#2 2 Yesterday was a bad day,but today it is good bad, good
#3 3 I have losses today
data
words <- c("Awesome","Loss","Good","Bad")
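The same single-pattern idea can be sketched in base R, assuming you also want the spelling from `words` (for example "Awesome" rather than "awesome") and NA instead of an empty string. The names `responses`, `pattern`, `hits` and `match_words` are illustrative.

```r
words <- c("Awesome", "Loss", "Good", "Bad")
responses <- c("Today is an awesome day",
               "Yesterday was a bad day,but today it is good",
               "I have losses today")

# One case-insensitive pattern: \b(Awesome|Loss|Good|Bad)\b
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
hits <- regmatches(responses, gregexpr(pattern, responses, ignore.case = TRUE))

# Map each hit back to its spelling in `words`; no hits becomes NA.
match_words <- vapply(hits, function(h) {
  if (length(h) == 0) return(NA_character_)
  paste(words[match(tolower(h), tolower(words))], collapse = ",")
}, character(1))

print(match_words)
```

With this approach the matches come out in the order they appear in each response, not in the order of `words`.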
Answer 3:
Change the function in the first sapply call to a two-line function. If the regex becomes "\\bword\\b", it matches the word only when it is surrounded by word boundaries.
x <- sapply(words, function(x) {
  y <- paste0("\\b", x, "\\b")
  grepl(tolower(y), tolower(df$Response))
})
Now run the second apply as posted in the question.
df$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))
df
# ID Response Words
#1 1 Today is an awesome day Awesome
#2 2 Yesterday was a bad day,but today it is good Good,Bad
#3 3 I have losses today
As for the NAs, I would use the replacement function is.na<-.
is.na(df$Words) <- df$Words == ""
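Lowercasing the pattern works here because the \\b escape is already lowercase. A possible variant, sketched below, passes ignore.case = TRUE to grepl so the pattern is left untouched. The helper name `match_exact_words` is hypothetical, and the steps are simply wrapped into one function.

```r
# Hypothetical helper: exact, case-insensitive word matches per response.
match_exact_words <- function(responses, words) {
  hit <- sapply(words, function(w) {
    grepl(paste0("\\b", w, "\\b"), responses, ignore.case = TRUE)
  })
  # Force a matrix even when `responses` has length 1.
  hit <- matrix(hit, nrow = length(responses), dimnames = list(NULL, words))
  out <- apply(hit, 1, function(i) paste(words[i], collapse = ","))
  is.na(out) <- out == ""  # empty strings become NA
  out
}

res <- match_exact_words(
  c("Today is an awesome day",
    "Yesterday was a bad day,but today it is good",
    "I have losses today"),
  c("Awesome", "Loss", "Good", "Bad")
)
print(res)
```

The result can be assigned to a column, for example df$Words <- match_exact_words(df$Response, words).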
Data.
df <- read.table(text = "
ID Response
1 'Today is an awesome day'
2 'Yesterday was a bad day,but today it is good'
3 'I have losses today'
", header = TRUE)
words <- c("Awesome","Loss","Good","Bad")
Source: https://stackoverflow.com/questions/61160426/exact-matching-text-with-dataframe-column-in-r