问题
I'm trying to extract certain records from a dataframe with grepl.
This is based on the comparison between two columns Result and Names. This variable is build like this "WordNumber" but for the same word I have multiple numbers (more than 30), so when I use the grepl expression to get for instance Word1 I get also results that I would like to avoid, like Word12.
Any ideas on how to fix this?
Names <- c("Word1")
colnames(Names) <- name
Results <- c("Word1", "Word11", "Word12", "Word15")
Records <- c("ThisIsTheResultIWant", "notThis", "notThis", "notThis")
Relationships <- data.frame(Results, Records)
Relationships <- subset(Relationships, grepl(paste(Names$name, collapse = "|"), Relationships$Results))
This doesn't work, if I use fixed = TRUE
than it doesn't return any result at all (which is weird). I have also tried concatenating the name part with other numbers like this, but with no success:
Relationships <- subset(Relationships, grepl(paste(paste(Names$name, '3', sep = ""), collapse = "|"), Relationships$Results))
Since I'm concatenating I'm not really sure of how to use the \b to enforce a full match.
Any suggestions?
回答1:
I think this is just:
Relationships[Relationships$Results==Names,]
If you end up doing ^Word1$
you're just doing a straight subset.
If you have multiple names, then instead use:
Relationships[Relationships$Results %in% Names,]
回答2:
In addition to @Richard's solution, there are multiple ways to enforce a full match.
\b
"\b" is an anchor to identify word before/after pattern
> grepl("\\bWord1\\b",c("Word1","Word2","Word12"))
[1] TRUE FALSE FALSE
\< & \>
"\<" is an escape sequence for the beginning of a word, and ">" is used for end
> grepl("\\<Word1\\>",c("Word1","Word2","Word12"))
[1] TRUE FALSE FALSE
回答3:
Use ^ to match the start of the string and $ to match the end of the string
Names <-c('^Word1$')
Or, to apply to the entire names vector
Names <-paste0('^',Names,'$')
来源:https://stackoverflow.com/questions/46153832/exact-match-with-grepl-r