Following up on this question, I have another example where I cannot use the accepted answer.
Again, I want to find each of the exact groups elements in labs.
Try this from the stringr package. The coll() modifier compares strings using "human readable collation rules" instead of interpreting the pattern as a regular expression, so strings that look identical, including characters such as "+", are matched literally:
> library(stringr)
> str_detect(labs,coll(groups))
[1] TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE
[16] TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
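Note that str_detect() is vectorised over both its string and its pattern, so str_detect(labs, coll(groups)) pairs labs and groups element by element instead of testing every label against every group (recent stringr releases only recycle length-one inputs, so unequal lengths may raise an error). A minimal sketch, assuming small hypothetical labs and groups vectors modelled on the sample output in this thread, that tests each group against all labels with sapply():

```r
library(stringr)

# Hypothetical inputs, modelled on the labels shown in the sample output
labs <- c(
  "Beijing -- T0 -- BC-89 + CN",
  "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
  "Zhangjiakou -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN"
)
groups <- c(
  "BC-89 + CN",
  "BC-89 + CN with 2% DD + 1.6% ZC",
  "BC-89 with 2% Puricare + 5% Merquat + CN"
)

# One column per group: does each label contain that group as literal text?
hits <- sapply(groups, function(g) str_detect(labs, coll(g)))
hits
```

Because coll() finds the group anywhere in the label, the second label is also a hit for "BC-89 + CN"; the "$"-anchored approaches in the other answers avoid that.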
"+" is a special character in regex (it means "one or more of the preceding item"). You will need the regex \+, written "\\+" inside an R string, to escape it.
# escape every "+" so it is matched as a literal plus sign
new_group <- gsub("\\+", replacement = "\\\\+", x = groups)
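A minimal illustration of why the escape matters, using one group name and one label taken from the sample output in this thread (the variable names g, lab and escaped are only for illustration):

```r
g <- "BC-89 + CN"
lab <- "Beijing -- T0 -- BC-89 + CN"

# Unescaped, " +" means "one or more spaces", so the regex wants
# "BC-89" followed by at least two spaces and then "CN": no match
unescaped_hit <- grepl(g, lab)

# "\\\\+" in the replacement inserts a backslash before each "+",
# giving the regex BC-89 \+ CN, where \+ is a literal plus sign
escaped <- gsub("\\+", "\\\\+", g)
escaped_hit <- grepl(escaped, lab)

c(unescaped = unescaped_hit, escaped = escaped_hit)
```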
Also, "|" in a regular expression means "or", so the escaped groups can be joined into a single pattern:
new_group1 <- paste0(new_group, collapse = "|")
grep(pattern = new_group1, x = labs, value = TRUE)
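A small sketch of the alternation step, assuming a hypothetical three-label labs vector (the last label is invented so that one label matches no group):

```r
labs <- c(
  "Beijing -- T0 -- BC-89 + CN",
  "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
  "Beijing -- T0 -- BC-89 alone"  # hypothetical label matching no group
)
groups <- c("BC-89 + CN", "BC-89 with 2% Puricare + 5% Merquat + CN")

# Escape "+" in every group, then join them with "|" (regex alternation)
pattern <- paste0(gsub("\\+", "\\\\+", groups), collapse = "|")

# Labels matching at least one of the groups
matched <- grep(pattern, labs, value = TRUE)
matched
```

The combined pattern only tells you that a label matches some group, not which one, and it is not anchored. Only "+" is escaped here, as in the answer; a group containing other metacharacters, such as the "." in "1.6%", would still have them interpreted as regex syntax.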
Try
lapply(groups, function(g)
  grep(gsub("\\+", "\\\\+", paste0(g, "$")), labs, value = TRUE))
# [[1]]
# [1] "Beijing -- T0 -- BC-89 + CN"
# [2] "Beijing -- T24 -- BC-89 + CN"
# [3] "Beijing -- T0 -- BC-89 + CN"
# [4] "Zhangjiakou -- T0 -- BC-89 + CN"
# [5] "Beijing -- T0 -- BC-89 + CN"
# [6] "Beijing -- T0 -- BC-89 + CN"
# [7] "Beijing -- T24 -- BC-89 + CN"
# [8] "Beijing -- T24 -- BC-89 + CN"
# [9] "Zhangjiakou -- T0 -- BC-89 + CN"
# [10] "Zhangjiakou -- T0 -- BC-89 + CN"
# [11] "Zhangjiakou -- T24 -- BC-89 + CN"
# [12] "Zhangjiakou -- T24 -- BC-89 + CN"
#
# [[2]]
# [1] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [2] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [3] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [4] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [5] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [6] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [7] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [8] "Zhangjiakou -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"
#
# [[3]]
# [1] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"
# [2] "Beijing -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN"
# [3] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"
# [4] "Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"
The problem with your approach is that, e.g., groups[1] is "BC-89 + CN", which contains "+", a character with a special meaning in regular expressions. Given only this, adding fixed = TRUE to grep would fix the issue, but then "$" would lose its effect as an end-of-string anchor. So what I did was escape "+" in the group names first.
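A minimal check of that trade-off, using one group name and two labels taken from the sample output in this thread (g, lab, longer and anchored are illustrative names):

```r
g <- "BC-89 + CN"
lab <- "Beijing -- T0 -- BC-89 + CN"
longer <- "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"

# fixed = TRUE makes "+" literal, but "$" becomes literal as well,
# so this searches for the text "BC-89 + CN$", which the label lacks
fixed_dollar <- grepl(paste0(g, "$"), lab, fixed = TRUE)

# Escaping "+" keeps regex mode, so "$" still anchors at the end of the string
anchored <- paste0(gsub("\\+", "\\\\+", g), "$")
escaped <- grepl(anchored, lab)

# The anchor rejects labels in which the group is followed by more text
prefix_only <- grepl(anchored, longer)

c(fixed_dollar = fixed_dollar, escaped = escaped, prefix_only = prefix_only)
```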
Alternatively, and relating to your linked answer, you could do
# index labs directly, so the returned labels do not carry the appended "$"
lapply(groups, function(g)
  labs[grepl(paste0(g, "$"), paste0(labs, "$"), fixed = TRUE)])
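The "$" appended to both sides acts as a literal end marker: with fixed = TRUE, the text g followed by "$" can only be found where a "$" occurs, which, assuming no label contains "$" itself, is at the very end. Note that grep(..., value = TRUE) on paste0(labs, "$") returns the "$"-suffixed copies, so the sketch below indexes labs with grepl() instead. For comparison it also uses base R's endsWith(), a different but equivalent literal suffix test. The labs and groups vectors are hypothetical samples:

```r
labs <- c(
  "Beijing -- T0 -- BC-89 + CN",
  "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
  "Zhangjiakou -- T0 -- BC-89 + CN"
)
groups <- c("BC-89 + CN", "BC-89 + CN with 2% DD + 1.6% ZC")

# "ends with g" becomes "contains g$" once "$" is appended to every label
by_marker <- lapply(groups, function(g)
  labs[grepl(paste0(g, "$"), paste0(labs, "$"), fixed = TRUE)])

# Base R alternative: a literal suffix test with no regex involved
by_endswith <- lapply(groups, function(g) labs[endsWith(labs, g)])

identical(by_marker, by_endswith)
```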