R grep regular expression using elements in a vector (FOLLOW UP)

被撕碎了的回忆 2021-01-26 23:58

Following up on this question, I have another example where I cannot use the accepted answer.

Again, I want to find each of the exact group elements in the

3 Answers
  •  盖世英雄少女心
    2021-01-27 00:25

    Try

    lapply(groups, function(g)
      grep(gsub("\\+", "\\\\+", paste0(g, "$")), labs, value = TRUE))
    # [[1]]
    # [1] "Beijing -- T0 -- BC-89 + CN"     
    # [2] "Beijing -- T24 -- BC-89 + CN"    
    # [3] "Beijing -- T0 -- BC-89 + CN"     
    # [4] "Zhangjiakou -- T0 -- BC-89 + CN" 
    # [5] "Beijing -- T0 -- BC-89 + CN"     
    # [6] "Beijing -- T0 -- BC-89 + CN"     
    # [7] "Beijing -- T24 -- BC-89 + CN"    
    # [8] "Beijing -- T24 -- BC-89 + CN"    
    # [9] "Zhangjiakou -- T0 -- BC-89 + CN" 
    # [10] "Zhangjiakou -- T0 -- BC-89 + CN" 
    # [11] "Zhangjiakou -- T24 -- BC-89 + CN"
    # [12] "Zhangjiakou -- T24 -- BC-89 + CN"
    # 
    # [[2]]
    # [1] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"     
    # [2] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"    
    # [3] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"     
    # [4] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC" 
    # [5] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"     
    # [6] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"    
    # [7] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC" 
    # [8] "Zhangjiakou -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"
    # 
    # [[3]]
    # [1] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"    
    # [2] "Beijing -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN"   
    # [3] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"    
    # [4] "Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"
    

    The problem with your approach is that, e.g., groups[1] is "BC-89 + CN", which contains +, a character with special meaning in regular expressions (it is a quantifier: "one or more of the preceding item"). Adding fixed = TRUE to grep would make + literal, but then $ would be treated literally as well and would no longer anchor the match at the end of the string. So I escape the + in the group names first.
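
    To make the three behaviours concrete, here is a minimal sketch on toy data. The two labels below are made up for illustration and are not the asker's real labs vector; the variable names plain, fixed and escaped are likewise only for this example.

    ```r
    # Toy data, made up for illustration (not the asker's real vectors)
    labs <- c("Beijing -- T0 -- BC-89 + CN",
              "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC")
    g <- "BC-89 + CN"

    # Unescaped: " +" means "one or more spaces", so the literal "+" never matches
    plain <- grep(paste0(g, "$"), labs, value = TRUE)

    # fixed = TRUE: "+" is literal, but so is "$", and no label contains a "$"
    fixed <- grep(paste0(g, "$"), labs, value = TRUE, fixed = TRUE)

    # Escaped "+": the plus sign is literal and "$" still anchors at the end
    escaped <- grep(gsub("\\+", "\\\\+", paste0(g, "$")), labs, value = TRUE)
    ```

    Here plain and fixed both come back empty, while escaped contains only the first label, since the second one ends in "ZC" rather than "CN".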

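    Alternatively, along the lines of the answer you linked, you could append a literal $ to both the group name and the labels and match with fixed = TRUE. Because grep(..., value = TRUE) would return the modified labels (with the extra $ at the end), use grepl to index the original labs instead:

    lapply(groups, function(g)
      labs[grepl(paste0(g, "$"), paste0(labs, "$"), fixed = TRUE)])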
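
    If the goal is simply "the label ends with the group name", two other base R options avoid manual escaping altogether. This is a sketch on made-up toy data, not the asker's real vectors: endsWith() (available in base R since 3.3.0) compares literal suffixes, and \Q...\E quoting with perl = TRUE treats everything between the markers as plain text. The names by_suffix and by_quote are only for this example.

    ```r
    # Toy data, made up for illustration (not the asker's real vectors)
    labs <- c("Beijing -- T0 -- BC-89 + CN",
              "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC")
    groups <- c("BC-89 + CN", "BC-89 + CN with 2% DD + 1.6% ZC")

    # endsWith() compares literal suffixes, so no regex is involved at all
    by_suffix <- lapply(groups, function(g) labs[endsWith(labs, g)])

    # \Q...\E (PCRE, hence perl = TRUE) quotes the group name literally,
    # so "+" and "." are plain characters while the trailing "$" still anchors
    by_quote <- lapply(groups, function(g)
      grep(paste0("\\Q", g, "\\E$"), labs, value = TRUE, perl = TRUE))
    ```

    Both approaches also handle the "." in "1.6%", which the +-only escaping leaves as a regex wildcard. The \Q...\E variant assumes no group name itself contains the sequence \E.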
    
