Regex in R: replace only part of a pattern

南笙 2020-12-21 06:13
s <- "YXABCDXABCDYX"

I want to use a regular expression to return ABCDABCD, i.e. the 4 characters on each side of the central "X".

1 answer
  • 2020-12-21 06:39

    Your regex "([A-Z]{4})(X)([A-Z]{4})" matches only the substring ABCDXABCD, so gsub leaves the characters before and after it in place. Add .* (any character, zero or more times) on both sides of the capture groups so that the pattern consumes the whole string.

    In the replacement string of gsub you can then refer to each capture group as \\n, where n is the group's number:

    s <- "YXABCDXABCDYX"
    
    gsub('.*([A-Z]{4})(X)([A-Z]{4}).*', '\\1\\3', s)
    # [1] "ABCDABCD"
    

    This matches the entire string and replaces it with the contents of groups 1 and 3 pasted together.
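
    To see what the surrounding .* contributes, the following minimal sketch runs the same substitution with and without it on the string from the question. The variable names partial and full are illustrative only.

```r
s <- "YXABCDXABCDYX"

# Without the surrounding .*, only the matched substring "ABCDXABCD"
# is replaced; the leading and trailing "YX" are left untouched.
partial <- gsub('([A-Z]{4})(X)([A-Z]{4})', '\\1\\3', s)

# With .* on both sides, the pattern consumes the whole string,
# so the replacement is all that remains.
full <- gsub('.*([A-Z]{4})(X)([A-Z]{4}).*', '\\1\\3', s)

print(partial)  # [1] "YXABCDABCDYX"
print(full)     # [1] "ABCDABCD"
```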

    Another way is to turn on case-insensitive matching with (?i) and use [a-z] or \\w for the letters (note that \\w also matches digits and the underscore):

    gsub('(?i).*(\\w{4})(x)(\\w{4}).*', '\\1\\3', s)
    # [1] "ABCDABCD"
    

    Or, if you prefer dots, gsub('.*(.{4})X(.{4}).*', '\\1\\2', s), which accepts any four characters on each side of X.
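
    If the goal is to extract the groups rather than rewrite the string, base R's regexec and regmatches offer another route. This is a sketch of an alternative, not part of the gsub approach above; the names m and result are illustrative.

```r
s <- "YXABCDXABCDYX"

# regexec() records the positions of the full match and of each capture group;
# regmatches() turns those positions into the matched substrings.
m <- regmatches(s, regexec('([A-Z]{4})X([A-Z]{4})', s))[[1]]

# m[1] is the full match, m[2] and m[3] are the two capture groups.
result <- paste0(m[2], m[3])

print(result)  # [1] "ABCDABCD"
```

    No .* is needed here because nothing is being replaced; only the captured pieces are read out.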
