Extract part of string between two different patterns

傲寒 2020-12-30 15:04

I am trying to use the stringr package to extract the part of a string that lies between two particular patterns.

For example, I have:

my.string <         


        
4 Answers
  • 2020-12-30 15:25

    I do not know whether and how this is possible with functions provided by stringr but you can also use base regexpr and substring:

    pattern <- paste0("(?<=", left.border, ")[a-z]+(?=", right.border, ")")
    # "(?<=nana)[a-z]+(?=baba)"
    
    rx <- regexpr(pattern, text=my.string, perl=TRUE)
    # [1] 5
    # attr(,"match.length")
    # [1] 6
    
    substring(my.string, rx, rx+attr(rx, "match.length")-1)
    # [1] "qwerty"
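
    For what it's worth, the same lookaround pattern also seems to work with stringr's str_extract, since stringr's regex engine supports lookbehind and lookahead. A minimal sketch, assuming the borders are the literal strings "nana" and "baba" (as in the pattern comment above) and contain no regex metacharacters; the variable values below are assumptions for illustration:

    ```r
    library(stringr)

    # Assumed values, inferred from the pattern comment and the output above
    my.string    <- "nanaqwertybaba"
    left.border  <- "nana"
    right.border <- "baba"

    pattern <- paste0("(?<=", left.border, ")[a-z]+(?=", right.border, ")")

    # stringr: returns only the text matched between the two lookarounds
    res.stringr <- str_extract(my.string, pattern)

    # base R: regmatches() avoids the manual substring() arithmetic
    res.base <- regmatches(my.string, regexpr(pattern, my.string, perl = TRUE))
    ```

    Both res.stringr and res.base should hold "qwerty" here. Note that [a-z]+ only matches lowercase letters, so a different character class would be needed for other content.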
    
  • 2020-12-30 15:25

    I would use str_match from stringr: "str_match extracts capture groups formed by () from the first match. It returns a character matrix with one column for the complete match and one column for each group." ref

    library(stringr)
    str_match(my.string, paste(left.border, '(.+)', right.border, sep=''))[,2]
    

    The code above creates a regular expression with paste concatenating the capture group (.+) that captures 1 or more characters, with left and right borders (no spaces between strings).

    A single match is assumed. So, [,2] selects the second column from the matrix returned by str_match.
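
    To make the matrix shape and the greedy behaviour of (.+) more concrete, here is a small sketch. The input values are assumed for illustration (they are not given in this answer), and the second string with a repeated right border is a made-up example:

    ```r
    library(stringr)

    # Assumed values for illustration
    my.string    <- "nanaqwertybaba"
    left.border  <- "nana"
    right.border <- "baba"

    m <- str_match(my.string, paste0(left.border, "(.+)", right.border))
    # m is a 1 x 2 character matrix:
    # column 1 is the complete match, column 2 is the capture group
    full  <- m[, 1]
    inner <- m[, 2]

    # Hypothetical input where the right border occurs twice
    twice  <- "nanaXbabaYbaba"
    greedy <- str_match(twice, "nana(.+)baba")[, 2]   # runs to the last "baba"
    lazy   <- str_match(twice, "nana(.+?)baba")[, 2]  # stops at the first "baba"
    ```

    With these inputs, inner is "qwerty", greedy is "XbabaY" and lazy is "X", so (.+?) may be the safer choice when the right border can repeat.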

  • 2020-12-30 15:26

    In base R you can use gsub. The parentheses in the pattern create numbered capturing groups. Here we select the second group in the replacement, i.e. the group between the borders. The . matches any character. The * means that there are zero or more of the preceding element.

    gsub(pattern = "(.*nana)(.*)(baba.*)",
         replacement = "\\2",
         x = "xxxnanaRisnicebabayyy")
    # "Risnice"
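
    One thing to keep in mind with this approach: when the pattern does not match, gsub (and sub) return the input unchanged rather than NA. A minimal sketch of that behaviour, with a grepl guard added as one possible workaround; the second input string and the out.safe name are made up for illustration:

    ```r
    # First element is the example from above, second has no borders at all
    x <- c("xxxnanaRisnicebabayyy", "no borders here")
    pattern <- "(.*nana)(.*)(baba.*)"

    # sub() is enough here because the pattern consumes the whole string
    out <- sub(pattern, "\\2", x)
    # out[1] is "Risnice"; out[2] is the original string, since nothing matched

    # Possible guard: keep the result only where the pattern actually matched
    out.safe <- ifelse(grepl(pattern, x), out, NA_character_)
    ```

    With the guard, strings that lack the borders come back as NA instead of silently passing through.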
    
  • 2020-12-30 15:39

    You can use the package unglue:

    library(unglue)
    my.string <- "nanaqwertybaba"
    unglue_vec(my.string, "nana{res}baba")
    #> [1] "qwerty"
    