Remove text inside brackets, parens, and/or braces

情话喂你 2020-12-16 22:22

I need a function that extracts any type of bracket, i.e. (), [], {}, and the information in between. I created one and got it to do what I want, but I get an annoying w

4 answers
  • Suppose that brackets are not nested and that we have this test data:

    x <- c("a (bb) [ccc]{d}e", "x[a]y")
    

    Then using strapply in gsubfn we have this two-line solution which first translates all parentheses and square brackets to brace brackets and then processes that:

    library(gsubfn)    
    
    xx <- chartr("[]()", "{}{}", x)
    s <- strapply(xx, "{([^}]*)}", c)
    

    The result of the above is the following list:

    > s
    [[1]]
    [1] "bb"  "ccc" "d"  
    
    [[2]]
    [1] "a"
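
    For comparison, here is a base-R sketch of the same idea without gsubfn. It assumes, as above, that brackets are not nested; the name s2 and the lookaround pattern are illustrative choices, not part of the solution above:

    ```r
    x <- c("a (bb) [ccc]{d}e", "x[a]y")

    # Translate every bracket type to braces, as in the gsubfn version
    xx <- chartr("[]()", "{}{}", x)

    # Lookbehind/lookahead keep the braces out of the match (needs perl = TRUE)
    s2 <- regmatches(xx, gregexpr("(?<=\\{)[^}]*(?=\\})", xx, perl = TRUE))
    ```

    Like s, s2 is a list holding one character vector per input string.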
    
  • 2020-12-16 23:25

    Maybe this function is a little more straightforward? Or at least more compact.

    bracketXtract <-
        function(txt, br = c("(", "[", "{", "all"), with=FALSE)
    {
        br <- match.arg(br)
        left <-        # what pattern are we looking for on the left?
            if ("all" == br) "\\(|\\{|\\["
            else sprintf("\\%s", br)
        map <-         # what's the corresponding pattern on the right?
            c(`\\(`="\\)", `\\[`="\\]", `\\{`="\\}",
              `\\(|\\{|\\[`="\\)|\\}|\\]")
        fmt <-         # create the appropriate regular expression
            if (with) "(%s).*?(%s)"
            else "(?<=%s).*?(?=%s)"
        re <- sprintf(fmt, left, map[left])
        regmatches(txt, gregexpr(re, txt, perl=TRUE))    # do it!
    }
    

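    No need for lapply: the regular expression functions are already vectorized over the input. This fails with nested brackets; if nesting matters, regular expressions are probably not a good solution. Note also that with "all" the opening and closing bracket types are not required to match, so "(abc]" yields "abc". Here it is in action: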

    > txt <- c("I love chicken [unintelligible]!",
    +          "Me too! (laughter) It's so good.[interupting]",
    +          "Yep it's awesome {reading}.",
    +          "Agreed.")
    > bracketXtract(txt, "all")
    [[1]]
    [1] "unintelligible"
    
    [[2]]
    [1] "laughter"    "interupting"
    
    [[3]]
    [1] "reading"
    
    [[4]]
    character(0)
    

    This fits without trouble into a data.frame.

    > examp2 <- data.frame(var1=1:4)
    > examp2$text <- c("I love chicken [unintelligible]!",
    +                  "Me too! (laughter) It's so good.[interupting]",
    +                  "Yep it's awesome {reading}.", "Agreed.")
    > examp2$text2<-bracketXtract(examp2$text, 'all')
    > examp2
      var1                                          text                 text2
    1    1              I love chicken [unintelligible]!        unintelligible
    2    2 Me too! (laughter) It's so good.[interupting] laughter, interupting
    3    3                   Yep it's awesome {reading}.               reading
    4    4                                       Agreed.                      
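
    If a plain character column is preferred over a list column, one option is to collapse each row's matches into a single string. This is only a sketch: the matches list below is typed in by hand to stand in for the function's output, and the ", " separator is an assumption.

    ```r
    # Stand-in for the list returned by bracketXtract(txt, "all")
    matches <- list("unintelligible",
                    c("laughter", "interupting"),
                    "reading",
                    character(0))

    df <- data.frame(var1 = 1:4)

    # vapply guarantees exactly one string per row; empty matches become ""
    df$text2 <- vapply(matches, paste, character(1), collapse = ", ")
    ```

    The column is then an ordinary character vector, so the data frame prints and subsets without surprises.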
    

    The warning you were seeing has to do with trying to stick a matrix into a data frame. I think the answer is "don't do that".

    > df = data.frame(x=1:2)
    > df$y = matrix(list(), 2, 2)
    > df
      x    y
    1 1 NULL
    2 2 NULL
    Warning message:
    In format.data.frame(x, digits = digits, na.encode = FALSE) :
      corrupt data frame: columns will be truncated or padded with NAs
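
    On the nested-bracket caveat mentioned earlier: if nesting does matter, a character-by-character scan that tracks depth is one way around the regex limitation. The helper below is hypothetical (extract_outer is not part of the function above), handles one string and one bracket type at a time, and ignores unbalanced brackets.

    ```r
    # Hypothetical helper: return the contents of each outermost bracket pair
    extract_outer <- function(s, open = "(", close = ")") {
      chars <- strsplit(s, "", fixed = TRUE)[[1]]
      depth <- 0L
      start <- NA_integer_
      out <- character(0)
      for (i in seq_along(chars)) {
        ch <- chars[i]
        if (ch == open) {
          if (depth == 0L) start <- i       # position of the outer opener
          depth <- depth + 1L
        } else if (ch == close && depth > 0L) {
          depth <- depth - 1L
          if (depth == 0L) {
            # keep the text strictly between the outer brackets
            out <- c(out, substr(s, start + 1L, i - 1L))
          }
        }
      }
      out
    }

    extract_outer("a (b (c) d) e (f)")
    # [1] "b (c) d" "f"
    ```

    To apply it to a whole vector, wrap it in lapply.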
    
  • 2020-12-16 23:25

    My thought had been to make 6 (implicitly vectorized) helper functions, but I will be studying Martin's code instead, since he is much better at this than I:

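    # Empty the brackets but keep the delimiters. A negated character class
    # stops at the first closing bracket, so separate pairs are not merged.
    rm.curlybkt.no <- function(x) gsub("(\\{)[^}]*(\\})", "\\1\\2", x, perl=TRUE)
    rm.rndbkt.no   <- function(x) gsub("(\\()[^)]*(\\))", "\\1\\2", x, perl=TRUE)
    rm.sqrbkt.no   <- function(x) gsub("(\\[)[^]]*(\\])", "\\1\\2", x, perl=TRUE)
    
    # Remove the brackets together with their contents.
    rm.rndbkt.in   <- function(x) gsub("\\([^)]*\\)", "", x)
    rm.curlybkt.in <- function(x) gsub("\\{[^}]*\\}", "", x)
    rm.sqrbkt.in   <- function(x) gsub("\\[[^]]*\\]", "", x)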
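
    The removal helpers can also be folded into a single function. A minimal sketch, assuming non-nested brackets; the name rm_bracket_text and the whitespace cleanup are illustrative additions:

    ```r
    # Hypothetical helper: drop (), [] and {} groups together with their contents
    rm_bracket_text <- function(x) {
      # One alternation per bracket type; each stops at the first closer
      x <- gsub("\\([^)]*\\)|\\[[^]]*\\]|\\{[^}]*\\}", "", x, perl = TRUE)
      # Collapse the double spaces left behind and trim the ends
      trimws(gsub(" {2,}", " ", x))
    }

    rm_bracket_text("Me too! (laughter) It's so good.[interupting]")
    # [1] "Me too! It's so good."
    ```

    gsub is vectorized, so this works on a whole character vector or data frame column at once.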
    
  • 2020-12-16 23:26

    Give this a shot. I prefer the stringr package! :)

    bracketXtract <- function(string, bracket = "all", include.bracket = TRUE){
      # Load stringr package
      require(stringr)
    
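      # Regular expressions for your brackets (\\w* matches word characters
      # only, so bracketed text containing spaces or punctuation is skipped)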
      rgx = list(square = "\\[\\w*\\]", curly  = "\\{\\w*\\}", round  = "\\(\\w*\\)")
      rgx['all'] = sprintf('(%s)|(%s)|(%s)', rgx$square, rgx$curly, rgx$round)
    
      # Ensure you have the correct bracket name
      stopifnot(bracket %in% names(rgx))
    
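      # Find your matches (only the first element of `string` is used)
      matches = str_extract_all(string, pattern = rgx[[bracket]])[[1]]
    
      # Remove brackets from results if needed; substr is vectorized and
      # still returns character(0) when there are no matches
      if(!include.bracket) 
        matches = substr(matches, 2, nchar(matches) - 1)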
    
      unname(matches)
    }
    
    
    
    j <- "What kind of cheese isn't your cheese? {wonder} Nacho cheese! [groan] (Laugh)"  
    bracketXtract(j)
    # [1] "{wonder}" "[groan]"  "(Laugh)" 
    bracketXtract(j, bracket = "square")
    # [1] "[groan]"
    bracketXtract(j, include.bracket = FALSE)
    # [1] "wonder" "groan"  "Laugh" 
    