Make gsub return an empty string when no match is found

太阳男子 2021-01-12 04:35

I'm using the gsub function in R to return occurrences of my pattern (reference numbers) on a list of text. This works great unless no match is found, in whic

7 Answers
  • 2021-01-12 05:02

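    Another simple way is to wrap gsub in a new function that returns '' whenever the result is identical to the input (i.e. nothing was substituted):

    noFalsePositives <- function(a, b, x) {
      out <- gsub(a, b, x)          # run the substitution once
      ifelse(out == x, '', out)     # unchanged output means no match
    }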
    # usage
    noFalsePositives(".*(Ref. (\\d+)).*", "\\1", data)
    # [1] "Ref. 12" "" 
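
    One edge case is worth keeping in mind with the compare-to-input approach: a string that consists of nothing but the match also comes back unchanged, so it is treated as a non-match. The sketch below uses a hypothetical input vector `x` (not from the question) and compares the result with a grepl-based check.

    ```r
    # Same comparison-based helper as in this answer
    noFalsePositives <- function(a, b, x) {
      out <- gsub(a, b, x)
      ifelse(out == x, "", out)
    }

    pattern <- ".*(Ref. (\\d+)).*"

    # Hypothetical input: the second element is nothing but the reference itself
    x <- c("see (Ref. 12)", "Ref. 12", "no reference")

    by_compare <- noFalsePositives(pattern, "\\1", x)
    by_grepl   <- ifelse(grepl(pattern, x), gsub(pattern, "\\1", x), "")

    by_compare
    # [1] "Ref. 12" ""        ""
    by_grepl
    # [1] "Ref. 12" "Ref. 12" ""
    ```

    If inputs like the second element cannot occur in your data, the comparison is fine as it is.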
    
  • 2021-01-12 05:04

    You might try embedding grep(..., value = TRUE) in that function.

    data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")
    
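    unlist(sapply(data, function(x) {
      # Reduce each string to its reference, if it has one
      x <- gsub(".*(Ref. (\\d+)).*", "\\1", x)
      # Keep the result only if it really is a reference; otherwise character(0)
      grep("Ref\\.", x, value = TRUE)
    }))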
    

    It's a bit bulky, but it works. Note that it drops the non-matching 2nd element entirely instead of returning an empty string for it.

  • 2021-01-12 05:14

    According to the documentation, this is a feature of gsub: if the supplied pattern does not match, the input string is returned unchanged.
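
    A quick way to see this behaviour, as a minimal sketch (the `data` vector here is an assumed stand-in for the question's input):

    ```r
    # Assumed sample input, mirroring the data used elsewhere in this thread
    data <- c("a sentence with citation (Ref. 12)",
              "another sentence without reference")

    res <- gsub(".*(Ref. (\\d+)).*", "\\1", data)
    res
    # [1] "Ref. 12"                            "another sentence without reference"

    # The second element is returned as-is because nothing was substituted
    unchanged <- res == data
    unchanged
    # [1] FALSE  TRUE
    ```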

    Here, I first use grepl to get a logical vector indicating whether the pattern is present in each string:

    ifelse(grepl(".*(Ref. (\\d+)).*", data), 
          gsub(".*(Ref. (\\d+)).*", "\\1", data), 
          "")
    

    Embedding this in a function:

    mygsub <- function(x){
         ans <- ifelse(grepl(".*(Ref. (\\d+)).*", x), 
                  gsub(".*(Ref. (\\d+)).*", "\\1", x), 
                  "")
         return(ans)
    }
    
    mygsub(data)
    
  • 2021-01-12 05:18
    # Substitute, then blank out every element that came back unchanged (no match)
    xs <- sapply(data, function(x) gsub(".*(Ref. (\\d+)).*", "\\1", x))
    xs[xs == data] <- ""
    xs
    #[1] "Ref. 12" ""
    
  • 2021-01-12 05:18

    Try strapplyc in the gsubfn package:

    library(gsubfn)
    
    L <- fn$sapply(unlist(data), ~ strapplyc(x, "Ref. \\d+"))
    unlist(fn$sapply(L, ~ ifelse(length(x), x, "")))
    

    which gives this:

    a sentence with citation (Ref. 12) another sentence without reference 
                             "Ref. 12"                                 "" 
    

    If you don't mind list output, you can just use L and skip the last line of code. Note that the fn$ prefix turns the formula arguments of the function it's applied to into functions, so the first line of code could be written without fn$ as sapply(unlist(data), function(x) strapplyc(x, "Ref. \\d+")).
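
    For comparison, if installing gsubfn is not an option, the same extraction can be sketched in base R with regexpr and regmatches. This is a different technique from strapplyc, shown only as a rough equivalent; the dot is escaped here so that it matches a literal ".".

    ```r
    data <- list("a sentence with citation (Ref. 12)",
                 "another sentence without reference")
    x <- unlist(data)

    m <- regexpr("Ref\\. \\d+", x)      # position of the first match, -1 if none
    out <- rep("", length(x))           # default: empty string for every element
    out[m != -1] <- regmatches(x, m)    # regmatches returns only the matched pieces
    out
    # [1] "Ref. 12" ""
    ```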

  • 2021-01-12 05:18

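    Based on @joran's answer.

    Function:

    extract_matches <- function(x, pattern, replacement, replacement_nomatch = "") {
        # Record which elements match before substituting; a logical index
        # also works when nothing matches (x[-integer(0)] would select nothing)
        matched <- grepl(pattern, x)
        x <- gsub(pattern, replacement, x)
        x[!matched] <- replacement_nomatch
        x
    }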
    

    Usage:

    data <- list("with citation (Ref. 12)", "without reference", "")
    extract_matches(data,  ".*(Ref. (\\d+)).*", "\\1")
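
    One caveat when indexing with the negated output of grep: if no element matches, grep returns integer(0), and x[-integer(0)] selects nothing, so none of the non-matches get replaced. A logical index from grepl does not have this problem. A minimal sketch, using a hypothetical input in which nothing matches:

    ```r
    x <- c("without reference", "still nothing")
    pattern <- "Ref\\. \\d+"

    idx <- grep(pattern, x)         # integer(0): no element matches

    a <- x
    a[-idx] <- ""                   # -integer(0) is integer(0), so nothing is assigned
    a
    # [1] "without reference" "still nothing"

    b <- x
    b[!grepl(pattern, b)] <- ""     # logical index covers every non-match
    b
    # [1] "" ""
    ```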
    