I'm using the gsub
function in R to return occurrences of my pattern (reference numbers) on a list of text. This works great unless no match is found, in whic
Another simple way is to wrap gsub in a new function that returns '' whenever the result is identical to the input (i.e. nothing was replaced):
noFalsePositives <- function(a,b,x) {
return(ifelse(gsub(a,b,x)==x,'',gsub(a,b,x)))
}
# usage
noFalsePositives(".*(Ref. (\\d+)).*", "\\1", data)
# [1] "Ref. 12" ""
You might try embedding grep(..., value = TRUE) in that function.
data <- list("a sentence with citation (Ref. 12)",
"another sentence without reference")
unlist( sapply(data, function(x) {
x <- gsub(".*(Ref. (\\d+)).*", "\\1", x)
grep( "Ref\\.", x, value = TRUE )
} ) )
Kind of bulky, but it works. Note that it drops the non-matching 2nd element entirely instead of returning an empty string for it.
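To make the dropping behaviour concrete, here is a small sketch that reuses the same two example sentences and compares lengths before and after. The name res is only introduced for illustration.

```r
data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")

# same approach as above: extract, then keep only elements containing "Ref."
res <- unlist(sapply(data, function(x) {
  x <- gsub(".*(Ref. (\\d+)).*", "\\1", x)
  grep("Ref\\.", x, value = TRUE)
}))

length(data)  # 2 inputs
length(res)   # 1 result: the non-matching sentence is gone
```

If the result has to stay aligned with the input (one output per input string), this approach is probably not the one to pick.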
According to the documentation, this is a feature of gsub:
if there are no matches to the supplied pattern, it returns the input string unchanged.
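A quick way to see this for yourself, using the pattern and the two example sentences from the question (the variable names are just for illustration):

```r
pattern <- ".*(Ref. (\\d+)).*"

# a match: the whole string is replaced by the first capture group
hit <- gsub(pattern, "\\1", "a sentence with citation (Ref. 12)")
hit
# [1] "Ref. 12"

# no match: nothing is replaced, so the input comes back as-is
miss <- gsub(pattern, "\\1", "another sentence without reference")
miss
# [1] "another sentence without reference"
```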
Here, I use grepl first to return a logical vector indicating the presence or absence of the pattern in each string:
ifelse(grepl(".*(Ref. (\\d+)).*", data),
gsub(".*(Ref. (\\d+)).*", "\\1", data),
"")
Embedding this in a function:
mygsub <- function(x){
ans <- ifelse(grepl(".*(Ref. (\\d+)).*", x),
gsub(".*(Ref. (\\d+)).*", "\\1", x),
"")
return(ans)
}
mygsub(data)
xs <- sapply(data, function(x) gsub(".*(Ref. (\\d+)).*", "\\1", x))
xs[xs==data] <- ""
xs
#[1] "Ref. 12" ""
Try strapplyc in the gsubfn package:
library(gsubfn)
L <- fn$sapply(unlist(data), ~ strapplyc(x, "Ref. \\d+"))
unlist(fn$sapply(L, ~ ifelse(length(x), x, "")))
which gives this:
a sentence with citation (Ref. 12) another sentence without reference
"Ref. 12" ""
If you don't mind list output then you could just use L and forget about the last line of code. Note that the fn$ prefix turns the formula arguments of the function it's applied to into functions, so the first line of code could be written without fn as sapply(unlist(data), function(x) strapplyc(x, "Ref. \\d+")).
Based on @joran's answer:
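extract_matches <- function(x,pattern,replacement,replacement_nomatch=""){
# record which elements match before gsub modifies x
matched <- grepl(pattern,x)
x <- gsub(pattern,replacement,x)
# logical indexing stays correct even when no element matches
x[!matched] <- replacement_nomatch
x
}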
data <- list("with citation (Ref. 12)", "without reference", "")
extract_matches(data, ".*(Ref. (\\d+)).*", "\\1")
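One edge case worth knowing when picking non-matches by index: negative indexing with grep behaves unexpectedly when nothing matches at all, because grep then returns integer(0) and x[-integer(0)] selects no elements rather than all of them. A minimal sketch, with hypothetical input x and illustrative variable names:

```r
x <- c("without reference", "also nothing here")

hits <- grep("Ref\\. \\d+", x)          # integer(0): no element matches
dropped <- x[-hits]                      # negative indexing with integer(0)
length(dropped)                          # 0, not 2

non_matches <- x[!grepl("Ref\\. \\d+", x)]  # logical indexing
length(non_matches)                      # 2, every element is a non-match
```

Logical indexing with grepl avoids the problem, which is why it is usually the safer choice for "everything that did not match".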