Question
Is there any way to replace a range of numbers with the individual numbers in a character string? A range can be any n-n, most probably within 1-15; 4-10 is also possible.
The range could be indicated a) with -
a <- "I would like to buy 1-3 cats"
or b) with a word, for example: to, bis, jusqu'à
b <- "I would like to buy 1 jusqu'à 3 cats"
The results should look like
"I would like to buy 1,2,3 cats"
I found this: "Replace range of numbers with certain number", but could not really use it in R.
Answer 1:
gsubfn in the gsubfn package is like gsub, but instead of replacing the match with a replacement string it allows the user to specify a function (possibly in formula notation, as done here). It then passes the matches to the capture groups in the regular expression, i.e. the matches to the parenthesized parts of the regular expression, as separate arguments and replaces the entire match with the output of the function. Thus we match "(\\d+)(-| to | bis | jusqu'à )(\\d+)", which has three capture groups and therefore passes 3 arguments to the function. In the function we use seq with the first and third of these. Note that seq can take character arguments and interpret them as numeric, so we did not have to convert the arguments to numeric.
Thus we get this one-liner:
library(gsubfn)
s <- c(a, b) # test input strings
gsubfn("(\\d+)(-| to | bis | jusqu'à )(\\d+)", ~ paste(seq(..1, ..3), collapse = ","), s)
giving:
[1] "I would like to buy 1,2,3 cats" "I would like to buy 1,2,3 cats"
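To make the capture groups described above concrete, here is a minimal base-R sketch (not part of the original answer) that uses regexec to show what the three groups contain for one input. The variable names groups and expanded are only illustrative, and the bounds are converted with as.integer to keep the conversion explicit.

```r
a <- "I would like to buy 1-3 cats"
pattern <- "(\\d+)(-| to | bis | jusqu'à )(\\d+)"

# regexec returns the full match followed by one entry per capture group
groups <- regmatches(a, regexec(pattern, a))[[1]]
groups
# [1] "1-3" "1"   "-"   "3"

# the first and third capture groups are the bounds of the range
expanded <- paste(seq(as.integer(groups[2]), as.integer(groups[4])), collapse = ",")
expanded
# [1] "1,2,3"
```

gsubfn performs this extraction for every match and substitutes the function's result back into the string.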
Answer 2:
Not the most efficient, but ...
s <- c("I would like to buy 1-3 cats",
       "I would like to buy 1 jusqu'à 3 cats",
       "foo 22-33",
       "quux 11-3 bar")
# locate each whole range, e.g. "1-3" or "1 jusqu'à 3"
gre <- gregexpr("([0-9]+(-| to | bis | jusqu'à )[0-9]+)", s)
# locate the two numbers inside each matched range
gre2 <- gregexpr('[0-9]+', regmatches(s, gre))
# replace each range with its expanded, comma-separated sequence
regmatches(s, gre) <- lapply(regmatches(regmatches(s, gre), gre2),
                             function(a) paste(do.call(seq, as.list(as.integer(a))), collapse = ","))
s
# [1] "I would like to buy 1,2,3 cats" "I would like to buy 1,2,3 cats"
# [3] "foo 22,23,24,25,26,27,28,29,30,31,32,33" "quux 11,10,9,8,7,6,5,4,3 bar"
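The same base-R idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name expand_ranges and its seps argument are hypothetical, and the separator list simply mirrors the words from the question. It expands each match on its own, so a string may contain more than one range.

```r
# Hypothetical helper: expand every "n<sep>m" range in each element of x
expand_ranges <- function(x, seps = c("-", " to ", " bis ", " jusqu'à ")) {
  pattern <- paste0("[0-9]+(", paste(seps, collapse = "|"), ")[0-9]+")
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    # hits holds all ranges found in one string (possibly none)
    vapply(hits, function(h) {
      ends <- as.integer(regmatches(h, gregexpr("[0-9]+", h))[[1]])
      paste(seq(ends[1], ends[2]), collapse = ",")
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

expand_ranges("2-4 cats and 7 to 9 dogs")
# [1] "2,3,4 cats and 7,8,9 dogs"
```

As in the answer above, a descending range such as 11-3 is expanded in descending order, because seq counts down when the first bound is larger.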
Answer 3:
This is, in fact, a little tricky, unless someone has already written a package that does this (I'm not aware of one).
a <- "I would like to buy 1-3 cats"
pos <- unlist(gregexpr("\\d+\\D+", a))
a_split <- unlist(strsplit(a, ""))
replacement <- paste(seq.int(a_split[pos[1]], a_split[pos[2]]), collapse = ",")
gsub("\\d+\\D+\\d+", replacement, a)
# [1] "I would like to buy 1,2,3 cats"
EDIT: To show that the same solution works for arbitrary non-digit characters between two numbers:
b <- "I would like to buy 1 jusqu'à 3 cats"
pos_b <- unlist(gregexpr("\\d+\\D+", b))
b_split <- unlist(strsplit(b, ""))
replacement <- paste(seq.int(b_split[pos_b[1]], b_split[pos_b[2]]), collapse = ",")
gsub("\\d+\\D+\\d+", replacement, b)
# [1] "I would like to buy 1,2,3 cats"
You can add arbitrary requirements for the run of non-digit characters if you'd like. If you need help with that, just share the limits on the words or symbols that may appear between the numbers!
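One caveat: the code above reads a single character at each position, so it only handles single-digit bounds. For a range such as 4-10 it would pick up "1" instead of "10". A possible variant that keeps the same "any non-digit separator" idea but extracts whole numbers with regexec is sketched below. The function name expand_first_range is hypothetical, and it only expands the first range in a single string.

```r
# Hypothetical helper: expand the first "number, non-digits, number" run in x
expand_first_range <- function(x) {
  # parts = full match, first number, second number (empty if no match)
  parts <- regmatches(x, regexec("(\\d+)\\D+(\\d+)", x))[[1]]
  if (length(parts) == 0) return(x)  # no range found, leave x unchanged
  replacement <- paste(seq(as.integer(parts[2]), as.integer(parts[3])),
                       collapse = ",")
  sub("\\d+\\D+\\d+", replacement, x)
}

expand_first_range("I would like to buy 4-10 cats")
# [1] "I would like to buy 4,5,6,7,8,9,10 cats"
```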
Source: https://stackoverflow.com/questions/49344140/replace-range-of-numbers-with-single-numbers-in-a-character-string