Question
The stringr package has helpful str_replace() and str_replace_all() functions. For example:
mystring <- "one fish two fish red fish blue fish"
str_replace(mystring, "fish", "dog") # replaces the first occurrence
str_replace_all(mystring, "fish", "dog") # replaces all occurrences
Awesome. But how do you
- Replace the 2nd occurrence of "fish"?
- Replace the last occurrence of "fish"?
- Replace the 2nd to last occurrence of "fish"?
Answer 1:
For the first and last occurrences, we can use stri_replace from stringi, which has a mode argument for this:
library(stringi)
stri_replace(mystring, fixed="fish", "dog", mode="first")
#[1] "one dog two fish red fish blue fish"
stri_replace(mystring, fixed="fish", "dog", mode="last")
#[1] "one fish two fish red fish blue dog"
The mode argument accepts only 'first', 'last' and 'all', so other positions are not covered by the function itself. For those we have to fall back on a regex.
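When a regex is not wanted, one possible workaround is to locate every match first and then overwrite the one you need. The sketch below assumes a fixed (non-regex) pattern and a single input string; replace_nth_fixed and its from_end argument are hypothetical names, not part of stringi.

```r
library(stringi)

# Hypothetical helper, not part of stringi: replace the nth fixed match,
# counting from the start or (from_end = TRUE) from the end of the string.
replace_nth_fixed <- function(string, pattern, replacement, n, from_end = FALSE) {
  # One start/end matrix per input string; take the one for our single string
  loc <- stri_locate_all_fixed(string, pattern)[[1]]
  # No match gives a row of NA; an out-of-range n also leaves the string as is
  if (anyNA(loc) || n < 1 || n > nrow(loc)) return(string)
  i <- if (from_end) nrow(loc) - n + 1 else n
  stri_sub(string, from = loc[i, "start"], to = loc[i, "end"]) <- replacement
  string
}

mystring <- "one fish two fish red fish blue fish"
replace_nth_fixed(mystring, "fish", "dog", 2)
# [1] "one fish two dog red fish blue fish"
replace_nth_fixed(mystring, "fish", "dog", 2, from_end = TRUE)
# [1] "one fish two fish red dog blue fish"
```

Counting from the end this way also covers the "2nd to last" case from the question.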
Using sub, we can replace the nth occurrence of a word. For example, for the second occurrence:
sub("^((?:(?!fish).)*fish(?:(?!fish).)*)fish",
"\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"
Or, for the third occurrence, we can use
sub('^((.*?fish.*?){2})fish', "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red dog blue fish"
For convenience, we can create a function that builds the pattern
patfn <- function(n){
  stopifnot(n>1)
  # skip the first n-1 occurrences (kept in group 1), then match the nth
  sprintf("^((.*?\\bfish\\b.*?){%d})\\bfish\\b", n-1)
}
and replace the nth occurrence of 'fish' for any n greater than 1; the first occurrence is easily handled by plain sub or by the default behavior of str_replace
sub(patfn(2), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"
sub(patfn(3), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red dog blue fish"
sub(patfn(4), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red fish blue dog"
This should also work with str_replace
str_replace(mystring, patfn(2), "\\1dog")
#[1] "one fish two dog red fish blue fish"
str_replace(mystring, patfn(3), "\\1dog")
#[1] "one fish two fish red dog blue fish"
Based on the pattern and replacement above, we can write a more general function that takes the word, its replacement, and the occurrence number n
replacerFn <- function(String, word, rword, n){
  stopifnot(n > 0)
  # skip the first n-1 occurrences (kept in group 1), then match the nth
  pat <- sprintf(paste0("^((.*?\\b", word, "\\b.*?){%d})\\b",
                        word, "\\b"), n-1)
  rpat <- paste0("\\1", rword)
  if(n > 1) {
    stringr::str_replace(String, pat, rpat)
  } else {
    # n == 1: a plain first-match replacement is enough
    stringr::str_replace(String, word, rword)
  }
}
replacerFn(mystring, "fish", "dog", 1)
#[1] "one dog two fish red fish blue fish"
replacerFn(mystring, "fish", "dog", 2)
#[1] "one fish two dog red fish blue fish"
replacerFn(mystring, "fish", "dog", 3)
#[1] "one fish two fish red dog blue fish"
replacerFn(mystring, "fish", "dog", 4)
#[1] "one fish two fish red fish blue dog"
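Note that replacerFn pastes word straight into the pattern, so regex metacharacters in word (for example the dot in "1.5") are interpreted rather than matched literally. Below is a minimal sketch of one way around this, using PCRE's \Q...\E quoting through perl=TRUE. replace_nth_literal is a hypothetical name; the word boundaries are dropped because the word may not start or end with a word character, and the sketch assumes that word does not itself contain \E and that rword contains no backslashes.

```r
# Hypothetical variant of replacerFn that matches `word` literally
replace_nth_literal <- function(string, word, rword, n) {
  stopifnot(n > 0)
  # \Q...\E makes PCRE treat everything in between as literal text
  quoted <- paste0("\\Q", word, "\\E")
  # Skip n-1 occurrences lazily, keep them in group 1, then match the nth
  pat <- sprintf("^((?:.*?%s){%d}.*?)%s", quoted, n - 1, quoted)
  sub(pat, paste0("\\1", rword), string, perl = TRUE)
}

replace_nth_literal("1.5 and 115 and 1.5", "1.5", "X", 2)
# [1] "1.5 and 115 and X"
```

Because the repeated group is allowed to repeat zero times, the same pattern also covers n = 1 without a separate branch.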
Answer 2:
A useful answer depends a lot on the string and what you know about it. With regex, one option is to build a regex that matches the whole line, but in different pieces, so you can put the pieces you like back in:
str_replace(mystring, '(^.*?fish.*?)(fish)(.*?fish.*)', '\\1dog\\3')
# [1] "one fish two dog red fish blue fish"
where \\1 and \\3 in the replacement refer to the first and third capture groups, respectively. Note the lazy (ungreedy) quantifiers *?, which are important so you don't overmatch.
You can do the same thing to match the third or fourth occurrence, of course:
str_replace(mystring, '(^.*?fish.*?fish.*?)(fish)(.*)', '\\1dog\\3')
# [1] "one fish two fish red dog blue fish"
str_replace(mystring, '(^.*?fish.*?fish.*?fish.*?)(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"
This is not tremendously efficient, though. You can use quantifiers to repeat, but they make numbering the replacement groups a little confusing:
str_replace(mystring, '^((.*?fish.*?){3})(fish)(.*?)', '\\1dog\\4')
# [1] "one fish two fish red fish blue dog"
but if you make the repeated group non-capturing with (?: ... ), the numbering makes more sense:
str_replace(mystring, '^((?:.*?fish.*?){3})(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"
All of this is a lot of regex, though. A simpler option (depending on the context and how much you like regex, I suppose) may be to use strsplit and then recombine the pieces, collapsing them separately:
mystrlist <- strsplit(mystring, 'fish ')[[1]] # match the space so not the last "fish$"
paste0(c(mystrlist[1],
paste0(mystrlist[2:3], collapse = 'dog '),
mystrlist[4]),
collapse = 'fish ')
# [1] "one fish two dog red fish blue fish"
paste0(c(mystrlist[1:2],
paste0(mystrlist[3:4], collapse = 'dog ')),
collapse = 'fish ')
# [1] "one fish two fish red dog blue fish"
This doesn't work terribly well for the last occurrence, of course, but the end-of-line regex token $ makes using str_replace (or just sub) really easy for that purpose, as long as the word ends the string:
sub('fish$', 'dog', mystring)
# [1] "one fish two fish red fish blue dog"
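The $ anchor only helps when the word sits at the very end of the string. For the general case, a common regex idiom (sketched here with base R's sub) is a greedy capture, which backtracks from the end of the string and therefore stops at the last occurrence:

```r
mystring <- "one fish two fish red fish blue fish"

# Greedy .* grabs as much as possible, so the "fish" that follows it
# is the last one in the string
last_replaced <- sub("(.*)fish", "\\1dog", mystring)
last_replaced
# [1] "one fish two fish red fish blue dog"

# Still works when the last occurrence is not at the end
mid_replaced <- sub("(.*)fish", "\\1dog", "one fish two fish three")
mid_replaced
# [1] "one fish two dog three"
```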
Bottom line: the best choice depends a lot on the context, but sadly there is no extra parameter for choosing which match to replace.
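The split-and-recombine idea can be wrapped in a helper that does take such a parameter. This is only a sketch: replace_nth_by_split is a hypothetical name, the word is treated as fixed text, and the code relies on stringr::str_split keeping a trailing empty piece when the string ends with the word (base strsplit drops it).

```r
library(stringr)

# Hypothetical helper: split on the word, swap the nth separator, re-join
replace_nth_by_split <- function(string, word, rword, n) {
  # k matches produce k + 1 pieces, including empty leading/trailing ones
  pieces <- str_split(string, fixed(word))[[1]]
  n_matches <- length(pieces) - 1
  if (n < 1 || n > n_matches) return(string)
  # Every separator is the original word, except the nth
  seps <- rep(word, n_matches)
  seps[n] <- rword
  # Interleave pieces and separators: piece 1, sep 1, piece 2, sep 2, ...
  paste0(c(rbind(pieces, c(seps, ""))), collapse = "")
}

mystring <- "one fish two fish red fish blue fish"
replace_nth_by_split(mystring, "fish", "dog", 2)
# [1] "one fish two dog red fish blue fish"
replace_nth_by_split(mystring, "fish", "dog", 4)
# [1] "one fish two fish red fish blue dog"
```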
Answer 3:
stringr is designed to work on character vectors. It does not have functions that let you work inside a single vector element in any great detail. But an easy approach is to split the string into a character vector of substrings, apply stringr functions to this vector (since this is what stringr is really good at), then join the vector back into a single string. These steps, of course, can be turned into a function.
This method can be applied whenever something needs to be done within an individual string.
For the example provided here, the suitable subsets are individual words.
So, to replace the nth element of a string:
library(stringr)
replace_function <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))   # split into words
  vec[str_which(vec, word)[n]] <- rword  # overwrite the nth matching word
  str_c(vec, collapse = " ")             # join back into one string
}
replace_function(mystring, "fish", "dog", 1)
# [1] "one dog two fish red fish blue fish"
replace_function(mystring, "fish", "dog", 2)
# [1] "one fish two dog red fish blue fish"
Replacing the nth element from the end is easy by adding rev():
replace_end_function <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))
  vec[rev(str_which(vec, word))[n]] <- rword  # count matches from the end
  str_c(vec, collapse = " ")
}
replace_end_function(mystring, "fish", "dog", 1)
# [1] "one fish two fish red fish blue dog"
replace_end_function(mystring, "fish", "dog", 2)
# [1] "one fish two fish red dog blue fish"
And to replace every occurrence from the nth to the last (note that this reuses, and so overwrites, the name replace_end_function):
replace_end_function <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))
  vec[str_which(vec, word)[n:length(str_which(vec, word))]] <- rword  # nth match onwards
  str_c(vec, collapse = " ")
}
replace_end_function(mystring, "fish", "dog", 1)
# [1] "one dog two dog red dog blue dog"
replace_end_function(mystring, "fish", "dog", 2)
# [1] "one fish two dog red dog blue dog"
replace_end_function(mystring, "fish", "dog", 3)
# [1] "one fish two fish red dog blue dog"
replace_end_function(mystring, "fish", "dog", 4)
# [1] "one fish two fish red fish blue dog"
Note that this answer does not use str_replace(), which the OP had asked about: str_replace() replaces only the first match in each string and str_replace_all() replaces every match, so neither is the most appropriate stringr function for this question. Indexing with the result of str_which() is much more suitable (once the individual string has been split into a vector of strings, of course).
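One caveat of the functions above: str_which(vec, word) treats word as a regex and matches it anywhere inside an element, so "fishing" would also count as a hit. A minimal sketch of a stricter variant using only base R follows; replace_word_function and from_end are hypothetical names, and words are assumed to be separated by single spaces.

```r
# Hypothetical variant: only elements exactly equal to `word` count as hits
replace_word_function <- function(string, word, rword, n, from_end = FALSE) {
  vec <- unlist(strsplit(string, " ", fixed = TRUE))
  hits <- which(vec == word)          # exact, whole-element matches only
  if (from_end) hits <- rev(hits)     # count from the last match instead
  # An out-of-range n leaves the string unchanged
  if (n >= 1 && n <= length(hits)) vec[hits[n]] <- rword
  paste(vec, collapse = " ")
}

replace_word_function("one fish two fishing red fish", "fish", "dog", 2)
# [1] "one fish two fishing red dog"
replace_word_function("one fish two fish red fish blue fish", "fish", "dog", 2, from_end = TRUE)
# [1] "one fish two fish red dog blue fish"
```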
Source: https://stackoverflow.com/questions/36368712/how-to-use-stringrs-replace-all-function-to-replace-specific-matches-in-a-str