How to use stringr's replace_all() function to replace specific matches in a string

落爺英雄遲暮 submitted on 2019-12-23 10:51:54

Question


The stringr package has helpful str_replace() and str_replace_all() functions. For example:

library(stringr)

mystring <- "one fish two fish red fish blue fish"

str_replace(mystring, "fish", "dog") # replaces the first occurrence
str_replace_all(mystring, "fish", "dog") # replaces all occurrences

Awesome. But how do you

  1. Replace the 2nd occurrence of "fish"?
  2. Replace the last occurrence of "fish"?
  3. Replace the 2nd to last occurrence of "fish"?

Answer 1:


For the first and last occurrences, we can use stri_replace() from stringi, which has a mode argument for this:

 library(stringi)
 stri_replace(mystring, fixed="fish", "dog", mode="first")
 #[1] "one dog two fish red fish blue fish"

 stri_replace(mystring, fixed="fish", "dog", mode="last")
 #[1] "one fish two fish red fish blue dog"

The mode argument only accepts 'first', 'last' and 'all', so replacing any other occurrence is not supported directly. For those cases we have to fall back on a regex.

Using sub, we can replace the 2nd occurrence of the word:

sub("^((?:(?!fish).)*fish(?:(?!fish).)*)fish", 
           "\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"

Or, with a quantified group, we can replace the 3rd occurrence:

 sub('^((.*?fish.*?){2})fish', "\\1dog", mystring, perl=TRUE)
 #[1] "one fish two fish red dog blue fish"

For convenience, we can wrap the pattern in a function:

patfn <- function(n){
 stopifnot(n>1)
 sprintf("^((.*?\\bfish\\b.*?){%d})\\bfish\\b", n-1)
} 

and use it to replace the nth occurrence of 'fish' for n > 1 (the first occurrence is easily handled by sub or by the default behaviour of str_replace):

sub(patfn(2), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two dog red fish blue fish"
sub(patfn(3), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red dog blue fish"
sub(patfn(4), "\\1dog", mystring, perl=TRUE)
#[1] "one fish two fish red fish blue dog"

This should also work with str_replace

 str_replace(mystring, patfn(2), "\\1dog")
 #[1] "one fish two dog red fish blue fish"
 str_replace(mystring, patfn(3), "\\1dog")
 #[1] "one fish two fish red dog blue fish"

Based on the pattern and replacement above, we can write a more general function that takes the string, the word, its replacement and n:

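replacerFn <- function(String, word, rword, n) {
  stopifnot(n > 0)
  # Capture everything up to and including the first n-1 occurrences as \\1,
  # then match the nth occurrence of `word`
  pat <- sprintf(paste0("^((.*?\\b", word, "\\b.*?){%d})\\b",
                        word, "\\b"), n - 1)
  rpat <- paste0("\\1", rword)
  if (n > 1) {
    stringr::str_replace(String, pat, rpat)
  } else {
    # n == 1: a plain str_replace() already replaces the first occurrence
    stringr::str_replace(String, word, rword)
  }
}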


 replacerFn(mystring, "fish", "dog", 1)
 #[1] "one dog two fish red fish blue fish"
 replacerFn(mystring, "fish", "dog", 2)
 #[1] "one fish two dog red fish blue fish"
 replacerFn(mystring, "fish", "dog", 3)
 #[1] "one fish two fish red dog blue fish"
 replacerFn(mystring, "fish", "dog", 4)
 #[1] "one fish two fish red fish blue dog"
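
One caveat with building the pattern from word: any regex metacharacters in it (such as ".") are interpreted as regex. A minimal sketch of a variant that escapes the word first is shown here. The helper names escape_regex() and replace_nth_literal() are hypothetical, not part of stringr or of the answer above, and base R's sub() with perl = TRUE is used so the sketch has no package dependencies.

```r
# Hypothetical helper: backslash-escape regex metacharacters in x
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical helper: replace the nth literal occurrence of `word`
replace_nth_literal <- function(string, word, rword, n) {
  stopifnot(n >= 1)
  w <- escape_regex(word)
  # Skip n-1 occurrences (kept as \\1), then match the nth one
  pat <- paste0("^((?:.*?", w, "){", n - 1, "}.*?)", w)
  sub(pat, paste0("\\1", rword), string, perl = TRUE)
}

x <- "a.b axb a.b"

# Unescaped, "a.b" also matches "axb", so the wrong match is replaced
unescaped <- sub("^((?:.*?a.b){1}.*?)a.b", "\\1dog", x, perl = TRUE)
print(unescaped)
# [1] "a.b dog a.b"

escaped <- replace_nth_literal(x, "a.b", "dog", 2)
print(escaped)
# [1] "a.b axb dog"
```

If the string has fewer than n occurrences, the pattern does not match and the string is returned unchanged. Note that rword is still treated as a regex replacement, so backslashes in it would need their own escaping.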



Answer 2:


A useful answer depends a lot on the string and what you know about it. With regex, one option is to build a regex that matches the whole line, but in different pieces, so you can put the pieces you like back in:

str_replace(mystring, '(^.*?fish.*?)(fish)(.*?fish.*)', '\\1dog\\3')
# [1] "one fish two dog red fish blue fish"

where the \\1 and \\3 in the replacement refer to the text captured by the first and third sets of parentheses, respectively. Note the lazy (ungreedy) quantifiers *?, which are important so you don't overmatch.
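
To make the effect of the lazy quantifiers concrete, here is a small comparison of the same pattern with and without the ?. It is a sketch using base R's sub() with perl = TRUE instead of str_replace(), an assumption made only to keep the example free of package dependencies.

```r
mystring <- "one fish two fish red fish blue fish"

# Lazy: each .*? stops at the earliest possible "fish",
# so the middle group is the 2nd occurrence
lazy <- sub('(^.*?fish.*?)(fish)(.*?fish.*)', '\\1dog\\3', mystring, perl = TRUE)
print(lazy)
# [1] "one fish two dog red fish blue fish"

# Greedy: the first group grabs as much as it can while still leaving
# a "fish" for each of the other two groups, so the 3rd occurrence is hit
greedy <- sub('(^.*fish.*)(fish)(.*fish.*)', '\\1dog\\3', mystring, perl = TRUE)
print(greedy)
# [1] "one fish two fish red dog blue fish"
```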

You can do the same thing to match the third or fourth occurrence, of course:

str_replace(mystring, '(^.*?fish.*?fish.*?)(fish)(.*)', '\\1dog\\3')
# [1] "one fish two fish red dog blue fish"
str_replace(mystring, '(^.*?fish.*?fish.*?fish.*?)(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"

This is not tremendously efficient, though. You can use quantifiers to repeat, but they make numbering the replacement groups a little confusing:

str_replace(mystring, '^((.*?fish.*?){3})(fish)(.*?)', '\\1dog\\4')
# [1] "one fish two fish red fish blue dog"

but if you make the repeated group non-capturing (?: ... ), it makes more sense:

str_replace(mystring, '^((?:.*?fish.*?){3})(fish)(.*?)', '\\1dog\\3')
# [1] "one fish two fish red fish blue dog"

All of this is a lot of regex, though. A simpler option (depending on the context and how much you like regex, I suppose) may be to use strsplit and then recombine, collapsing the pieces separately:

mystrlist <- strsplit(mystring, 'fish ')[[1]] # match the space so not the last "fish$"
paste0(c(mystrlist[1], 
         paste0(mystrlist[2:3], collapse = 'dog '), 
         mystrlist[4]), 
       collapse = 'fish ')
# [1] "one fish two dog red fish blue fish"

paste0(c(mystrlist[1:2], 
         paste0(mystrlist[3:4], collapse = 'dog ')), 
       collapse = 'fish ')
# [1] "one fish two fish red dog blue fish"

This doesn't work terribly well for the last word, of course, but the end-of-line regex token $ makes using str_replace (or just sub) really easy for that purpose:

sub('fish$', 'dog', mystring)
# [1] "one fish two fish red fish blue dog"
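
The $ anchor only helps when the string actually ends in the word. As a sketch of a more general alternative (not part of the original answer), a greedy .* can be used to skip ahead to the last occurrence, or to the second-to-last one. The string x below is a made-up example that does not end in "fish".

```r
x <- "one fish two fish red fish blue fish swim"

# The anchored pattern finds nothing here, so x comes back unchanged
anchored <- sub('fish$', 'dog', x, perl = TRUE)

# Greedy (.*) swallows everything up to the last "fish"
last <- sub('(.*)fish', '\\1dog', x, perl = TRUE)
print(last)
# [1] "one fish two fish red fish blue dog swim"

# Leave room for one more "fish" after the match: second-to-last occurrence
second_last <- sub('(.*)fish(.*fish)', '\\1dog\\2', x, perl = TRUE)
print(second_last)
# [1] "one fish two fish red dog blue fish swim"
```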

Bottom line: It depends a lot on the context what the best choice is, but there is not an extra parameter for which match to replace, sadly.
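
For what it's worth, such a parameter can be approximated by locating all matches and overwriting one of them by position. The following is a minimal sketch with base R's gregexpr() and regmatches(). The function name replace_match() and its `which` argument are hypothetical, and negative values counting from the end are a convention chosen for this sketch.

```r
# Hypothetical helper: replace only the `which`-th match of `pattern`.
# Negative `which` counts from the end (-1 = last, -2 = second to last).
replace_match <- function(string, pattern, replacement, which) {
  m <- gregexpr(pattern, string, perl = TRUE)
  hits <- regmatches(string, m)      # list holding the matched substrings
  k <- length(hits[[1]])
  i <- if (which < 0) k + which + 1 else which
  if (i >= 1 && i <= k) {
    hits[[1]][i] <- replacement      # overwrite just that one match
    regmatches(string, m) <- hits    # write the matches back into the string
  }
  string
}

mystring <- "one fish two fish red fish blue fish"
replace_match(mystring, "fish", "dog", 2)
# [1] "one fish two dog red fish blue fish"
replace_match(mystring, "fish", "dog", -1)
# [1] "one fish two fish red fish blue dog"
replace_match(mystring, "fish", "dog", -2)
# [1] "one fish two fish red dog blue fish"
```

The replacement here is inserted literally (no \\1 backreferences), and the sketch handles a single string, not a vector. An index outside the range of matches leaves the string unchanged.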




Answer 3:


stringr is designed to work on character vectors. It does not offer functions for manipulating individual matches within a single vector element in much detail. But an easy approach is to split the string into a character vector of subsets, apply stringr functions on this vector (since this is what stringr is really good at), then join the vector back into a single string. These steps, of course, can be turned into a function.

This method can be applied whenever something needs to be done within an individual string.

For the example provided here, the suitable subsets are individual words.

So, to replace the nth occurrence of a word in a string:

library(stringr)

replace_function <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))
  vec[str_which(vec, word)[n]] <- rword
  str_c(vec, collapse = " ")
}

replace_function(mystring, "fish", "dog", 1)
# [1] "one dog two fish red fish blue fish"

replace_function(mystring, "fish", "dog", 2)
# [1] "one fish two dog red fish blue fish"
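
One thing to keep in mind is that str_which() treats word as a pattern, so a token such as "fishing" also counts as a match and would be replaced as a whole. If exact words are wanted, a sketch along the following lines may help. The function name replace_token() is hypothetical, and base R's which() with == is swapped in for str_which().

```r
# Hypothetical variant: count only tokens that equal `word` exactly
replace_token <- function(string, word, rword, n) {
  vec <- strsplit(string, " ", fixed = TRUE)[[1]]
  idx <- which(vec == word)          # positions of exact matches only
  if (n >= 1 && n <= length(idx)) {
    vec[idx[n]] <- rword
  }
  paste(vec, collapse = " ")
}

replace_token("one fish two fishing red fish", "fish", "dog", 2)
# [1] "one fish two fishing red dog"
```

Here "fishing" is skipped, so the second exact "fish" is the one at the end of the string.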

Replacing the nth occurrence counted from the end is just as easy: add rev():

replace_end_function <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))
  vec[rev(str_which(vec, word))[n]] <- rword
  str_c(vec, collapse = " ")
}

replace_end_function(mystring, "fish", "dog", 1)
# [1] "one fish two fish red fish blue dog"

replace_end_function(mystring, "fish", "dog", 2)
# [1] "one fish two fish red dog blue fish"

And to replace every occurrence from the nth through the last (under a new name, so it does not overwrite the function above):

replace_from_nth_function <- function(string, word, rword, n) {
  vec <- unlist(strsplit(string, " "))
  vec[str_which(vec, word)[n:length(str_which(vec, word))]] <- rword
  str_c(vec, collapse = " ")
}

replace_from_nth_function(mystring, "fish", "dog", 1)
# [1] "one dog two dog red dog blue dog"

replace_from_nth_function(mystring, "fish", "dog", 2)
# [1] "one fish two dog red dog blue dog"

replace_from_nth_function(mystring, "fish", "dog", 3)
# [1] "one fish two fish red dog blue dog"

replace_from_nth_function(mystring, "fish", "dog", 4)
# [1] "one fish two fish red fish blue dog"

Note that this answer does not use str_replace(), as the OP had asked, because, as the OP noted, str_replace() only replaces the first match in each string and str_replace_all() replaces all of them. So they are not the most appropriate functions in the stringr package for this question: indexing with the result of str_which() is much more suitable (once the individual string has been split into a vector of strings, of course).



Source: https://stackoverflow.com/questions/36368712/how-to-use-stringrs-replace-all-function-to-replace-specific-matches-in-a-str
