Sequentially replace multiple places matching single pattern in a string with different replacements

Submitted by 爷,独闯天下 on 2021-02-07 07:10:00

Question


Using the stringr package, it is easy to perform regex replacements in a vectorized manner.

Question: How can I do the following:

Replace every word in

hello,world??your,make|[]world,hello,pos

with a different replacement each, e.g. increasing numbers

1,2??3,4|[]5,6,7

Note that simple separators cannot be assumed; the practical use case is more complicated.


stringr::str_replace_all does not seem to work here. A call such as

str_replace_all(x, "(\\w+)", 1:7)

produces a vector in which each replacement is applied to all words. A named vector does not help either, because the input contains unknown and/or duplicate words, so

str_replace_all(x, c("hello" = "1", "world" = "2", ...))

will not work for this purpose.


Answer 1:


Here's another idea using gsubfn. The pre function is run before the substitutions and the fun function is run for each substitution:

library(gsubfn)
x <- "hello,world??your,make|[]world,hello,pos"
p <- proto(pre = function(t) t$v <- 0, # initialize the counter before any substitution
           fun = function(t, x) t$v <- v + 1) # increment the counter; its new value replaces the match
gsubfn("\\w+", p, x)

Which gives:

[1] "1,2??3,4|[]5,6,7"

This variation would give the same answer since gsubfn maintains a count variable for use in proto functions:

pp <- proto(fun = function(...) count)
gsubfn("\\w+", pp, x)

See the gsubfn vignette for examples of using count.
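
For readers without gsubfn, the per-match callback idea can be sketched in base R. This is only an illustration under stated assumptions, not how gsubfn is implemented: gsub_each and make_counter are hypothetical helper names introduced here, built on gregexpr and the regmatches replacement function, and the closure stands in for the counter v kept in the proto object.

```r
# Hypothetical helper: call `fun` once per match, in order of appearance,
# and substitute the value it returns for that match.
gsub_each <- function(pattern, fun, x) {
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    vapply(hits, function(h) as.character(fun(h)), character(1),
           USE.NAMES = FALSE)
  })
  x
}

# Hypothetical closure that plays the role of the counter `v` above.
make_counter <- function() {
  v <- 0
  function(match) {
    v <<- v + 1  # increment on every call
    v            # the new value becomes the replacement
  }
}

x <- "hello,world??your,make|[]world,hello,pos"
result <- gsub_each("\\w+", make_counter(), x)
result
# [1] "1,2??3,4|[]5,6,7"
```

Because the replacement is an arbitrary function of the match, the same helper could return something other than a counter, for example a lookup based on the matched word.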




Answer 2:


I would suggest the "ore" package for something like this. Of particular note would be ore.search and ore.subst, the latter of which can accept a function as the replacement value.

Examples:

library(ore)

x <- "hello,world??your,make|[]world,hello,pos"

## Match all and replace with the sequence in which they are found
ore.subst("(\\w+)", function(i) seq_along(i), x, all = TRUE)
# [1] "1,2??3,4|[]5,6,7"

## Create a cool ore object with details about what was extracted
ore.search("(\\w+)", x, all = TRUE)
#   match: hello world  your make   world hello pos
# context:      ,     ??    ,    |[]     ,     ,   
#  number: 1==== 2====  3=== 4===   5==== 6==== 7==



Answer 3:


Here is a base R solution. It relies only on vectorized operations, although as written it handles a single string.

x <- "hello,world??your,make|[]world,hello,pos"
# split x into single characters
x_split <- strsplit(x, "")[[1]]
# find all word-character positions and replace them with "a"
x_split[gregexpr("\\w", x)[[1]]] <- "a"
# find all runs of "a"
rle_res <- rle(x_split)
# set the length of each run of "a" to 1
rle_res$lengths[rle_res$values == "a"] <- 1
# replace the run values with increasing numbers
rle_res$values[rle_res$values == "a"] <- 1:sum(rle_res$values == "a")
# apply inverse.rle to the modified rle object and collapse into one string
paste0(inverse.rle(rle_res), collapse = "")

#[1] "1,2??3,4|[]5,6,7"
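
A more compact base R variant is sketched below, assuming the goal is simply to number the matches of a pattern. It uses gregexpr together with the regmatches replacement function instead of rle. The function name replace_with_index is hypothetical, and the numbering restarts for each element when x has more than one string.

```r
# Hypothetical helper: replace the i-th match of `pattern` in each string with i.
replace_with_index <- function(x, pattern) {
  m <- gregexpr(pattern, x)
  # one vector of replacements per input string: "1", "2", ..., one per match
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    as.character(seq_along(hits))
  })
  x
}

x <- "hello,world??your,make|[]world,hello,pos"
numbered <- replace_with_index(x, "\\w+")
numbered
# [1] "1,2??3,4|[]5,6,7"
```

Unlike the character-splitting approach, this does not depend on a placeholder character, so it should also work with patterns that are not built from single characters.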


Source: https://stackoverflow.com/questions/29996708/sequentially-replace-multiple-places-matching-single-pattern-in-a-string-with-di
