How can I remove repeated characters in a string with R?

小鲜肉 2020-11-30 05:49

I would like to implement a function in R that removes repeated characters from a string. For instance, say my function is named removeRS, so it is

3 answers
  • 2020-11-30 06:20

    I think you should pay attention to the ambiguities in your problem description. This is a first stab, but it clearly does not work with "Good Luck" in the manner you desire:

    removeRS <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse="")
    removeRS('Buenaaaaaaaaa Suerrrrte')
    #[1] "Buena Suerte"
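
To make the ambiguity concrete, here is a small sketch showing what the rle() approach does to double letters that belong to the correct spelling. The removeRS definition is the one from this answer, repeated only so the block runs on its own; the variable name good_luck is just for illustration.

```r
# Same definition as in the answer, repeated so this block is self-contained.
removeRS <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse = "")

# rle() records each run of identical characters once, whatever its length,
# so a legitimate double letter is collapsed too.
good_luck <- removeRS("Good Luck")
print(good_luck)
# [1] "God Luck"
```

Whether that result is acceptable depends on how "repeated" is defined, which is the ambiguity mentioned above.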
    
  • 2020-11-30 06:30

    Since you want to collapse letters that are repeated AT LEAST 3 times in a row, here is my solution:

    gsub("([[:alpha:]])\\1{2,}", "\\1", "Buennaaaa Suerrrtee")
    #[1] "Buenna Suertee"
    

    As you can see, the 4 "a" have been reduced to a single a and the 3 r to a single r, but the 2 n and the 2 e are unchanged. As suggested above, you can replace [[:alpha:]] with any character class, such as [a-zA-KM-Z]. To affect only repetitions of particular letters, list them inside the square brackets, for example [yQ] for y and Q. Do not put | inside the brackets: within a character class it is a literal character, not the "or" operator, so [y|Q] would also match |.

    gsub("([ae])\\1{2,}", "\\1", "Buennaaaa Suerrrtee")
    # [1] "Buenna Suerrrtee"
    # The triple r is not affected, and there is no triple e.
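
If the threshold should be adjustable, the pattern can be built from a parameter. The following is only a sketch under stated assumptions: the function name collapse_runs and the arguments min_run and chars are hypothetical names introduced here, not part of the original answer.

```r
# Hypothetical helper: collapse any run of at least `min_run` identical
# characters (drawn from the bracket expression `chars`) to a single character.
collapse_runs <- function(x, min_run = 3, chars = "[[:alpha:]]") {
  stopifnot(min_run >= 2)
  # One captured character followed by at least (min_run - 1) copies of it.
  pattern <- sprintf("(%s)\\1{%d,}", chars, as.integer(min_run) - 1L)
  gsub(pattern, "\\1", x)
}

collapse_runs("Buennaaaa Suerrrtee")
# [1] "Buenna Suertee"
collapse_runs("Buennaaaa Suerrrtee", min_run = 2)
# [1] "Buena Suerte"
collapse_runs("Buennaaaa Suerrrtee", chars = "[ae]")
# [1] "Buenna Suerrrtee"
```

With min_run = 2 the behaviour matches the `\\1+` pattern, since any doubled letter is already collapsed.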
    
  • 2020-11-30 06:37

    I did not think about this very carefully, but here is my quick solution using backreferences in regular expressions:

    gsub('([[:alpha:]])\\1+', '\\1', 'Buenaaaaaaaaa Suerrrrte')
    # [1] "Buena Suerte"
    

    () captures a letter, \\1 refers back to the captured letter, and + means one or more occurrences of it. Put together, the pattern matches a letter repeated two or more times in a row, and the replacement \\1 keeps a single copy.

    To include characters other than letters, replace [[:alpha:]] with a regex matching whatever you wish to include.
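
As a rough illustration of swapping the character class, the sketch below applies three variants to a made-up test string (the string and the variable names are assumptions chosen for the example). It also shows why the class deserves some thought: collapsing digits changes the number.

```r
x <- "Nooo!!!  1000"

# Letters only: punctuation, spaces and digits are left alone.
letters_only <- gsub("([[:alpha:]])\\1+", "\\1", x)
print(letters_only)
# [1] "No!!!  1000"

# Letters and digits: the run of zeros is collapsed as well.
alnum_only <- gsub("([[:alnum:]])\\1+", "\\1", x)
print(alnum_only)
# [1] "No!!!  10"

# Any character: repeated punctuation and spaces are collapsed too.
any_char <- gsub("(.)\\1+", "\\1", x)
print(any_char)
# [1] "No! 10"
```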
