Removing duplicate words in a string in R

栀梦 2020-12-11 03:47

This is to help someone who voluntarily removed their question after being asked, among other comments, for the code they had tried. Let's assume they tried something like this:

4 answers

有刺的猬 (original poster)

2020-12-11 04:19

I'm not sure whether letter case is a concern. This solution uses qdap with the add-on qdapRegex package so that punctuation and the capital letter at the start of each string don't interfere with the removal but are still preserved in the result:

str <- c("How do I best try and try and try and find a way to to improve this code?",
    "And and here's a second one one and not a third One.")

library(qdap)
library(dplyr) # provides the pipe operator %>%

str %>% 
    tolower() %>%
    word_split() %>% 
    sapply(., function(x) unbag(unique(x))) %>% 
    rm_white_endmark() %>%  
    rm_default(pattern="(^[a-z]{1})", replacement = "\\U\\1") %>%
    unname()

## [1] "How do i best try and find a way to improve this code?"
## [2] "And here's a second one not third."
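
If installing qdap is not an option, roughly the same idea can be sketched in base R. This is only an illustration under stated assumptions: `remove_dup_words` is a hypothetical helper name, not a qdap function, and it compares words case-insensitively with punctuation stripped while keeping the first occurrence exactly as written.

```r
# Hypothetical helper (not part of qdap): drop repeated words, ignoring case
# and punctuation when comparing, and keep the first occurrence as written.
remove_dup_words <- function(x) {
  vapply(strsplit(x, "\\s+"), function(words) {
    # Comparison key: lower-case, punctuation removed
    key <- tolower(gsub("[[:punct:]]", "", words))
    paste(words[!duplicated(key)], collapse = " ")
  }, character(1))
}

txt <- c("How do I best try and try and try and find a way to to improve this code?",
         "And and here's a second one one and not a third One.")
remove_dup_words(txt)

## [1] "How do I best try and find a way to improve this code?"
## [2] "And here's a second one not third"
```

Unlike the qdap version, this keeps the original case of every retained word (note the "I"), but it loses the final period of the second string because the last word, "One.", is itself a duplicate and is dropped along with its punctuation.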
