Keep only unique elements in a string in R

Question

In genomics research, you often have many strings with duplicate gene names. I would like to find an efficient way to keep only the unique gene names in a string. The example below works, but is it possible to do this in one step, i.e., without having to split the entire string and then paste the unique elements back together?

genes <- c("GSTP1;GSTP1;APC")
a <- unlist(strsplit(genes, ";"))
paste(unique(a), collapse=";")
#[1] "GSTP1;APC"

Answer 1:

An alternative is to nest the calls, which returns the unique names as a character vector:

unique(unlist(strsplit(genes, ";")))
#[1] "GSTP1" "APC"

Wrapping that in paste() then gives the answer as a single string:

paste(unique(unlist(strsplit(genes, ";"))), collapse = ";")
#[1] "GSTP1;APC"
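
Since the question mentions having many such strings, note that unlist() would pool the tokens of every string together if genes had more than one element. A minimal sketch of a per-string version follows; the helper name unique_tokens and the second input string are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: de-duplicate the sep-separated tokens of each string
# independently, so a character vector in gives a character vector out.
unique_tokens <- function(x, sep = ";") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) paste(unique(p), collapse = sep), character(1))
}

genes <- c("GSTP1;GSTP1;APC", "APC;TP53;APC")  # second string is an assumed example
unique_tokens(genes)
#[1] "GSTP1;APC" "APC;TP53"
```

unique() keeps the first occurrence of each name, so the original order within each string is preserved.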

Answer 2:

Based on the example shown, perhaps a regex substitution would do. It only collapses duplicates that sit directly next to each other:

gsub("(\\w+);\\1", "\\1", genes)
#[1] "GSTP1;APC"
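
To illustrate the limits of the regex approach, here is a small sketch using made-up inputs (genes_run and genes_apart are assumptions, not from the question). It also shows one possible variant of the pattern, with word boundaries and a repeated group, that collapses a run of more than two identical adjacent names; this variant is a suggestion, not the original answerer's code.

```r
genes_run   <- "GSTP1;GSTP1;GSTP1;APC"  # assumed input: three adjacent copies
genes_apart <- "GSTP1;APC;GSTP1"        # assumed input: duplicate is not adjacent

# Original pattern: removes one copy per match, so a run of three leaves two.
gsub("(\\w+);\\1", "\\1", genes_run)
#[1] "GSTP1;GSTP1;APC"

# Variant: \\b anchors whole names, (;\\1\\b)+ swallows the entire run.
gsub("\\b(\\w+)(;\\1\\b)+", "\\1", genes_run, perl = TRUE)
#[1] "GSTP1;APC"

# Neither pattern touches duplicates separated by another name.
gsub("\\b(\\w+)(;\\1\\b)+", "\\1", genes_apart, perl = TRUE)
#[1] "GSTP1;APC;GSTP1"
```

For non-adjacent duplicates, the split, unique(), and paste() route from the question remains the dependable option. Names containing characters outside \w, such as hyphens, would also need a different character class.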

Source: https://stackoverflow.com/questions/38210469/keep-only-unique-elements-in-string-in-r

Tags

string

unique
