In genomics research, you often have many strings containing duplicate gene names. I would like an efficient way to keep only the unique gene names in a string. This is an exa
Based on the example shown, and assuming repeated names always sit next to each other, perhaps:
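# \b word boundaries stop partial matches (APC;APC2 would otherwise become APC2); (;\1)+ collapses runs such as A;A;A
gsub("\\b(\\w+)(;\\1)+\\b", "\\1", genes, perl = TRUE) #[1] "GSTP1;APC"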
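A quick check shows why the bare back-reference pattern (\w+);\1 needs word boundaries. The strings in x are hypothetical: without boundaries, a name can match as a prefix of the next name, and only one duplicate per adjacent pair is removed.

```r
# Hypothetical test strings: a prefix-sharing pair and a triplicate
x <- c("APC;APC2", "A;A;A")

# Bare back-reference pattern: matches "APC;APC" inside "APC;APC2"
old <- gsub("(\\w+);\\1", "\\1", x)

# Word boundaries plus a repeated group: whole names only, whole runs collapsed
new <- gsub("\\b(\\w+)(;\\1)+\\b", "\\1", x, perl = TRUE)

old  # c("APC2", "A;A"): APC is lost and one duplicate survives
new  # c("APC;APC2", "A")
```

Both patterns still treat only letters, digits and underscores as part of a name, so names containing "-" or "." are better handled by splitting on ";".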
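An alternative is to split the string on ";" and keep the unique elements, which also removes duplicates that are not adjacent: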
unique(unlist(strsplit(genes, ";"))) #[1] "GSTP1" "APC"
Pasting the unique elements back together with collapse = ";" returns a single string again:
paste(unique(unlist(strsplit(genes, ";"))), collapse = ";") #[1] "GSTP1;APC"
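Because unlist() merges the pieces of every input string into one vector, the one-liner above is meant for a single string. For many strings, a minimal sketch applies the same split/unique/paste steps to each element separately. The helper name dedupe_genes and the sample vector are hypothetical, not part of the original answer.

```r
# Hypothetical helper: deduplicate the ";"-separated gene names
# within each element of a character vector, keeping first occurrences
dedupe_genes <- function(x, sep = ";") {
  pieces <- strsplit(x, sep, fixed = TRUE)  # one character vector per input string
  vapply(pieces,
         function(p) paste(unique(p), collapse = sep),
         character(1))                      # exactly one string back per input
}

# Hypothetical example data
genes_vec <- c("GSTP1;GSTP1;APC", "TP53;BRCA1;TP53", "EGFR")
dedupe_genes(genes_vec)  # returns c("GSTP1;APC", "TP53;BRCA1", "EGFR")
```

vapply() with character(1) guarantees one string per input, and fixed = TRUE treats ";" literally instead of as a regular expression.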