Keep only unique elements in a string in R

面向向阳花 2021-01-29 10:01

In genomics research, you often have many strings with duplicate gene names. I would like to find an efficient way to only keep the unique gene names in a string. This is an exa

2 answers
  • Based on the example shown, perhaps the following, which collapses a name that is immediately repeated:

    gsub("(\\w+);\\1", "\\1", genes)
    #[1] "GSTP1;APC"
    
  • 2021-01-29 10:57

    An alternative is to split the string on the semicolons and keep only the unique pieces:

    unique(unlist(strsplit(genes, ";")))
    #[1] "GSTP1" "APC"
    

    Pasting the pieces back together with the same separator then gives the deduplicated string:

    paste(unique(unlist(strsplit(genes, ";"))), collapse = ";")
    #[1] "GSTP1;APC"
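
Since the question mentions having many such strings, the same split, deduplicate, and rejoin steps can be wrapped in a small helper that works on a whole character vector. This is only a minimal sketch, assuming the names are separated by semicolons; the function name dedupe_genes and the sample input are hypothetical and do not come from the original post.

```r
# Hypothetical helper: deduplicate ";"-separated names in each element of x.
dedupe_genes <- function(x, sep = ";") {
  # strsplit() returns a list with one character vector of pieces per string
  pieces <- strsplit(x, sep, fixed = TRUE)
  # keep the first occurrence of each name, then rejoin with the separator
  vapply(pieces, function(p) paste(unique(p), collapse = sep), character(1))
}

# Assumed sample input; the second string has a non-adjacent duplicate
genes <- c("GSTP1;GSTP1;APC", "APC;GSTP1;APC")
dedupe_genes(genes)
#[1] "GSTP1;APC" "APC;GSTP1"
```

The gsub pattern in the first answer only matches a name that is immediately followed by itself, so a string like the second one would pass through unchanged. The split-based version removes duplicates wherever they occur and keeps the names in order of first appearance.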
    