Disambiguate non-unique elements in a character vector

死守一世寂寞 2021-01-19 03:24

Given a vector of non-unique patient initials:

init = c("AA", "AB", "AB", "AB", "AC")

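I am looking for a disambiguation in which duplicated initials get a two-digit running number and unique initials are left unchanged, e.g. AA, AB01, AB02, AB03, AC.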

1 Answer
  • 2021-01-19 03:57

    Apply the following function to each group of identical values with ave:

    uniquify <- function(x) if (length(x) == 1) x else sprintf("%s%02d", x, seq_along(x))
    ave(init, init, FUN = uniquify)
    ## [1] "AA"   "AB01" "AB02" "AB03" "AC"  
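
To see why this works, it may help to spell out the grouping step by hand. The sketch below assumes that ave with a single grouping vector behaves like a split, apply, and reassemble sequence; the variable names groups and out are only illustrative.

```r
init <- c("AA", "AB", "AB", "AB", "AC")
uniquify <- function(x) if (length(x) == 1) x else sprintf("%s%02d", x, seq_along(x))

# Split the vector into one group per distinct value
groups <- split(init, init)

# Apply uniquify to each group, then write the results back
# into the original positions
out <- init
split(out, init) <- lapply(groups, uniquify)
out
## [1] "AA"   "AB01" "AB02" "AB03" "AC"
```

Groups of length 1 pass through unchanged, which is why "AA" and "AC" keep their original form.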
    

    If the only requirement is unique output, then make.unique(x) or make.unique(x, sep = "0"), as discussed in another answer and a comment, are more concise. However, if the output must be exactly as in the question, they do not give the same result, and with 10 or more duplicates their output diverges even further. The solution here produces the requested format in both cases. Here is a further example with 10 or more duplicates:

    xx <- rep(c("A", "B", "C"), c(1, 10, 2))
    ave(xx, xx, FUN = uniquify)
    ## [1] "A"   "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "B10" "C01" "C02"
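
For comparison, here is what the make.unique variants mentioned above return on the same kind of input. This is only an illustration of the difference; the vector bb is a made-up example with 12 duplicates.

```r
init <- c("AA", "AB", "AB", "AB", "AC")

# Default separator: the first occurrence stays bare, later ones get .1, .2, ...
make.unique(init)
## [1] "AA"   "AB"   "AB.1" "AB.2" "AC"

# sep = "0" looks closer to the target, but numbering starts at the second occurrence
make.unique(init, sep = "0")
## [1] "AA"   "AB"   "AB01" "AB02" "AC"

# With more than 10 occurrences the "0" separator no longer acts as zero padding
bb <- rep("B", 12)
tail(make.unique(bb, sep = "0"), 3)
## [1] "B09"  "B010" "B011"
```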
    

    The make.unique solution could be rescued like this:
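
A minimal sketch of one possible rescue, not necessarily the approach the answer had in mind. The helper name rescue_make_unique is hypothetical, and the sketch assumes the input strings never contain a tab character, which is used here as a temporary separator.

```r
rescue_make_unique <- function(x) {
  # Tag repeats with a tab-separated counter: "AB", "AB\t1", "AB\t2", ...
  u <- make.unique(x, sep = "\t")

  # Recover a 1-based running number within each group of identical values
  k <- rep(1L, length(x))
  has_suffix <- grepl("\t", u, fixed = TRUE)
  k[has_suffix] <- as.integer(sub(".*\t", "", u[has_suffix])) + 1L

  # Only values that occur more than once get a zero-padded suffix
  dup <- x %in% x[duplicated(x)]
  ifelse(dup, sprintf("%s%02d", x, k), x)
}

init <- c("AA", "AB", "AB", "AB", "AC")
rescue_make_unique(init)
## [1] "AA"   "AB01" "AB02" "AB03" "AC"
```

Because this needs extra bookkeeping on top of make.unique, the ave-based version is arguably the simpler choice when the exact format matters.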
