Renaming duplicate strings in R

情歌与酒 2021-01-14 17:31

I have an R data frame that has two columns of strings. In one of the columns (say, Column1) there are duplicate values. I need to relabel that column so that it would have t…

4 answers
  •  失恋的感觉
    2021-01-14 18:21

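    This may be a bit of a workaround, but parts of it may be simpler and more useful for someone whose needs are slightly different. make.names() with the argument unique = TRUE prepends "X" where needed to make a syntactically valid name (for example, to values that start with a digit) and appends a dot and a counter to repeated names: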

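    # hypothetical sample data, reconstructed from the output shown below
    tab <- data.frame(Column1 = c("1", "1", "2", "2", "3", "4"))
    x <- make.names(tab$Column1, unique = TRUE)
    print(x)
    # [1] "X1"   "X1.1" "X2"   "X2.1" "X3"   "X4"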

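make.names() builds its suffixes with make.unique(), which base R also exports and which takes a sep argument. If the values are fine as they are and you only need suffixes on the repeats, a sketch that calls make.unique() directly (swapped in here, not part of the workaround) avoids both the "X" prefix and the dot-to-underscore step. Note that it leaves the first occurrence unsuffixed and numbers later repeats from 1, so it gives "1", "1_1" rather than "1_1", "1_2". The sample values are hypothetical, chosen to match the output above.

```r
# Hypothetical sample values, matching the output shown above.
col <- c("1", "1", "2", "2", "3", "4")

# make.unique() keeps the first occurrence as-is and appends sep plus a
# counter (starting at 1) to each later repeat.
x <- make.unique(col, sep = "_")
print(x)
# x is c("1", "1_1", "2", "2_1", "3", "4")
```
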
    This might be enough for some folks. If not, you can pick out the first occurrence of each value that is repeated (leaving values that appear only once alone) and append ".0" to it:

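    # first occurrence of every value that appears again later in the column;
    # duplicated(fromLast = TRUE) also catches repeats that are not adjacent
    # (an rle()-based test only sees repeats that sit next to each other)
    tmp <- !duplicated(tab$Column1) & duplicated(tab$Column1, fromLast = TRUE)
    x[tmp] <- paste0(x[tmp], ".0")
    print(x)
    # [1] "X1.0" "X1.1" "X2.0" "X2.1" "X3"   "X4"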

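rle() only counts runs of consecutive equal values, so a test built on rle() lengths finds repeats only when they sit next to each other (for example, after sorting). A minimal comparison on a hypothetical column where the repeats are not adjacent:

```r
# Hypothetical column where the repeated "1" values are not adjacent.
col <- c("1", "2", "1", "3")

# rle() sees four runs of length 1, so no value counts as repeated.
y <- rle(col)
via_rle <- !duplicated(col) & (col %in% y$values[y$lengths > 1])

# duplicated(fromLast = TRUE) flags every element that occurs again later,
# so this picks the first occurrence of each repeated value wherever it sits.
via_dup <- !duplicated(col) & duplicated(col, fromLast = TRUE)

print(via_rle)  # FALSE FALSE FALSE FALSE
print(via_dup)  # TRUE FALSE FALSE FALSE
```
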
    Then remove the "X" and replace the dot with an underscore:

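    library(stringr)  # for str_replace() and str_match()
    x <- str_replace(x, "^X", "")    # drop the leading "X" added by make.names()
    x <- str_replace(x, "\\.", "_")  # the first dot becomes the separator
    print(x)
    # [1] "1_0" "1_1" "2_0" "2_1" "3"   "4"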

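The "X" and dot clean-up assumes make.names() changed nothing except adding the prefix and the suffixes. make.names() also turns every character that is invalid in an R name (such as a space) into a dot, and it adds "X" only where needed, so other strings can come out altered. A small illustration on hypothetical values, using base R sub() in place of str_replace() so it runs without stringr:

```r
# Hypothetical column: "a b" contains a space, which is invalid in an R name.
col <- c("1", "1", "a b", "a b")

x <- make.names(col, unique = TRUE)
# x is c("X1", "X1.1", "a.b", "a.b.1"): the space became a dot

# The same clean-up steps, written with base R sub().
x <- sub("^X", "", x)
x <- sub("\\.", "_", x)
print(x)
# x is c("1", "1_1", "a_b", "a_b.1"): "a b" is mangled and its suffix keeps a dot
```
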
    That might be good enough for you. But if you want the numbering to start at 1, extract the numbers, add one, and put them back:

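    # capture the trailing number after "_" (NA where there is none)
    z <- str_match(x, "_([0-9]+)$")[, 2]
    z <- as.character(as.numeric(z) + 1)
    # elements without a trailing "_<number>" do not match and stay unchanged
    x <- str_replace(x, "_([0-9]+)$", paste0("_", z))
    print(x)
    # [1] "1_1" "1_2" "2_1" "2_2" "3"   "4"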

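If you want the final form directly ("_1", "_2", ... on repeated values only), a base R sketch with ave() numbers each element within its group of equal values and works whether or not the repeats are adjacent. This is a different technique from the workaround, and the sample column is hypothetical:

```r
# Hypothetical sample column, matching the values used above.
col <- c("1", "1", "2", "2", "3", "4")

# Position of each element within its group of equal values: 1 2 1 2 1 1
idx <- ave(seq_along(col), col, FUN = seq_along)
# Size of the group each element belongs to: 2 2 2 2 1 1
n <- ave(seq_along(col), col, FUN = length)

# Add a suffix only where the value occurs more than once.
out <- ifelse(n > 1, paste0(col, "_", idx), col)
print(out)
# out is c("1_1", "1_2", "2_1", "2_2", "3", "4")
```
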
    Like I said, this is more of a workaround, but it gives you some options.
