I have an R dataframe that has two columns of strings. In one of the columns (say, Column1) there are duplicate values. I need to relabel that column so that it would have t
May be a little more of a workaround, but parts of this may be more useful and simpler for someone with not quite the same needs. make.names
with the unique=T
attribute adds a dot and numbers names that are repeated:
x <- make.names(tab$Column1,unique=T)
> print(x)
[1] "X1" "X1.1" "X2" "X2.1" "X3" "X4"
This might be enough for some folks. Here you can then grab the first entries of elements that are repeated, but not elements that are not repeated, then add a .0
to the end.
y <- rle(tab$Column1)
tmp <- !duplicated(tab$Column1) & (tab$Column1 %in% y$values[y$lengths>1])
x[tmp] <- str_replace(x[tmp],"$","\\.0")
> print(x)
[1] "X1.0" "X1.1" "X2.0" "X2.1" "X3" "X4"
Replace the dots and remove the X
x <- str_replace(x,"X","")
x <- str_replace(x,"\\.","_")
> print(x)
[1] "1_0" "1_1" "2_0" "2_1" "3" "4"
Might be good enough for you. But if you want the indexing to start at 1, grab the numbers, add one then put them back.
z <- str_match(x,"_([0-9]*)$")[,2]
z <- as.character(as.numeric(z)+1)
x <- str_replace(x,"_([0-9]*)$",paste0("_",z))
> print(x)
[1] "1_1" "1_2" "2_1" "2_2" "3" "4"
Like I said, more of a workaround here, but gives some options.