I have an R dataframe that has two columns of strings. In one of the columns (say, Column1) there are duplicate values. I need to relabel that column so that it would have t
Let's say your data (ordered by Column1
) is within an object called tab
. First create a run length object
c1.rle <- rle(tab$Column1)
c1.rle
##lengths: int [1:4] 2 2 1 1
##values : int [1:4] 1 2 3 4
That gives you values of Column1
and the according number of appearences of each element. Then use that information to create the new column with unique identifiers:
tab$Column1.new <- paste0(rep(c1.rle$values, times = c1.rle$lengths), "_",
unlist(lapply(c1.rle$lengths, seq_len)))
Not sure, if this is appropriate in your situation, but you could also just paste together Column1
and Column2
, to create an unique identifier...