Renaming duplicate strings in R

情歌与酒 2021-01-14 17:31

I have an R dataframe that has two columns of strings. In one of the columns (say, Column1) there are duplicate values. I need to relabel that column so that it would have t

4 answers

失恋的感觉 (OP)

2021-01-14 18:21
This may be a little more of a workaround, but parts of it may be simpler and more useful for someone whose needs are not quite the same. make.names with the argument unique=TRUE appends a dot and a number to names that are repeated:
```
x <- make.names(tab$Column1, unique = TRUE)
print(x)
# [1] "X1"   "X1.1" "X2"   "X2.1" "X3"   "X4"
```
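The thread never shows tab itself, so running the snippets in this answer needs some stand-in data. Below is a minimal sketch, assuming Column1 holds digit strings, which is what the "X" prefix in the output suggests. The data frame and its Column2 values are made up for illustration.

```r
# Hypothetical stand-in for the `tab` data frame from the question.
tab <- data.frame(
  Column1 = c("1", "1", "2", "2", "3", "4"),
  Column2 = c("a", "b", "c", "d", "e", "f"),  # made-up values
  stringsAsFactors = FALSE
)

# Names starting with a digit are not syntactically valid in R, so
# make.names() prefixes them with "X" before de-duplicating.
digits <- make.names(tab$Column1, unique = TRUE)

# Values that are already valid names get no prefix, only the suffix.
words <- make.names(c("a", "a", "b"), unique = TRUE)

print(digits)
print(words)
```

If your values are already valid R names, there is no "X" to strip afterwards, only the .1, .2 suffixes.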
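This might be enough for some folks. From here you can pick out the first occurrence of each repeated value, leaving values that occur only once alone, and append .0 to it.
```
library(stringr)  # provides str_replace() and str_match()
# rle() counts consecutive runs, so this assumes duplicates in Column1 are adjacent
y <- rle(tab$Column1)
# TRUE for the first occurrence of each value that is repeated
tmp <- !duplicated(tab$Column1) & (tab$Column1 %in% y$values[y$lengths > 1])
x[tmp] <- str_replace(x[tmp], "$", "\\.0")
print(x)
# [1] "X1.0" "X1.1" "X2.0" "X2.1" "X3"   "X4"
```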
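Note that rle() only counts consecutive runs, so the snippet above relies on duplicates sitting next to each other. If they may be scattered through the column, one possible variant uses duplicated() alone. This is a sketch, not the answer's own code, and v and labels are illustrative names.

```r
# Hypothetical input where the repeated value is not adjacent.
v <- c("1", "2", "1", "3")
labels <- make.names(v, unique = TRUE)

# rle() sees four runs of length 1 here, so it would flag nothing.
stopifnot(all(rle(v)$lengths == 1))

# Values that occur more than once anywhere in the vector.
repeated <- v %in% v[duplicated(v)]
# First occurrence of each repeated value.
first_of_repeated <- repeated & !duplicated(v)

labels[first_of_repeated] <- paste0(labels[first_of_repeated], ".0")
print(labels)
```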
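Next, remove the leading X and replace the dots with underscores:
```
x <- str_replace(x, "^X", "")    # drop the X prefix that make.names added
x <- str_replace(x, "\\.", "_")  # replace the first dot with an underscore
print(x)
# [1] "1_0" "1_1" "2_0" "2_1" "3"   "4"
```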
That might be good enough for you. But if you want the indexing to start at 1, extract the numeric suffixes, add one, and put them back:
```
z <- str_match(x, "_([0-9]*)$")[, 2]   # numeric suffix, NA where there is none
z <- as.character(as.numeric(z) + 1)
# entries without a suffix do not match the pattern and are left unchanged
x <- str_replace(x, "_([0-9]*)$", paste0("_", z))
print(x)
# [1] "1_1" "1_2" "2_1" "2_2" "3"   "4"
```
As I said, this is more of a workaround, but it gives you some options.
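If only the final 1_1, 1_2 style labels are needed, the same result can probably be reached in one pass with base R's ave(), without going through make.names or stringr. This is an alternative technique rather than the approach described above. The helper name relabel_duplicates and its sep argument are hypothetical, and since the question text is cut off, the exact target format is an assumption taken from the last output shown.

```r
# Hypothetical helper: suffix repeated values with their within-group index.
relabel_duplicates <- function(v, sep = "_") {
  v <- as.character(v)  # also handles factor columns
  # Position of each element within its group of equal values (1, 2, ...).
  idx <- ave(seq_along(v), v, FUN = seq_along)
  # Size of the group each element belongs to.
  n <- ave(seq_along(v), v, FUN = length)
  # Only values that occur more than once get a suffix.
  ifelse(n > 1, paste0(v, sep, idx), v)
}

# Same made-up values as the outputs in this answer imply.
result <- relabel_duplicates(c("1", "1", "2", "2", "3", "4"))
print(result)
# [1] "1_1" "1_2" "2_1" "2_2" "3"   "4"
```

Because ave() groups by value rather than by position, this version does not depend on the duplicates being adjacent.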