Renaming duplicate strings in R

情歌与酒 2021-01-14 17:31

I have an R dataframe that has two columns of strings. In one of the columns (say, Column1) there are duplicate values. I need to relabel that column so that it would have t

4 answers
  •  栀梦
    栀梦 (original poster)
    2021-01-14 18:17

    @Cão's answer, using only base R:

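    # Sample data; the text after "#" on each row is skipped by read.table()
    # and only documents the desired label.
    x <- read.table(text = "
    Column1   Column2   #Column1.new
    1         A         #1_1
    1         B         #1_2
    2         C         #2_1
    2         D         #2_2
    3         E         #3
    4         F         #4", stringsAsFactors = FALSE, header = TRUE)

    string <- as.character(x$Column1)
    # make.unique() appends ".1", ".2", ... to repeats; switch the separator to "_".
    # Note: values that already end in ".<digits>" or "_<digits>" would be altered too.
    mstring <- make.unique(string)
    mstring <- sub("(.*)\\.([0-9]+)$", "\\1_\\2", mstring)
    # Tag the first occurrence of every repeated value with "_0".
    # duplicated() is used instead of rle() so non-adjacent repeats are caught as well.
    tmp <- !duplicated(string) & (string %in% string[duplicated(string)])
    mstring[tmp] <- paste0(mstring[tmp], "_0")
    # Shift every numeric suffix up by one so numbering starts at 1.
    idx <- grep("_[0-9]+$", mstring)
    end <- as.numeric(sub(".*_([0-9]+)$", "\\1", mstring[idx]))
    beg <- sub("_[0-9]+$", "_", mstring[idx])
    mstring[idx] <- paste0(beg, end + 1)
    x$Column1New <- mstring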
    x
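
For comparison, here is a shorter base-R sketch of the same relabeling. It is not part of the answer above: it swaps the `make.unique()` and regex steps for `ave()`, which numbers each value within its group. The helper name `relabel_duplicates` and its `sep` argument are hypothetical, introduced only for illustration, and the sample data simply mirrors the table in the answer.

```r
# Example data mirroring the answer above (Column1 contains duplicates).
x <- data.frame(
  Column1 = c("1", "1", "2", "2", "3", "4"),
  Column2 = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer): append "_1", "_2", ...
# to every value that occurs more than once; leave unique values untouched.
relabel_duplicates <- function(v, sep = "_") {
  v <- as.character(v)
  # Running index of each value within its group, in order of appearance.
  idx <- ave(seq_along(v), v, FUN = seq_along)
  # Total number of occurrences of each value, repeated for every member.
  n <- ave(seq_along(v), v, FUN = length)
  ifelse(n > 1, paste0(v, sep, idx), v)
}

x$Column1New <- relabel_duplicates(x$Column1)
x
```

Because the grouping is by value rather than by position, this sketch should also number repeats that are not adjacent, and it leaves existing dots or underscores in the values alone.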
    
