How can I combine rows within the same data frame in R (based on duplicate values under a specific column)?

后端 未结 4 1296
小鲜肉
小鲜肉 2021-01-07 03:39

Sample of 2 (made-up) example rows in df:

userid   facultyid  courseid schoolid
167       265        NA       1678  
167       71111      301      NA
         


        
相关标签:
4条回答
  • 2021-01-07 04:17

    Here's a simple one-liner from plyr. I wrote it a bit more generally than you asked:

     a <- data.frame(x=c(1,2,3,1,2,3,1,2,3),y=c(2,3,1,1,2,3,2,3,1),
           z=c(NA,1,NA,2,NA,3,4,NA,5),zz=c(1,NA,2,NA,3,NA,4,NA,5))
    
     ddply(a,~x+y,summarize,z=first(z[!is.na(z)]),zz=first(zz[!is.na(zz)]))
    

    Specifically answering the original question, if your data frame is named a, :

     ddply(a,~userid,summarize,facultyid=first(facultyid[!is.na(facultyid)]),
             courseid=first(courseid[!is.na(courseid)],
             schoolid=first(schoolid[!is.na(schoolid)])
    
    0 讨论(0)
  • 2021-01-07 04:20

    Here's a different approach using ddply :

    # requires the plyr package
    library(plyr)
    
    # Your example dataframe with added lines
    schoolex <- data.frame(userid = c(167, 167, 200, 203, 203), facultyid = c(265, 71111, 200, 300, NA), 
                            courseid = c(NA, 301, 302, 303, 303), schoolid = c(1678, NA, 1678, NA, 1678))
    
    schoolex_duprm <- ddply(schoolex, .(userid), summarize, facultyid2 = facultyid[!is.na(facultyid)][1], 
                                   courseid2 = courseid[!is.na(courseid)][1], 
                                   schoolid2 = schoolid[!is.na(schoolid)][1])
    
    0 讨论(0)
  • 2021-01-07 04:22
    aggregate(x = df1, by = list(df1$userid), FUN = function(x) na.omit(x)[1])[,-1]
    

    or use dplyr library:

    library(dplyr)
    
    df1 %>%
      group_by(userid) %>%
      summarise_each(funs(first(na.omit(.))))
    
    0 讨论(0)
  • 2021-01-07 04:33
    # initialize a vector that will contain row numbers which should be erased
    rows.to.erase <- c()
    
    # loop over the rows, starting from top
    for(i in 1:(nrow(dat)-1)) {
      if(dat$userid[i] == dat$userid[i+1]) {
        # loop over columns to recuperate data when a NA is present
        for(j in 2:4) {
          if(is.na(dat[i,j]))
            dat[i,j] <- dat[i+1,j]
        }
        rows.to.erase <- append(rows.to.erase, i+1)
      }
    }
    
    dat.clean <- dat[-rows.to.erase,]
    dat.clean
    #   userid facultyid courseid schoolid
    # 1    167       265      301     1678
    
    0 讨论(0)
提交回复
热议问题