Challenge: recoding a data.frame() — make it faster

借酒劲吻你 2021-02-04 06:43

Recoding is a common practice for survey data, but the most obvious routes take more time than they should.

The fastest code that accomplishes the same task with the pr

6 answers
  •  醉梦人生
    2021-02-04 06:50

    Combining @DWin's answer with my answer to "Most efficient list to data.frame method?":

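    # 'dat' (the data.frame to recode) and 're.codes' (the factor labels)
    # are assumed to exist already; they are not defined in this answer
    system.time({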
      dat3 <- list()
      # define attributes once outside of loop
      attrib <- list(class="factor", levels=re.codes)
      for (i in names(dat)) {              # loop over each column in 'dat'
        dat3[[i]] <- as.integer(dat[[i]])  # convert column to integer
        attributes(dat3[[i]]) <- attrib    # assign factor attributes
      }
      # convert 'dat3' into a data.frame. We can do it like this because:
      # 1) we know 'dat' and 'dat3' have the same number of rows and columns
      # 2) we want 'dat3' to have the same colnames as 'dat'
      # 3) we don't care if 'dat3' has different rownames than 'dat'
      attributes(dat3) <- list(row.names=c(NA_integer_,nrow(dat)),
        class="data.frame", names=names(dat))
    })
    identical(dat2, dat3)  # 'dat2' is from @DWin's answer
    
