Creating dummy variables in R data.table

余生分开走 2020-11-27 16:10

I am working with an extremely large dataset in R. I have been using data frames and have decided to switch to data.table to speed up operations. I am

1 answer
  • 2020-11-27 16:41

    This seems to do what you're looking for:

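    # Distinct index values; these become the new column names
    inds <- unique(as.character(test$index))
    # Add one logical column per value, by reference
    test[, (inds) := lapply(inds, function(x) index == x)]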
    

    which gives

          index        var1     a     b     c     d     e     f     g     h     i     j
       1:     a  0.25331851  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
       2:     b -0.02854676 FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
       3:     c -0.04287046 FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
       4:     d  1.36860228 FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
       5:     e -0.22577099 FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
      ---                                                                              
     996:     f -1.02040059 FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
     997:     g -1.31345092 FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
     998:     h -0.49448088 FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
     999:     i  1.75175715 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
    1000:     j  0.05576477 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
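
    The `test` table itself is not shown in this thread, so here is a minimal reproducible sketch of the approach above. The data (1000 rows, a character `index` cycling through a..j, a random `var1`) is an assumption made to match the printed output, not the asker's actual data. The last step, converting the logical columns to 0/1 integers, is optional and only matters if a model or export expects numeric dummies.

    ```r
    library(data.table)

    # Assumed stand-in for the question's data: index cycles a..j, var1 is random
    set.seed(42)
    test <- data.table(index = rep(letters[1:10], times = 100),
                       var1  = rnorm(1000))

    # One logical column per distinct index value, added by reference
    inds <- unique(test$index)
    test[, (inds) := lapply(inds, function(x) index == x)]

    # Optional: turn TRUE/FALSE into 1/0
    test[, (inds) := lapply(.SD, as.integer), .SDcols = inds]
    ```

    Because `:=` modifies `test` in place, no copy of the large table is made.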
    

    Here's another way:

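    # Cast to wide form; each dummy column holds a count rather than TRUE/FALSE
    dcast(test, index + var1 ~ index, fun.aggregate = length)
    # or, if you want to preserve row order (note: r := .I also adds column r to test itself)
    dcast(test[, r := .I], r + index + var1 ~ index, fun.aggregate = length)[, r := NULL]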
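
    Note that the `dcast` route returns integer counts, not logicals. If TRUE/FALSE columns are wanted, something like the following sketch should work. The example data and the helper name `dummy_cols` are assumptions for illustration; `copy()` is used so that the temporary row-number column `r` is not added to the original table.

    ```r
    library(data.table)

    # Assumed stand-in for the question's data
    set.seed(42)
    test <- data.table(index = rep(letters[1:10], times = 100),
                       var1  = rnorm(1000))

    # Cast on a copy so the helper column r never touches test
    wide <- dcast(copy(test)[, r := .I], r + index + var1 ~ index,
                  fun.aggregate = length)[, r := NULL]

    # Every column other than the original two is a dummy column of counts
    dummy_cols <- setdiff(names(wide), c("index", "var1"))
    wide[, (dummy_cols) := lapply(.SD, as.logical), .SDcols = dummy_cols]
    ```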
    

    And another:

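    # Row numbers grouped by index value
    rs <- split(seq_len(nrow(test)), test$index)
    # Initialise every dummy column to FALSE, then flip the matching rows to TRUE by reference
    test[, names(rs) := FALSE]
    for (n in names(rs)) set(test, i = rs[[n]], j = n, value = TRUE)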
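
    As a quick sanity check, the first and last approaches can be run side by side on copies of the same table and compared. This is only a sketch on assumed example data; `base`, `by_assign` and `by_set` are names made up for the comparison.

    ```r
    library(data.table)

    # Assumed stand-in for the question's data
    set.seed(42)
    base <- data.table(index = rep(letters[1:10], times = 100),
                       var1  = rnorm(1000))

    # Approach 1: one := call with lapply
    by_assign <- copy(base)
    inds <- unique(by_assign$index)
    by_assign[, (inds) := lapply(inds, function(x) index == x)]

    # Approach 3: pre-fill with FALSE, then set() the matching rows
    by_set <- copy(base)
    rs <- split(seq_len(nrow(by_set)), by_set$index)
    by_set[, names(rs) := FALSE]
    for (n in names(rs)) set(by_set, i = rs[[n]], j = n, value = TRUE)

    # Compare the two results; all.equal() reports any differences it finds
    same <- all.equal(by_assign, by_set)
    ```

    One difference to keep in mind: `unique()` orders the new columns by first appearance, while `split()` orders them by sorted value, so on unsorted data the column order may differ even though the contents agree.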
    