Convert *some* column classes in data.table

梦毁少年i 2020-11-29 06:29

I want to convert a subset of data.table columns to a new class. There's a popular question here (Convert column classes in data.table), but the answer creates a new object, ra

2 answers
  • 2020-11-29 07:01

    You can use .SDcols:

    dat[, cols] <- dat[, lapply(.SD, factor), .SDcols=cols]

  • 2020-11-29 07:11

    Besides using the option as suggested by Matt Dowle, another way of changing the column classes is as follows:

    dat[, (cols) := lapply(.SD, factor), .SDcols = cols]
    

    The := operator updates the data.table by reference, so no new object is created. To check that it worked:

    > sapply(dat,class)
           ID   Quarter     value 
     "factor"  "factor" "numeric" 
    
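    For a self-contained run of the := approach, here is a minimal sketch. The sample table is an assumption (this answer does not show how dat was built); it is shaped to match the 10-row output printed further down. The address() comparison is only an illustration that the same object is modified in place.

    ```r
    library(data.table)

    # Hypothetical sample data (assumed, not taken from the original post)
    dat <- data.table(ID      = rep(c("A", "B"), each = 5),
                      Quarter = rep(1:5, times = 2),
                      value   = rnorm(10))
    cols <- c("ID", "Quarter")

    # Remember where the table lives in memory before the update
    addr_before <- address(dat)

    # Convert only the selected columns, in place
    dat[, (cols) := lapply(.SD, factor), .SDcols = cols]

    # First class of every column, and whether dat is still the same object
    col_classes  <- vapply(dat, function(x) class(x)[1], character(1))
    same_address <- identical(address(dat), addr_before)
    print(col_classes)
    print(same_address)
    ```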

    As suggested by @MattDowle in the comments, you can also use a combination of for(...) set(...) as follows:

    for (col in cols) set(dat, j = col, value = factor(dat[[col]]))
    

    which will give the same result. A third alternative is:

    for (col in cols) dat[, (col) := factor(dat[[col]])]
    

    On smaller datasets, the for(...) set(...) option is about three times faster than the lapply option (though that hardly matters when the dataset is small). On larger datasets (e.g. 2 million rows), each of these approaches takes about the same amount of time. For testing on a larger dataset, I used:

    dat <- data.table(ID=c(rep("A", 1e6), rep("B",1e6)),
                      Quarter=c(1:1e6, 1:1e6),
                      value=rnorm(10))
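
    The timings themselves are not shown above, so here is one way the comparison could be reproduced. This is a rough sketch using base R's system.time(), not the benchmark the answer actually ran; make_dat() is a hypothetical helper, and the row count is kept smaller than 2 million so it finishes quickly. Each approach gets its own copy() because all three modify their input by reference.

    ```r
    library(data.table)

    # Hypothetical helper (not from the original answer): a table with n rows
    make_dat <- function(n = 2e5) {
      data.table(ID      = rep(c("A", "B"), each = n / 2),
                 Quarter = rep(seq_len(n / 2), times = 2),
                 value   = rnorm(n))
    }
    cols <- c("ID", "Quarter")
    base <- make_dat()

    # Approach 1: lapply over .SD with :=
    d1 <- copy(base)
    t_lapply <- system.time(d1[, (cols) := lapply(.SD, factor), .SDcols = cols])

    # Approach 2: for loop with set()
    d2 <- copy(base)
    t_set <- system.time(for (col in cols) set(d2, j = col, value = factor(d2[[col]])))

    # Approach 3: for loop with :=
    d3 <- copy(base)
    t_loop <- system.time(for (col in cols) d3[, (col) := factor(d3[[col]])])

    # Elapsed seconds per approach; the numbers depend on machine and data size
    timings <- c(lapply = t_lapply[["elapsed"]],
                 set    = t_set[["elapsed"]],
                 loop   = t_loop[["elapsed"]])
    print(timings)
    ```

    A single system.time() call is noisy; repeating each run several times gives a fairer comparison.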
    

    Sometimes, you will have to do it a bit differently (for example when numeric values are stored as a factor). Then you have to use something like this:

    dat[, (cols) := lapply(.SD, function(x) as.integer(as.character(x))), .SDcols = cols]
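
    The detour through as.character() matters because calling as.integer() directly on a factor returns its internal level codes, not the numbers shown as labels. A small base R illustration with made-up values:

    ```r
    # A factor whose labels are numbers (hypothetical values)
    f <- factor(c("10", "20", "20", "30"))

    codes  <- as.integer(f)                # internal level codes: 1 2 2 3
    labels <- as.integer(as.character(f))  # the labels as numbers: 10 20 20 30

    print(codes)
    print(labels)
    ```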
    


    WARNING: The following explanation is not the data.table way of doing things. The data.table is not updated by reference, because a copy is made and stored in memory (as pointed out by @Frank), which increases memory usage. It is included mainly to explain how with = FALSE works.

    If you want to change the column classes the same way you would with a data.frame, you have to add with = FALSE as follows:

    dat[, cols] <- lapply(dat[, cols, with = FALSE], factor)
    

    A check whether this worked:

    > sapply(dat,class)
           ID   Quarter     value 
     "factor"  "factor" "numeric" 
    

    If you don't add with = FALSE, data.table evaluates cols as an expression inside j, so dat[, cols] returns the character vector stored in cols instead of the columns it names. Compare the output of dat[, cols] and dat[, cols, with = FALSE]:

    > dat[, cols]
    [1] "ID"      "Quarter"
    
    > dat[, cols, with = FALSE]
        ID Quarter
     1:  A       1
     2:  A       2
     3:  A       3
     4:  A       4
     5:  A       5
     6:  B       1
     7:  B       2
     8:  B       3
     9:  B       4
    10:  B       5
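
    As a side note that is not part of the original answer: more recent data.table versions also accept a .. prefix on the variable name, which selects the same columns as with = FALSE. A minimal sketch, assuming a 10-row sample table like the one printed above:

    ```r
    library(data.table)

    # Hypothetical sample data (assumed, shaped like the output above)
    dat <- data.table(ID      = rep(c("A", "B"), each = 5),
                      Quarter = rep(1:5, times = 2),
                      value   = rnorm(10))
    cols <- c("ID", "Quarter")

    by_with   <- dat[, cols, with = FALSE]  # columns named in cols
    by_prefix <- dat[, ..cols]              # same selection via the .. prefix

    same_selection <- isTRUE(all.equal(by_with, by_prefix))
    print(same_selection)
    ```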
    