Assign value to specific data.table columns and rows

死守一世寂寞 2021-02-01 06:29

I'm still learning this great package. Could anyone please explain the reason for this error? Thanks!

library(data.table)

DT <- data.table(id   = LETTER         


        
1 Answer
  • 2021-02-01 06:44

    First, it is recommended to use := instead of [<- for efficiency; [<- is provided mostly for backward compatibility. So I'll first show how to use := to get what you're after. := assigns by reference: it updates a data.table without copying the data, which makes it extremely fast.

    require(data.table)
    DT <- data.table(x = 1:5, y = 6:10, z = 11:15)
    

    Suppose you want to set the value of "y" in row 2 to the value of "y" in row 5:

    DT[2, y := DT[5, y]] 
    

    or equivalently

    DT[2, `:=`(y = DT[5, y])]
    

    Suppose you want to set both "y" and "z" in row 2 to the corresponding values in row 5:

    DT[2, c("y", "z") := as.list(DT[5, c(y, z)])]
    

    or equivalently

    DT[2, `:=`(y = DT[5, y], z = DT[5, z])]
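
    As a quick sanity check, the two := forms above can be run side by side on separate copies of the same table. This is only a minimal sketch: the names DT_a, DT_b and same are made up for illustration and do not appear in the question.

    ```r
    library(data.table)

    # Two identical starting tables, one per form (DT_a / DT_b are illustrative names)
    DT_a <- data.table(x = 1:5, y = 6:10, z = 11:15)
    DT_b <- copy(DT_a)  # copy() gives an independent table, not a second reference

    # Form 1: character vector of column names on the left, list of values on the right
    DT_a[2, c("y", "z") := as.list(DT_a[5, c(y, z)])]

    # Form 2: functional `:=`() with one name = value pair per column
    DT_b[2, `:=`(y = DT_b[5, y], z = DT_b[5, z])]

    # Both tables should now hold the row-5 values of y and z in row 2
    same <- isTRUE(all.equal(DT_a, DT_b))
    print(same)
    print(DT_a[2])
    ```

    Using copy() here matters: a plain DT_b <- DT_a would make both names refer to the same table, so the comparison would be meaningless.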
    

    Now, just to show how to assign using [<- (although it is clearly not recommended), it can be done as follows:

    DT <- data.table(x = 1:5, y = 6:10, z = 11:15)
    DT[1, c("y", "z")] <- as.list(DT[5, c(y, z)])
    

    or, equivalently, you can pass the column numbers:

    DT[1, 2:3] <- as.list(DT[5, c(y, z)])
    

    Hope this helps.


    Edit 1

    As to why you get the error:

    First, the RHS has to be a list for [<-.data.table if more than one column is being assigned to.

    Second, the j argument on the left of <- is not evaluated within the scope of your data.table, so R has to find its value somewhere else. Because you wrote var1 and var2 without double quotes (which would have made them a character vector), they are treated as variables. On the left-hand side, the columns of your data.table are not visible as variables (unlike on the right-hand side of <-), so R looks for var1 and var2 in the calling environment, here the global environment, does not find them, and raises the error. For example, try this:

    y <- "y"
    z <- "z"
    # And now try your second case: 
    DT[2, c(y, z)] <- as.list(DT[5, c(y, z)])
    # the left side takes values from the assignments you made above
    # the right side y and z are evaluated within the environment of your data.table
    # and so it sees the columns y and z as variables and their values are picked accordingly
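
    The lookup behaviour described above can be checked with a small stand-in table. The table DT_q and its columns id, var1 and var2 are assumptions made for illustration (the question's own table is truncated), and the sketch assumes no variables named var1 or var2 exist in the calling environment.

    ```r
    library(data.table)

    # Hypothetical stand-in for the asker's table (the original definition is truncated)
    DT_q <- data.table(id = 1:5, var1 = 6:10, var2 = 11:15)

    # Unquoted names on the left: j is looked up in the calling environment,
    # where var1 and var2 are assumed not to exist, so the assignment fails.
    outcome <- tryCatch(
      {
        DT_q[2, c(var1, var2)] <- as.list(DT_q[5, c(var1, var2)])
        "assigned"
      },
      error = function(e) "error"
    )
    print(outcome)

    # Quoted names: j is a plain character vector, so no variable lookup is needed
    DT_q[2, c("var1", "var2")] <- as.list(DT_q[5, c(var1, var2)])
    print(DT_q[2])
    ```

    Note that the right-hand side uses unquoted var1 and var2 in both cases without trouble, because there they are evaluated inside DT_q.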
    

    Third, [<-.data.table accepts only atomic vectors for the j argument. So your first assignment, DT[2, list(var1, var2)] <- DT[8, list(var1, var2)], would still give an error even with the variables defined as above, that is:

    y <- "y"
    z <- "z"
    DT[2, list(y, z)] <- as.list(DT[5, c(y, z)])
    
    # Error in `[<-.data.table`(`*tmp*`, 2, list(y, z), value = list(10L, 15L)) : 
    #   j must be atomic vector, see ?is.atomic
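
    Since j has to be atomic, one workable pattern is to keep the column names in a character vector and reuse it for either form. A minimal sketch, where cols, DT_a and DT_b are illustrative names rather than anything from the question:

    ```r
    library(data.table)

    # Column names held in an ordinary character vector (cols is an illustrative name)
    cols <- c("y", "z")

    # With :=, wrap the variable in parentheses so it is read as a vector of names
    DT_a <- data.table(x = 1:5, y = 6:10, z = 11:15)
    DT_a[2, (cols) := as.list(DT_a[5, c(y, z)])]

    # With [<-, the variable is found in the calling environment and is atomic
    DT_b <- data.table(x = 1:5, y = 6:10, z = 11:15)
    DT_b[2, cols] <- as.list(DT_b[5, c(y, z)])

    print(DT_a[2])
    print(DT_b[2])
    ```

    Without the parentheses, cols := ... would try to assign to a column literally named cols.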
    



    Edit 2

    To illustrate that [<- makes a copy of your data.table while := does not:

    DT <- data.table(x = 1:5, y = 6:10, z = 11:15)
    tracemem(DT)
    # [1] "<0x7fbefb89b580>"
    
    DT[1, c("y", "z") := list(100L, 110L)]
    tracemem(DT)
    # [1] "<0x7fbefb89b580>"
    
    DT[2, c("y", "z")] <- list(200L, 201L)
    # tracemem[0x7fbefacc4fa0 -> 0x7fbefd297838]: # copied, inefficient
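
    Another way to see the same difference, without reading memory addresses, is to keep a second name bound to the table and watch which assignments it "sees". This is a sketch under the assumption that [<- copies as the tracemem output above suggests; alias_DT is an illustrative name.

    ```r
    library(data.table)

    DT <- data.table(x = 1:5, y = 6:10, z = 11:15)
    alias_DT <- DT  # second name for the same table, no copy is made here

    # := modifies the table in place, so the change is visible through alias_DT too
    DT[1, y := 100L]
    print(alias_DT$y[1])

    # [<- works on a copy and rebinds DT to it; alias_DT keeps pointing at the old table,
    # so its row 2 is expected to keep the original value
    DT[2, "y"] <- 200L
    print(DT$y[2])
    print(alias_DT$y[2])
    ```

    This is also why copy() is needed whenever you want an independent table before using :=.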
    