Convert columns of arbitrary class to the class of matching columns in another data.table

忘了有多久 2021-01-04 02:26

Question:

I'm working in R. I want the shared columns of 2 data.tables (shared meaning same column name) to have matching classes. I'm struggling

3 Answers
  • 2021-01-04 02:58

    Based on the discussion in this question, and comments in this answer, I'm thinking I may have had it right, and just landed on an odd exception.

    Note that the class doesn't actually change, but for the particular use-case that prompted the question this turns out not to matter. Below I show my "failed approach"; following it through to the merge and inspecting the column classes of the merged data.table shows why the approach works: integers simply get promoted.

    s2c <- function (x, type = "list") 
    {
        as.call(lapply(c(type, x), as.symbol))
    }
    
    # In this case, I can assume all columns of A can be found in B
    # I am also able to assume that the desired conversion is possible
    B.class <- sapply(B[,eval(s2c(names(A)))], class)
    for(col in names(A)){
        set(A, j=col, value=as(A[[col]], B.class[col]))
    }
    
    # Below here is new from what I tried in question
    AB <- merge(A, B, all=TRUE, by=c("stratum","year"))  # dispatches to merge.data.table
    
    sapply(AB, class)
    #   stratum      year        bt        yr 
    # "integer" "numeric" "numeric" "numeric" 
    

    Although the problem in the question isn't solved by this answer, I figured I'd post to point out that the failure to convert "integer" to "numeric" might not be a problem in many situations, so this is a straightforward, albeit circumstantial, solution.
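
    To illustrate why an integer/numeric mismatch is often harmless, here is a minimal base R sketch. The vectors are made up for illustration and are not the OP's data:

    ```r
    # Integer and double values compare equal even though their classes differ.
    x_int <- c(1L, 2L, 3L)
    x_dbl <- c(1, 2, 3)

    class(x_int)                      # "integer"
    class(x_dbl)                      # "numeric"
    all(x_int == x_dbl)               # TRUE: the integer side is promoted for the comparison
    identical(x_int, x_dbl)           # FALSE: identical() also compares the storage type
    match_pos <- match(x_int, x_dbl)  # 1 2 3: match() works across the two types
    combined <- c(x_int, 2.5)         # combining promotes the result to double
    class(combined)                   # "numeric"
    ```

    The mismatch only bites when something checks the type strictly, as identical() does here.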

  • 2021-01-04 03:03

    This is one very crude way to ensure common classes:

    library(magrittr)
    
    cols = intersect(names(A), names(B))
    r    = rbindlist(list(A = A, B = B[, ..cols]), idcol = TRUE)
    r[, (cols) := lapply(.SD, . %>% as.character %>% type.convert(as.is = TRUE)), .SDcols=cols]
    B[, (cols) := r[.id=="B", ..cols]]
    A[, (cols) := r[.id=="A", ..cols]]
    
    sapply(A, class); sapply(B, class)
    #      year   stratum 
    # "integer" "integer" 
    #      year   stratum        yr 
    # "integer" "integer" "numeric" 
    

    I don't like this solution:

    • I routinely use all-integer codes for IDs (like "00001", "02995"), and this would coerce those to actual integers, which is bad.
    • Who knows what this will do to fancy classes like Date or factor? This won't matter so much if you do this col-classes normalization as soon as you read data in, I suppose.
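
    Both caveats are easy to reproduce in base R. A small sketch with made-up values (the variable names are only for illustration):

    ```r
    # Zero-padded ID codes lose their padding.
    ids <- c("00001", "02995")
    ids_converted <- type.convert(ids, as.is = TRUE)
    ids_converted          # 1 2995
    class(ids_converted)   # "integer"

    # A Date round-tripped through character stays character.
    d <- as.Date(c("2021-01-04", "2021-01-05"))
    d_converted <- type.convert(as.character(d), as.is = TRUE)
    class(d_converted)     # "character"

    # A factor with numeric-looking labels becomes integer.
    f <- factor(c("10", "20", "10"))
    f_converted <- type.convert(as.character(f), as.is = TRUE)
    class(f_converted)     # "integer"
    ```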

    Data:

    # slightly tweaked from OP
    A <- setDT(structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), stratum = c(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
    9L, 10L, 11L, 12L, 13L, 14L, 15L)), .Names = c("year", "stratum"), row.names = 
    c(NA, -45L), class = c("data.frame")))
    
    B <- setDT(structure(list(year = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
    3, 3, 3, 3), stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
    14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 
    2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L), yr = c(1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("year", "stratum", 
    "yr"), row.names = c(NA, -45L), class = c("data.frame")))
    

    Comment. If you'd rather not depend on magrittr, use function(x) type.convert(as.character(x), as.is = TRUE) in place of the . %>% bit.

  • 2021-01-04 03:09

    Not very elegant but you may 'build' the as.* call like this:

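    # [[x]] works for both data.frame and data.table; A[, x] fails on a data.table when x is a variable
    for (x in intersect(names(A), names(B))) { A[[x]] <- eval( call( paste0("as.", class(B[[x]])[1]), A[[x]]) )}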
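
    The same "look up the as.* function by name" idea can be wrapped in a small helper that updates a data.table by reference. This is only a sketch: match_col_classes is a hypothetical name, not part of data.table, and it assumes every target class has a matching as.<class> function that accepts the existing values.

    ```r
    library(data.table)

    # Hypothetical helper: coerce the columns of `x` that also exist in
    # `template` to the template's class, modifying `x` by reference.
    match_col_classes <- function(x, template) {
        shared <- intersect(names(x), names(template))
        for (col in shared) {
            target <- class(template[[col]])[1L]   # first class only, e.g. "POSIXct"
            if (inherits(x[[col]], target)) next   # already the right class
            # Look up as.integer, as.numeric, as.character, as.factor, ...
            converter <- match.fun(paste0("as.", target))
            set(x, j = col, value = converter(x[[col]]))
        }
        invisible(x)
    }

    # Toy data, not the OP's tables
    A <- data.table(year = c(1L, 2L), stratum = c(1L, 2L))
    B <- data.table(year = c(1, 2), stratum = c(1L, 2L), yr = c(1, 2))
    match_col_classes(A, B)
    sapply(A, class)
    #      year   stratum 
    # "numeric" "integer" 
    ```

    inherits() is used for the check because it looks only at class(), so an integer column is not mistaken for an already-numeric one.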
    