Question:
I'm working in R. I want the shared columns of 2 data.tables (shared meaning same column name) to have matching classes. I'm struggling.
Based on the discussion in this question, and comments in this answer, I'm thinking I may have had it right, and just landed on an odd exception.
Note that the class doesn't change, but technically that doesn't matter (for the particular use case that prompted the question). Below I show my "failed approach"; by following it through to the merge and inspecting the classes of the columns in the merged data.table, we can see why the approach works: integers will just get promoted.
# Build the call list(col1, col2, ...) from a character vector of column names
s2c <- function(x, type = "list") {
  as.call(lapply(c(type, x), as.symbol))
}
# In this case, I can assume all columns of A can be found in B
# I am also able to assume that the desired conversion is possible
B.class <- sapply(B[, eval(s2c(names(A)))], class)
for (col in names(A)) {
  set(A, j = col, value = as(A[[col]], B.class[col]))
}
# Below here is new from what I tried in question
AB <- merge(A, B, all = TRUE, by = c("stratum", "year"))
sapply(AB, class)
#   stratum      year        bt        yr
# "integer" "numeric" "numeric" "numeric"
Although this answer doesn't solve the problem in the question, I figured I'd post it to point out that the failure to convert "integer" to "numeric" might not be a problem in many situations, so this is a straightforward, albeit circumstantial, solution.
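As a rough illustration of the promotion mentioned above, here is a minimal base R sketch (no data.table required). The vectors `int_key` and `dbl_key` are made-up stand-ins for a key column stored as integer in one table and as double in the other; they are not taken from the original data.

```r
# Hypothetical key columns: same values, different storage types.
int_key <- c(1L, 2L, 3L)   # "integer"
dbl_key <- c(1, 2, 3)      # "numeric" (double)

# Comparison promotes the integers to double, so equal values still match.
all(int_key == dbl_key)

# Lookups across the two types line up as well.
match(int_key, dbl_key)

# Combining both types yields a double ("numeric") vector.
combined <- c(int_key, dbl_key)
class(combined)
```

This only shows the general coercion rule in base R; how a particular data.table version types the key column of a merged result is a separate matter and is not asserted here.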
This is one very crude way to ensure common classes:
library(magrittr)
cols = intersect(names(A), names(B))
r = rbindlist(list(A = A, B = B[, ..cols]), idcol = TRUE)
r[, (cols) := lapply(.SD, . %>% as.character %>% type.convert(as.is = TRUE)), .SDcols=cols]
B[, (cols) := r[.id=="B", ..cols]]
A[, (cols) := r[.id=="A", ..cols]]
sapply(A, class); sapply(B, class)
# year stratum
# "integer" "integer"
# year stratum yr
# "integer" "integer" "numeric"
I don't like this solution, for two reasons. First, codes stored as strings with leading zeros (such as "00001", "02995") would be coerced to actual integers, which is bad. Second, what happens to classes like Date or factor? This won't matter so much if you do this column-class normalization as soon as you read the data in, I suppose.
Data:
# slightly tweaked from OP
A <- setDT(structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), stratum = c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L)), .Names = c("year", "stratum"), row.names =
c(NA, -45L), class = c("data.frame")))
B <- setDT(structure(list(year = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3), stratum = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L,
14L, 15L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L), yr = c(1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("year", "stratum",
"yr"), row.names = c(NA, -45L), class = c("data.frame")))
Comment. If you have something against magrittr, use function(x) type.convert(as.character(x), as.is = TRUE) in place of the . %>% bit.
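The reservations about the character round trip can be checked directly. The sketch below uses a hypothetical helper, `round_trip`, that mirrors the as.character / type.convert step; `as.is = TRUE` is an assumption made here so that strings stay character instead of becoming factors.

```r
# Hypothetical helper mirroring the as.character -> type.convert step.
round_trip <- function(x) type.convert(as.character(x), as.is = TRUE)

# Zero-padded codes stored as strings lose their leading zeros.
ids <- c("00001", "02995")
round_trip(ids)
# [1]    1 2995

# Dates and factors do not survive either: both come back as character.
class(round_trip(as.Date("2020-01-01")))
class(round_trip(factor(c("a", "b"))))
```

So the approach seems safest on columns that are plain numbers to begin with.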
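Not very elegant, but you can build the as.* call like this. Columns are extracted with [[ because B[, x] does not select the column named by the variable x in a data.table:
for (x in colnames(A)) set(A, j = x, value = eval(call(paste0("as.", class(B[[x]])[1L]), A[[x]])))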
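The same idea can be wrapped in a small function. This is only a sketch under stated assumptions: `match_classes`, `A_df` and `B_df` are hypothetical names, the example runs on plain data.frames with toy data, and it assumes an `as.<class>` function exists for the first class of every shared column.

```r
# Hypothetical helper: coerce the columns that `x` shares with `template`
# to the first class of the corresponding column in `template`.
match_classes <- function(x, template) {
  shared <- intersect(names(x), names(template))
  for (col in shared) {
    target <- class(template[[col]])[1L]
    # Look up as.integer, as.numeric, as.character, ... by name.
    coerce <- match.fun(paste0("as.", target))
    x[[col]] <- coerce(x[[col]])
  }
  x
}

# Toy data: `year` is integer in A_df but double in B_df.
A_df <- data.frame(year = c(1L, 2L, 3L), stratum = c(1L, 2L, 3L))
B_df <- data.frame(year = c(1, 2, 3), stratum = c(1L, 2L, 3L), yr = c(1, 2, 3))

A_fixed <- match_classes(A_df, B_df)
sapply(A_fixed, class)
#      year   stratum
# "numeric" "integer"
```

Unlike as(x, "numeric"), as.numeric() does turn an integer vector into a double one, which is why the lookup by name is used here. The function returns a modified copy; for a data.table, assigning by reference with set() as in the loop above avoids that copy.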