There's probably a more elegant solution, but I think this works for your situation.
If you're not too fussed about mixing dplyr and data.table syntax in your workflow, you can use setdiff() to identify the non-matching column names, then use data.table syntax to create those zero-value columns efficiently, without loops or apply() functions. Once you've made sure this works for all the possible situations, you can wrap it in a function and scale it across more datasets.
library(dplyr)  # provides %>% and setdiff()

df1 <- data.frame(a = 1, b = 2, c = 3, d = 4)
df2 <- data.frame(a = 5, c = 6)

# Variables in df1 but not in df2
diff_vars <- setdiff(names(df1), names(df2))

# Add the missing columns to df2, filled with 0
df2 %>%
  data.table::as.data.table() %>%
  .[, (diff_vars) := 0] %>%
  tibble::as_tibble() # Optional: drop this step to keep a data.table
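Wrapping the steps above in a function might look something like the sketch below. The name add_missing_cols and its fill argument are hypothetical, made up for illustration. It assumes you always want the new columns appended at the end and filled with a single constant value.

```r
library(data.table)

# Hypothetical helper: add to `target` every column that `reference` has
# but `target` lacks, filled with `fill` (0 by default).
add_missing_cols <- function(target, reference, fill = 0) {
  # Columns present in reference but missing from target
  diff_vars <- setdiff(names(reference), names(target))

  # Work on a copy so the caller's object is not modified by reference
  out <- copy(as.data.table(target))

  # Only call := when there is something to add
  if (length(diff_vars) > 0) {
    out[, (diff_vars) := fill]
  }
  out
}

df1 <- data.frame(a = 1, b = 2, c = 3, d = 4)
df2 <- data.frame(a = 5, c = 6)

res <- add_missing_cols(df2, df1)
print(res)
```

To scale this across several datasets, you could keep them in a list and call something like lapply(list_of_dfs, add_missing_cols, reference = df1). Convert the result with tibble::as_tibble() if you'd rather stay in the tidyverse.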