Question
I would like to modify a set of columns inside a data.table to be factors. If I knew the names of the columns in advance, I think this would be straightforward.
library(data.table)
dt1 <- data.table(a = (1:4), b = rep(c('a','b')), c = rep(c(0,1)))
dt1[,class(b)]
dt1[,b:=factor(b)]
dt1[,class(b)]
But I don't; instead, I have a character vector of the variable names:
vars.factors <- c('b','c')
I can apply the factor function to them without a problem ...
lapply(vars.factors, function(x) dt1[,class(get(x))])
lapply(vars.factors, function(x) dt1[,factor(get(x))])
But I don't know how to re-assign or update the original column in the data table.
This fails ...
lapply(vars.factors, function(x) dt1[,x:=factor(get(x))])
# Error in get(x) : invalid first argument
As does this ...
lapply(vars.factors, function(x) dt1[,get(x):=factor(get(x))])
# Error in get(x) : object 'b' not found
NB. I tried the answer proposed here without any luck.
Answer 1:
Yes, this is fairly straightforward:
dt1[, (vars.factors) := lapply(.SD, as.factor), .SDcols=vars.factors]
In the LHS (of := in j), we specify the names of the columns. If a column already exists, it'll be updated, else, a new column will be created. In the RHS, we loop over all the columns in .SD (which stands for Subset of Data), and we specify the columns that should be in .SD with the .SDcols argument.
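As a minimal sketch of the above, the following rebuilds the question's toy table, peeks at which columns .SD holds when .SDcols is set, and then checks the column classes after the update. The explicit times argument to rep() is only added here for readability; it is not in the original code.

```r
library(data.table)

# Same toy data as in the question, with the recycling of b and c written out.
dt1 <- data.table(a = 1:4, b = rep(c('a', 'b'), 2), c = rep(c(0, 1), 2))
vars.factors <- c('b', 'c')

# .SD only contains the columns listed in .SDcols.
dt1[, names(.SD), .SDcols = vars.factors]

# Update those columns in place (by reference); column a is left untouched.
dt1[, (vars.factors) := lapply(.SD, as.factor), .SDcols = vars.factors]

# Inspect the resulting column classes.
sapply(dt1, class)
```

After this runs, b and c should be reported as factor while a stays integer.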
Following up on comment:
Note that we need to wrap the LHS with () for it to be evaluated and fetch the column names stored in the vars.factors variable. This is because we allow the syntax DT[, col := value] when there's only one column to assign, by specifying the column name as a symbol (without quotes), purely for convenience. This creates a column named col and assigns value to it.
To tell these two cases apart, we need the (). Wrapping the LHS with () is sufficient to signal that we really want the values held in the variable.
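To make the difference concrete, here is a small sketch contrasting the two forms. The names dt2, dt3 and cols are hypothetical and chosen only for this illustration.

```r
library(data.table)

dt2 <- data.table(a = 1:4, b = rep(c('a', 'b'), 2))
dt3 <- copy(dt2)   # independent copy, so the two cases do not interfere
cols <- 'b'        # hypothetical variable holding a column name

# Without parentheses, cols is taken as a symbol:
# a new column literally named "cols" is added and b is left as character.
dt2[, cols := factor(b)]

# With parentheses, cols is evaluated to 'b', so column b itself is updated.
dt3[, (cols) := factor(b)]

names(dt2); class(dt2$b)
names(dt3); class(dt3$b)
```

In the first table b should still be a character column next to a new cols column; in the second, b itself should now be a factor.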
Answer 2:
Using a data frame:
> df1 = data.frame(dt1)
> df1[vars.factors] = lapply(df1[vars.factors], factor)
> dt1 = data.table(df1)
> dt1
   a b c
1: 1 1 b
2: 2 2 c
3: 3 3 b
4: 4 4 c
> str(dt1)
Classes ‘data.table’ and 'data.frame': 4 obs. of 3 variables:
 $ a: int 1 2 3 4
 $ b: Factor w/ 4 levels "1","2","3","4": 1 2 3 4
 $ c: Factor w/ 2 levels "b","c": 1 2 1 2
 - attr(*, ".internal.selfref")=<externalptr>
Source: https://stackoverflow.com/questions/26299159/can-i-programmatically-update-the-type-of-a-set-of-columns-to-factors-in-data