Question
I am trying to create a sparse matrix from numerical and categorical data to use as input to cv.glmnet. When only numerical data is involved, I can create a sparseMatrix using the following syntax:
sparseMatrix(i=c(1,3,5,2), j=c(1,1,1,2), x=c(1,2,4,3), dims=c(5,2))
For categorical variables, the following approach seems to work:
sparse.model.matrix(~-1+automobile, data.frame(automobile=c("sedan","suv","minivan","truck","sedan")))
My VERY sparse instance has 1,000,000 observations and 10,000 variables, and I do not have enough memory to create the full matrix first. The only approach I can think of is to handle the categorical variables manually, creating the columns myself and converting the data into (i, j, x) format. I am hoping that somebody can suggest a better approach.
Answer 1:
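This may or may not work, but you could try creating the model matrix for each variable separately and then cbind-ing them together.
# Older versions of Matrix needed cBind() here; current versions use cbind().
# lapply() always returns a plain list, which is what do.call() expects.
do.call(cbind,
lapply(names(df), function(x) sparse.model.matrix(~., df[x])[, -1, drop=FALSE]))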
Note that you probably want to create the intercept column and then remove it, rather than specifying -1 in the formula as you've done above. The latter keeps all the levels of the first factor but drops one level from each of the others, so the result depends on the ordering of the variables.
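To make the difference concrete, here is a minimal sketch comparing the two approaches. The data frame `df` and its columns (`automobile`, `color`, `weight`) are made up for illustration, and base `cbind` is used on the assumption that a reasonably recent version of Matrix is installed.

```r
library(Matrix)

# Hypothetical toy data: two factors and one numeric column.
df <- data.frame(
  automobile = factor(c("sedan", "suv", "minivan", "truck", "sedan")),
  color      = factor(c("red", "blue", "red", "green", "blue")),
  weight     = c(1.2, 2.0, 1.8, 2.5, 1.3)
)

# Dropping the intercept in the formula: the first factor keeps all of its
# levels, while later factors lose their reference level.
m1 <- sparse.model.matrix(~ -1 + automobile + color, df)
colnames(m1)

# Per-variable approach: build each block with an intercept, then drop that
# column, so every factor is coded the same way regardless of its position.
blocks <- lapply(names(df), function(x) {
  sparse.model.matrix(~ ., df[x])[, -1, drop = FALSE]
})
m2 <- do.call(cbind, blocks)
colnames(m2)
```

In `m1` the `automobile` factor contributes one column per level, whereas in `m2` each factor drops its first level. Whether that is what you want depends on how the model should treat the reference levels.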
Answer 2:
Sparse matrices support the same kind of indexed assignment as dense matrices: you can assign to positions by passing a two-column matrix as the single argument to "[":
require(Matrix)
M <- Matrix(0, 10, 10)
dfrm <- data.frame(rows=sample(1:10,5), cols=sample(1:10,5), vals=rnorm(5))
dfrm
#---------
rows cols vals
1 3 9 -0.1419332
2 4 3 1.4806194
3 6 7 -0.5653500
4 5 1 -1.0127539
5 1 2 -0.5047298
#--------
M[ with( dfrm, cbind(rows,cols) ) ] <- dfrm$vals
M
#---------------
10 x 10 sparse Matrix of class "dgCMatrix"
[1,] . -0.5047298 . . . . . . . .
[2,] . . . . . . . . . .
[3,] . . . . . . . . -0.1419332 .
[4,] . . 1.480619 . . . . . . .
[5,] -1.012754 . . . . . . . . .
[6,] . . . . . . -0.56535 . . .
[7,] . . . . . . . . . .
[8,] . . . . . . . . . .
[9,] . . . . . . . . . .
[10,] . . . . . . . . . .
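Along the same lines, the (i, j, x) route mentioned in the question can be sketched directly with sparseMatrix, using the integer codes of a factor as column indices. This is only an illustration under assumed toy data (`automobile`, `weight`), with no missing values. It is not a general-purpose encoder.

```r
library(Matrix)

# Hypothetical toy data: one factor and one mostly-zero numeric variable.
automobile <- factor(c("sedan", "suv", "minivan", "truck", "sedan"))
weight     <- c(1.2, 0, 1.8, 0, 1.3)

n <- length(automobile)

# One-hot block: row k gets a 1 in the column given by the factor's integer code.
cat_i <- seq_len(n)
cat_j <- as.integer(automobile)
cat_x <- rep(1, n)

# Numeric block: store only the non-zero entries, in one extra column.
nz    <- which(weight != 0)
num_i <- nz
num_j <- rep(nlevels(automobile) + 1L, length(nz))
num_x <- weight[nz]

# Combine both blocks in triplet form; the dense matrix is never materialised.
X <- sparseMatrix(
  i = c(cat_i, num_i),
  j = c(cat_j, num_j),
  x = c(cat_x, num_x),
  dims = c(n, nlevels(automobile) + 1L),
  dimnames = list(NULL, c(levels(automobile), "weight"))
)
X
```

This keeps every level of the factor. To match treatment contrasts, one level's column could be dropped afterwards.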
Source: https://stackoverflow.com/questions/29479198/sparsematrix-with-numerical-and-categorical-data