large-scale regression in R with a sparse feature matrix

佛祖请我去吃肉 2020-12-08 05:37

I'd like to do large-scale regression (linear/logistic) in R with many (e.g. 100k) features, where each example is relatively sparse in the feature space, e.g., ~1k non-ze

4 Answers
  •  有刺的猬
    2020-12-08 06:25

    I don't know about SparseM, but the MatrixModels package has an unexported lm.fit.sparse function that you can use. See ?MatrixModels:::lm.fit.sparse. Here is an example:

    Create the data:

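    library(Matrix)  # provides the "sparseMatrix" class and the coercion from factor

    y <- rnorm(30)
    x <- factor(sample(letters, 30, replace = TRUE))
    X <- as(x, "sparseMatrix")  # indicator matrix: one row per level, one column per observation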
    class(X)
    # [1] "dgCMatrix"
    # attr(,"package")
    # [1] "Matrix"
    dim(X)
    # [1] 18 30
    

    Run the regression:

    # X is levels x observations, so transpose it to get observations in rows
    MatrixModels:::lm.fit.sparse(t(X), y)
    #  [1] -0.17499968 -0.89293312 -0.43585172  0.17233007 -0.11899582  0.56610302
    #  [7]  1.19654666 -1.66783581 -0.28511569 -0.11859264 -0.04037503  0.04826549
    # [13] -0.06039113 -0.46127034 -1.22106064 -0.48729092 -0.28524498  1.81681527
    

    For comparison:

    lm(y ~ x - 1)
    
    # Call:
    # lm(formula = y ~ x - 1)
    # 
    # Coefficients:
    #       xa        xb        xd        xe        xf        xg        xh        xj  
    # -0.17500  -0.89293  -0.43585   0.17233  -0.11900   0.56610   1.19655  -1.66784  
    #       xm        xq        xr        xt        xu        xv        xw        xx  
    # -0.28512  -0.11859  -0.04038   0.04827  -0.06039  -0.46127  -1.22106  -0.48729  
    #       xy        xz  
    # -0.28524   1.81682  
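
    The agreement between the sparse and dense fits can also be checked programmatically. The following is only a minimal sketch: it does not call lm.fit.sparse, but solves the normal equations directly with the Matrix package, which is a swapped-in technique and not necessarily what MatrixModels does internally. The seed and the names Xt, beta_sparse and beta_dense are assumptions made for illustration.

    ```r
    library(Matrix)

    set.seed(1)  # arbitrary seed, chosen here only to make the sketch reproducible
    y <- rnorm(30)
    x <- factor(sample(letters, 30, replace = TRUE))

    # as(x, "sparseMatrix") is levels x observations; transpose it so that
    # rows are observations and columns are features.
    Xt <- t(as(x, "sparseMatrix"))

    # Solve the normal equations (X'X) b = X'y with sparse matrix algebra.
    XtX <- crossprod(Xt)       # sparse symmetric cross-product X'X
    Xty <- crossprod(Xt, y)    # X'y
    beta_sparse <- as.vector(solve(XtX, Xty))

    # Dense reference fit without an intercept.
    beta_dense <- unname(coef(lm(y ~ x - 1)))

    # Both approaches should agree up to numerical tolerance.
    stopifnot(isTRUE(all.equal(beta_sparse, beta_dense)))
    ```

    Forming X'X can be numerically less stable than a QR-based fit when the columns are close to collinear, so this is probably best treated as a sanity check and not as a replacement for a dedicated sparse fitter.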
    
