Count features for different IDs in columns in R in a faster way

时光取名叫无心 2021-01-27 13:56

I am trying to process a 20 GB data file in R. I have 16 GB of RAM and an i7 processor. I am reading the data using:

y <- read.table(file = "sample.csv", header =

    小鲜肉 (OP)
    2021-01-27 14:32

    How about table()?

    > set.seed(5)
    > ids <- sample(1:3, 12, TRUE)
    > features <- sample(1:4, 12, TRUE)
    > cbind(ids, features)
          ids features
     [1,]   1        2
     [2,]   3        3
     [3,]   3        2
     [4,]   1        1
     [5,]   1        2
     [6,]   3        4
     [7,]   2        3
     [8,]   3        4
     [9,]   3        4
    [10,]   1        3
    [11,]   1        1
    [12,]   2        1
    
    > table(ids, features)
       features
    ids 1 2 3 4
      1 2 2 1 0
      2 1 0 1 0
      3 0 1 1 3
    

    So, for example, feature 4 appears 3 times for id 3.
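    A minimal sketch of the same idea applied to a data frame, such as the one read.table() returns. The data frame `df` and its column names `id` and `feature` are hypothetical placeholders; the 12 pairs are the ones printed above, typed in explicitly so the counts do not depend on the random seed. Naming the arguments to table() labels the two dimensions, and single cells can then be looked up by their labels:

```r
# The 12 (id, feature) pairs from the transcript above, typed in explicitly
# so the result does not depend on set.seed() or the R version's sampler.
# `df`, `id` and `feature` are hypothetical names standing in for the
# columns of the data read from the CSV file.
df <- data.frame(
  id      = c(1, 3, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2),
  feature = c(2, 3, 2, 1, 2, 4, 3, 4, 4, 3, 1, 1)
)

# Cross-tabulate: one row per id, one column per feature, cells are counts.
# Naming the arguments labels the two dimensions of the table.
tab <- table(id = df$id, feature = df$feature)

# Cells are indexed by their labels (character strings), not by position.
tab["3", "4"]   # how often feature 4 occurs for id 3
rowSums(tab)    # total number of rows per id
```

    Indexing by label rather than by position matters when ids are not consecutive integers: a missing id simply has no row, so positional indexing would silently point at the wrong one.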

    EDIT: You can use as.data.frame() to "flatten" the table into one row per (id, feature) combination, with the count in a Freq column:

    > as.data.frame(table(ids, features))
       ids features Freq
    1    1        1    2
    2    2        1    1
    3    3        1    0
    4    1        2    2
    5    2        2    0
    6    3        2    1
    7    1        3    1
    8    2        3    1
    9    3        3    1
    10   1        4    0
    11   2        4    0
    12   3        4    3
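    The flattened table also lists combinations that never occur (Freq of 0). A minimal sketch, reusing the same hypothetical `df` (columns `id` and `feature`), for keeping only the observed combinations; `stringsAsFactors = FALSE` is an argument of the as.data.frame() method for tables and returns the labels as character columns instead of factors, which is a matter of preference:

```r
# Same 12 hypothetical (id, feature) pairs as in the transcript above.
df <- data.frame(
  id      = c(1, 3, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2),
  feature = c(2, 3, 2, 1, 2, 4, 3, 4, 4, 3, 1, 1)
)

# One row per (id, feature) combination, with the count in column `Freq`;
# labels come back as character columns rather than factors.
counts <- as.data.frame(table(id = df$id, feature = df$feature),
                        stringsAsFactors = FALSE)

# Drop the combinations that never occur, then renumber the rows.
counts <- counts[counts$Freq > 0, ]
rownames(counts) <- NULL
counts
```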
    
