I am trying to process a 20 GB data file in R. I have 16 GB of RAM and an i7 processor. I am reading the data using:
y <- read.table(file = "sample.csv", header =
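Since the file is larger than RAM, one option is to read it in chunks from an open connection and aggregate as you go, rather than holding all rows at once. A minimal sketch with base R; the file here is a tiny stand-in written to a temp path, and the chunk size is illustrative (use something like 1e6 rows for a real file):

```r
## Sketch: chunked reading with base R so the whole file never sits in
## memory at once. The demo CSV and chunk size are illustrative only.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(ids = c(1, 3, 3, 1), features = c(2, 3, 2, 1)),
          path, row.names = FALSE)           # stand-in for sample.csv

con <- file(path, open = "r")
hdr <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])
chunk_size <- 2                              # ~1e6 for a real 20 GB file
total <- 0
repeat {
  chunk <- tryCatch(
    read.table(con, sep = ",", nrows = chunk_size, col.names = hdr),
    error = function(e) NULL)                # EOF raises "no lines available"
  if (is.null(chunk)) break
  total <- total + nrow(chunk)               # aggregate here, chunk by chunk
  if (nrow(chunk) < chunk_size) break
}
close(con)
total
```

Because the connection stays open between `read.table()` calls, each call resumes where the previous chunk ended; only `chunk_size` rows are in memory at a time.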
How about table()?
> set.seed(5)
> ids <- sample(1:3, 12, TRUE)
> features <- sample(1:4, 12, TRUE)
> cbind(ids, features)
      ids features
 [1,]   1        2
 [2,]   3        3
 [3,]   3        2
 [4,]   1        1
 [5,]   1        2
 [6,]   3        4
 [7,]   2        3
 [8,]   3        4
 [9,]   3        4
[10,]   1        3
[11,]   1        1
[12,]   2        1
> table(ids, features)
   features
ids 1 2 3 4
  1 2 2 1 0
  2 1 0 1 0
  3 0 1 1 3
So, for example, feature 4 appears 3 times for id 3.
EDIT: You can use as.data.frame() to "flatten" the table and get:
> as.data.frame(table(ids, features))
   ids features Freq
1    1        1    2
2    2        1    1
3    3        1    0
4    1        2    2
5    2        2    0
6    3        2    1
7    1        3    1
8    2        3    1
9    3        3    1
10   1        4    0
11   2        4    0
12   3        4    3
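For the 20 GB file in the question, you don't need all rows in memory to build this table: contingency tables can be summed chunk by chunk, as long as every chunk's table has the same dimensions. A sketch using the 12-row example above in place of real chunks; the chunk size and the known level sets (1:3 and 1:4) are assumptions you'd replace with your own:

```r
## Sketch: build the same contingency table by summing per-chunk tables.
## Fixing factor levels guarantees every chunk produces a 3x4 table.
ids      <- c(1, 3, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2)
features <- c(2, 3, 2, 1, 2, 4, 3, 4, 4, 3, 1, 1)

chunk_size <- 5                      # illustrative; much larger in practice
total <- NULL
for (start in seq(1, length(ids), by = chunk_size)) {
  idx  <- start:min(start + chunk_size - 1, length(ids))
  part <- table(factor(ids[idx],      levels = 1:3),
                factor(features[idx], levels = 1:4))
  total <- if (is.null(total)) part else total + part
}
total
```

Running this gives the same counts as `table(ids, features)` on the full vectors, so the aggregation works per chunk without ever needing the whole data set at once.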