I am facing a problem I do not understand. It's a follow-up on answers suggested here and here
I have two identically structured datasets. One I created as a reprod
We can convert the 'pid' and 'cid' columns to factor and coerce them back to numeric, or use match with the unique values of each column, to get the row/column index. Either approach should work for creating the sparseMatrix.
test1 <- test[, lapply(.SD, function(x)
as.numeric(factor(x, levels=unique(x))))]
Or, equivalently, with match
test1 <- test[, lapply(.SD, function(x) match(x, unique(x)))]
s1 <- sparseMatrix(test1$pid,test1$cid,dimnames = list(unique(test$pid),
unique(test$cid)),x = 1)
dim(s1)
#[1] 15 50
s1[1:3, 1:3]
#3 x 3 sparse Matrix of class "dgCMatrix"
# 11023 11787 14232
#204 1 1 .
#207 . . 1
#254 . . .
head(test)
# pid cid
#1: 204 11023
#2: 204 11787
#3: 207 14232
#4: 254 14470
#5: 254 14480
#6: 258 1290
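To make the indexing step easier to check in isolation, here is a minimal sketch that rebuilds only the six rows shown by head(test) as plain vectors (the vector names pid, cid, and s are introduced here for illustration, and the Matrix package is assumed to be installed). It shows that the factor and match approaches give the same compact index, and that the resulting matrix has one row and column per unique id.

```r
library(Matrix)

# The six rows printed by head(test), rebuilt as plain vectors for illustration
pid <- c(204, 204, 207, 254, 254, 258)
cid <- c(11023, 11787, 14232, 14470, 14480, 1290)

# Approach 1: factor with levels in order of first appearance, then numeric codes
idx_factor <- as.numeric(factor(pid, levels = unique(pid)))

# Approach 2: position of each id among the unique ids
idx_match <- match(pid, unique(pid))

# Both give the compact index 1 1 2 3 3 4
same_index <- all(idx_factor == idx_match)

# Compact sparse matrix: one row per unique pid, one column per unique cid
s <- sparseMatrix(i = idx_match,
                  j = match(cid, unique(cid)),
                  x = 1,
                  dimnames = list(as.character(unique(pid)),
                                  as.character(unique(cid))))
dim(s)  # 4 6
```

Because the dimnames carry the original ids, entries can still be looked up by id, for example s["204", "11787"].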
EDIT:
If we want this for the full row/column index specified in 'test', we need to make the dimnames the same length as the max of 'pid' and 'cid'
rnm <- seq(max(test$pid))
cnm <- seq(max(test$cid))
s2 <- sparseMatrix(test$pid, test$cid, dimnames=list(rnm, cnm))
dim(s2)
#[1] 1561 30627
s2[1:3, 1:3]
#3 x 3 sparse Matrix of class "ngCMatrix"
# 1 2 3
#1 . . .
#2 . . .
#3 . . .
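The reason the dimnames must run up to the max can be seen on a toy case. This is only a sketch with made-up ids (pid, cid, and m below are hypothetical, not taken from 'test'): when the raw ids are used directly as indices, sparseMatrix sizes the matrix by the largest index, so the names have to cover every position from 1 to that maximum.

```r
library(Matrix)

# Hypothetical raw ids used directly as row/column positions
pid <- c(2, 5, 5)
cid <- c(3, 3, 7)

# Without dims, the matrix is sized max(pid) x max(cid),
# so dimnames must have exactly that many entries
m <- sparseMatrix(i = pid, j = cid,
                  dimnames = list(as.character(seq_len(max(pid))),
                                  as.character(seq_len(max(cid)))))
dim(m)  # 5 7
```

Rows and columns whose ids never occur stay empty, which is why the top-left corner of s2 above shows only dots.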