Transforming Dataset into value matrix

一生所求 2020-12-12 05:53

Sorry about the hopeless title.

I have a dataset that looks like:

|userId|movieId|rating|genre1|genre2|
|1     |13     |3.5   |1     |0     |
|1             


        
2 Answers
  • 2020-12-12 06:08

    Try

    library(dplyr)
    library(tidyr)
    
    df %>%
      select(-(genre1:genre2)) %>%          # drop the genre columns
      spread(userId, rating, fill = "")     # one column per userId; missing ratings become ""
    

    Which gives:

    #  movieId   1 2   3   4
    #1       4     3        
    #2      13 3.5       4.5
    #3     412 2.5   2.5   5
    

    Data

    df <- structure(list(userId = c(1L, 1L, 2L, 3L, 4L, 4L), movieId = c(13L, 
    412L, 4L, 412L, 13L, 412L), rating = c(3.5, 2.5, 3, 2.5, 4.5, 
    5), genre1 = c(1L, 1L, 0L, 1L, 1L, 1L), genre2 = c(0L, 1L, 1L, 
    1L, 0L, 1L)), .Names = c("userId", "movieId", "rating", "genre1", 
    "genre2"), class = "data.frame", row.names = c(NA, -6L))
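
    Since tidyr 1.0.0, spread() is superseded by pivot_wider(). The sketch below does the same reshape with pivot_wider(), assuming tidyr >= 1.0.0 is installed. It leaves missing user/movie pairs as NA (the default) instead of filling them with "", so the rating columns stay numeric. The data frame is the same example data as above, written out without structure().

    ```r
    library(dplyr)
    library(tidyr)

    # Same example data as above
    df <- data.frame(
      userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
      movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
      rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5),
      genre1  = c(1L, 1L, 0L, 1L, 1L, 1L),
      genre2  = c(0L, 1L, 1L, 1L, 0L, 1L)
    )

    wide <- df %>%
      select(-(genre1:genre2)) %>%        # keep userId, movieId, rating
      pivot_wider(names_from = userId,    # one column per user
                  values_from = rating) %>%  # cells hold ratings; missing pairs become NA
      arrange(movieId)                    # one row per movie, sorted by id

    print(wide)
    ```

    Keeping NA rather than "" means you can still compute on the columns directly, e.g. colMeans(wide[-1], na.rm = TRUE).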
    
  • 2020-12-12 06:21

    If you have many users and many movies, you can easily run out of memory building a dense matrix. For instance, with 1,000 users and 1,000 distinct movies you end up with a matrix of 1M entries, most of them missing (since not every user has rated every movie).
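
    To put rough numbers on this, the sketch below compares a dense base-R matrix with a sparse one from the Matrix package for 1,000 users x 1,000 movies. The density (each user rating 20 random movies) and the random ratings are made-up assumptions for illustration only.

    ```r
    library(Matrix)

    set.seed(1)
    n_users <- 1000
    n_movies <- 1000
    ratings_per_user <- 20  # hypothetical density, for illustration only

    # Each user rates 20 distinct random movies with a random half-star rating
    i <- rep(seq_len(n_users), each = ratings_per_user)
    j <- as.vector(replicate(n_users, sample(n_movies, ratings_per_user)))
    x <- sample(seq(0.5, 5, by = 0.5), length(i), replace = TRUE)

    sparse <- sparseMatrix(i = i, j = j, x = x, dims = c(n_users, n_movies))
    dense <- as.matrix(sparse)  # same values, but every empty cell is stored as 0

    # dense holds 1e6 doubles at 8 bytes each, roughly 8 million bytes
    print(object.size(dense), units = "auto")
    # sparse stores only the 20,000 non-zero values plus their indices
    print(object.size(sparse), units = "auto")
    ```

    The dense size grows with users x movies, while the sparse size grows with the number of ratings actually observed.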

    If your dataset is big, a sparseMatrix from the Matrix package is the way to go, since it stores only the non-zero entries. If both user and movie ids are sequential (i.e. they start at 1 and end at the number of distinct values), building it is straightforward. Using @StevenBeaupré's data:

    library(Matrix)
    # userId is the row index, movieId the column index, rating the stored value
    mat <- sparseMatrix(i = df$userId, j = df$movieId, x = df$rating)
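
    To see why sequential ids matter, here is a sketch of the call above on the example data (genre columns omitted). The movieIds there are 4, 13 and 412, so using them directly as column positions gives a 4 x 412 matrix: the unused columns cost little memory, but they are still part of the matrix.

    ```r
    library(Matrix)

    # Example data from the first answer, genre columns omitted
    df <- data.frame(
      userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
      movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
      rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
    )

    # Ids are used directly as row/column positions
    mat <- sparseMatrix(i = df$userId, j = df$movieId, x = df$rating)

    dim(mat)     # 4 412: dims default to max(userId) x max(movieId)
    nnzero(mat)  # 6: only the observed ratings are stored
    mat[4, 13]   # 4.5
    ```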
    

    If the ids are not sequential, convert them to consecutive integer codes first:

    # factor() maps each distinct id to a consecutive integer code (1, 2, ...)
    mat <- sparseMatrix(i = as.integer(factor(df$userId)),
                        j = as.integer(factor(df$movieId)),
                        x = df$rating)
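
    With the factor codes, row and column positions no longer match the original ids. One way to keep track of them, sketched below on the example data, is to pass the factor levels through the dimnames argument of sparseMatrix(); the variable names users and movies are just for illustration.

    ```r
    library(Matrix)

    df <- data.frame(
      userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
      movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
      rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
    )

    users  <- factor(df$userId)   # levels "1" "2" "3" "4"
    movies <- factor(df$movieId)  # levels "4" "13" "412" (sorted numerically)

    mat <- sparseMatrix(i = as.integer(users),
                        j = as.integer(movies),
                        x = df$rating,
                        dimnames = list(levels(users), levels(movies)))

    dim(mat)        # 4 3: no empty columns for unused ids
    mat["4", "13"]  # 4.5: look up by original ids instead of positions
    ```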
    

    Most standard matrix operations (arithmetic, %*%, crossprod(), rowSums()/colSums(), subsetting) also work on a sparseMatrix.
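
    A short sketch of two such operations on the example data: the mean rating per movie, and the dot products between users' rating vectors (the numerator of cosine similarity). It assumes no user ever gives a genuine rating of 0, because empty cells in a sparse matrix are also 0.

    ```r
    library(Matrix)

    df <- data.frame(
      userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
      movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
      rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
    )
    users  <- factor(df$userId)
    movies <- factor(df$movieId)
    mat <- sparseMatrix(i = as.integer(users), j = as.integer(movies), x = df$rating,
                        dimnames = list(levels(users), levels(movies)))

    # Mean rating per movie: sum of ratings / number of non-zero (observed) cells
    movie_means <- colSums(mat) / colSums(mat != 0)
    print(movie_means)  # movie 4 -> 3, movie 13 -> 4, movie 412 -> 3.333...

    # User x user dot products: tcrossprod(mat) computes mat %*% t(mat)
    user_dot <- tcrossprod(mat)
    print(user_dot)
    ```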
