Transforming Dataset into value matrix


Sorry about the hopeless title.

I have a dataset that looks like:

|userId|movieId|rating|genre1|genre2|
|1     |13     |3.5   |1     |0     |
|1


                      
2 Answers

孤独总比滥情好, 2020-12-12 06:08

Try

library(dplyr)
library(tidyr)

df %>%
  select(-(genre1:genre2)) %>%       # drop the genre columns
  spread(userId, rating, fill = "")  # one column per userId; unrated cells shown as ""


Which gives:

#  movieId   1 2   3   4
#1       4     3        
#2      13 3.5       4.5
#3     412 2.5   2.5   5
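
The same reshape can also be sketched with `pivot_wider()`, which the current tidyr documentation recommends over the superseded `spread()`. This is an alternative, not what the answer above used. It rebuilds the sample `df` so the block runs on its own, leaves missing ratings as `NA` so the columns stay numeric, and the names `wide` and `rating_matrix` are only for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer, rebuilt here so the block is self-contained
df <- data.frame(
  userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
  movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
  rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5),
  genre1  = c(1L, 1L, 0L, 1L, 1L, 1L),
  genre2  = c(0L, 1L, 1L, 1L, 0L, 1L)
)

wide <- df %>%
  select(userId, movieId, rating) %>%
  # one column per userId, in order of first appearance; unrated cells are NA
  pivot_wider(names_from = userId, values_from = rating) %>%
  arrange(movieId)

# Drop the id column to get a numeric matrix: movies in rows, users in columns
rating_matrix <- as.matrix(wide[, -1])
rownames(rating_matrix) <- wide$movieId
rating_matrix
```

Keeping `NA` instead of `""` means the result can be used directly in numeric code, for example `colMeans(rating_matrix, na.rm = TRUE)`.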




Data

df <- structure(list(userId = c(1L, 1L, 2L, 3L, 4L, 4L), movieId = c(13L, 
412L, 4L, 412L, 13L, 412L), rating = c(3.5, 2.5, 3, 2.5, 4.5, 
5), genre1 = c(1L, 1L, 0L, 1L, 1L, 1L), genre2 = c(0L, 1L, 1L, 
1L, 0L, 1L)), .Names = c("userId", "movieId", "rating", "genre1", 
"genre2"), class = "data.frame", row.names = c(NA, -6L))

                                                                        
                                                        
            
            
              
                
野性不改, 2020-12-12 06:21

If you have many users and many movies, you could easily run out of memory building a dense matrix. For instance, with 1,000 users and 1,000 distinct movies you end up with a matrix of 1M entries, most of which are missing (since not every user has seen every movie), and the size grows with the product of the two counts.
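
As a rough back-of-the-envelope check of that growth, the sketch below estimates the size of a dense numeric matrix at 8 bytes per cell. The helper `dense_size_mb` and the second, larger shape are hypothetical and only there to show the scaling.

```r
# Approximate size in MB of a dense numeric (double) matrix: 8 bytes per cell,
# whether or not the cell holds a rating
dense_size_mb <- function(n_users, n_movies) {
  8 * n_users * n_movies / 1024^2
}

small <- dense_size_mb(1000, 1000)      # the 1M-entry case from the text
large <- dense_size_mb(100000, 10000)   # hypothetical larger dataset, 1e9 cells

small
large

# Cross-check the small case against an actual allocation
actual_bytes <- as.numeric(object.size(matrix(NA_real_, 1000, 1000)))
actual_bytes / 1024^2
```

The 1,000 x 1,000 case is only a few megabytes; it is the multiplication of the two dimensions that makes larger datasets a problem.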

If your dataset is big, a sparseMatrix from the Matrix package is the way to go. If both the user and movie IDs are sequential (i.e. they start at 1 and end at the number of distinct entries), building it is straightforward. Using @StevenBeaupré's data:

library(Matrix)
mat <- sparseMatrix(i = df$userId, j = df$movieId, x = df$rating)


If the IDs are not sequential:

mat <- sparseMatrix(i = as.integer(factor(df$userId)),
                    j = as.integer(factor(df$movieId)),
                    x = df$rating)
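
One thing the factor conversion loses is the original IDs, since rows and columns are renumbered from 1. A possible way to keep them, sketched here with the sample ratings (rebuilt so the block is self-contained), is to pass the factor levels as `dimnames`. The names `users`, `movies`, `mat_direct` and `mat_compact` are illustrative.

```r
library(Matrix)

# Ratings from the sample data used in the answers
df <- data.frame(
  userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
  movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
  rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
)

# Raw IDs as indices: the column count follows the largest movieId (412)
mat_direct <- sparseMatrix(i = df$userId, j = df$movieId, x = df$rating)
dim(mat_direct)

# Renumbered through factors, with the original IDs kept as row/column names
users  <- factor(df$userId)
movies <- factor(df$movieId)
mat_compact <- sparseMatrix(
  i = as.integer(users),
  j = as.integer(movies),
  x = df$rating,
  dimnames = list(levels(users), levels(movies))
)
dim(mat_compact)

# Look up a rating by original IDs: user 4, movie 13
mat_compact["4", "13"]
```

With the names attached, results computed on the compact matrix can be mapped back to the original user and movie IDs without a separate lookup table.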


Most of the usual matrix operations also work on a sparseMatrix.
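
As a small illustration with the sample data, here are a few operations applied directly to the sparse matrix. One caveat: a sparse matrix treats cells without a rating as 0, so a plain `colMeans()` would average in the unrated cells; this sketch divides by the number of stored ratings instead. Names such as `n_ratings`, `mean_rating` and `user_dot` are illustrative.

```r
library(Matrix)

df <- data.frame(
  userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
  movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
  rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
)

users  <- factor(df$userId)
movies <- factor(df$movieId)
mat <- sparseMatrix(
  i = as.integer(users),
  j = as.integer(movies),
  x = df$rating,
  dimnames = list(levels(users), levels(movies))
)

# Number of users who rated each movie (non-zero cells per column)
n_ratings <- colSums(mat != 0)

# Mean rating per movie over rated cells only
mean_rating <- colSums(mat) / n_ratings

# User-by-user dot products of rating vectors (users x users)
user_dot <- tcrossprod(mat)

n_ratings
mean_rating
user_dot
```

The dot products only pick up movies that both users rated, because every unrated cell contributes 0 to the sum.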
                                                                        
                                                        
            
            
              
                