Categorizing data in R


I'm trying to categorize my data into different groups based on the type of data. My data and code are as follows:

bank    ROE
bank1   0.73
bank2   0.94
bank3   0.62
b


                      
3 Answers

醉话见心, 2021-01-24 15:02

You should use the %in% operator instead of the equality operator (==), because you are comparing each value against a whole vector here.

Like so:

test$type <- ifelse(test$bank %in% sob, 1,
             ifelse(test$bank %in% fob, 2,
             ifelse(test$bank %in% jov, 3, 4)))

> test
     bank  ROE type
1   bank1 0.73    1
2   bank2 0.94    1
3   bank3 0.62    1
4   bank4 0.57    2
5   bank5 0.31    2
6   bank6 0.53    2
7   bank7 0.39    3
8   bank8 0.01    3
9   bank9 0.16    3
10 bank10 0.51    3
11 bank11 0.84    3
12 bank12 0.18    4
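
The question's definitions of sob, fob and jov did not survive in this copy of the thread, so the following self-contained sketch assumes them from the printed result above (three, three and five banks). It rebuilds the data frame and applies the nested ifelse() with %in%.

```r
# Assumed group vectors, inferred from the printed result; not taken from the question.
sob <- c("bank1", "bank2", "bank3")
fob <- c("bank4", "bank5", "bank6")
jov <- c("bank7", "bank8", "bank9", "bank10", "bank11")

test <- data.frame(
  bank = paste0("bank", 1:12),
  ROE  = c(0.73, 0.94, 0.62, 0.57, 0.31, 0.53, 0.39, 0.01, 0.16, 0.51, 0.84, 0.18),
  stringsAsFactors = FALSE
)

# %in% tests membership for every element of test$bank independently.
test$type <- ifelse(test$bank %in% sob, 1,
             ifelse(test$bank %in% fob, 2,
             ifelse(test$bank %in% jov, 3, 4)))

print(test$type)
```

Any bank that appears in none of the three vectors falls through to the final value, 4.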


Alternatively, to avoid the cumbersome nested ifelse() calls, you can classify the banks by resetting the levels of a factor.

First, copy the bank variable as a factor (since R 4.0.0, data.frame() keeps strings as character vectors by default, so the conversion has to be explicit):
test$type <- factor(test$bank)

Then reset the levels using the vectors defined above (sob, fob, jov). In the last entry, 'other' is assigned the one remaining value, because bank12 does not appear in any of the other vectors.

levels(test$type) <- list('sob' = sob,
                          'fob' = fob,
                          'jov' = jov,
                          'other' = 'bank12')


This results in:

> test
     bank  ROE  type
1   bank1 0.73   sob
2   bank2 0.94   sob
3   bank3 0.62   sob
4   bank4 0.57   fob
5   bank5 0.31   fob
6   bank6 0.53   fob
7   bank7 0.39   jov
8   bank8 0.01   jov
9   bank9 0.16   jov
10 bank10 0.51   jov
11 bank11 0.84   jov
12 bank12 0.18 other
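
The list above names bank12 explicitly. As a sketch of one way to avoid hard-coding the leftover values, the remaining levels can be computed with setdiff(). The vectors sob, fob and jov are assumed here from the output above, and the names bank, type and rest are illustrative.

```r
# Assumed group vectors (inferred from the output above).
sob <- c("bank1", "bank2", "bank3")
fob <- c("bank4", "bank5", "bank6")
jov <- c("bank7", "bank8", "bank9", "bank10", "bank11")

bank <- paste0("bank", 1:12)
type <- factor(bank)

# Every level not claimed by a named group falls into 'other'.
rest <- setdiff(levels(type), c(sob, fob, jov))
levels(type) <- list(sob = sob, fob = fob, jov = jov, other = rest)

print(table(type))
```

This keeps working if more unclassified banks are added later, since rest is recomputed from the data.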

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
醉话见心, 2021-01-24 15:09

The == operator in your code compares the vector test$bank with the vector jov element by element, recycling the shorter vector. Because the lengths differ (12 and 5) and 12 is not a multiple of 5, R emits a warning. With sob (length 3) there is no warning, since 12 is a multiple of 3, but the comparison is still positional rather than a membership test.
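
To make the recycling concrete, here is a small sketch with made-up vectors (x and group are illustrative, not from the question). The == operator pairs elements by position and recycles the shorter vector, while %in% asks whether each element occurs anywhere in the other vector.

```r
x     <- c("a", "b", "c", "d", "e", "f")
group <- c("b", "a", "c")   # length 3 divides 6, so == recycles without a warning

# Positional comparison: x is compared with b, a, c, b, a, c.
by_position <- x == group

# Membership test: is each element of x found anywhere in group?
by_membership <- x %in% group

print(by_position)
print(by_membership)
```

Only "c" happens to line up by position, whereas the membership test correctly flags "a", "b" and "c".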

To test whether a value equals any of the values in a vector, you can use the %in% operator, just as @ako suggests. However, when working with groups, factor() and levels() are useful functions. Convert the variable to a factor, then set new levels.

test <- data.frame(
  bank = c('bank1','bank2','bank3','bank4','bank5','bank6','bank7','bank8','bank9','bank10','bank11','bank12'),
  ROE = c(0.73,0.94,0.62,0.57,0.31,0.53,0.39,0.01,0.16,0.51,0.84,0.18)
)

test$bank <- factor(test$bank)

levels(test$bank) <- list(
  '1' = c('bank1', 'bank2','bank3'),
  '2' = c('bank4','bank5', 'bank6'),
  '3' = c('bank7', 'bank8','bank9', 'bank10','bank11'),
  'other' = NA
)

test$bank[is.na(test$bank)] <- 'other'
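
The 'other' = NA entry may look odd. As far as I can tell, it registers an empty 'other' level, so that the values left as NA by the relabelling can then be assigned to it. A minimal sketch with a toy factor (f and its values are illustrative):

```r
f <- factor(c("x", "y", "z"))

# "x" and "y" are merged into level A; "z" matches nothing and becomes NA.
# The NA entry still creates the level name 'other'.
levels(f) <- list(A = c("x", "y"), other = NA)
print(levels(f))

# 'other' is now a valid level, so the assignment does not generate NAs.
f[is.na(f)] <- "other"
print(f)
```

Note that this treats genuinely missing values as 'other' too, which may or may not be what you want.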

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
长情又很酷, 2021-01-24 15:16

You could also try:

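lst1 <- list(sob, fob, jov)
# Named lookup vector: bank name -> group number
lookup <- setNames(rep(seq_along(lst1), lengths(lst1)), unlist(lst1))
# as.character() makes the lookup go by name even if test$bank is a factor
test$type <- unname(lookup[as.character(test$bank)])
test$type[is.na(test$type)] <- 4

test$type
#[1] 1 1 1 2 2 2 3 3 3 3 3 4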
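
One caveat with a named-vector lookup: if the key is a factor, R indexes by the factor's integer codes rather than by its labels. A small sketch of the difference (lookup, key, by_code and by_name are illustrative names):

```r
lookup <- c(bank1 = 1, bank10 = 3, bank2 = 1)
key    <- factor(c("bank2", "bank10"))   # levels sort as bank10, bank2

# Factor index: uses the integer codes 2 and 1, i.e. positions in lookup.
by_code <- lookup[key]

# Character index: uses the names, which is what a lookup table needs.
by_name <- lookup[as.character(key)]

print(by_code)
print(by_name)
```

Converting the key with as.character() avoids the problem regardless of how the data frame was created.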

                                                                        
                                                        
            
            
              
                