How to do median splits within factor levels in R?


Here I make a new column to indicate whether myData is above or below its median

### MedianSplits based on Whole Data
#create some test data
myDataFrame=data.fra


                      
3 answers

伪装坚强ぢ, 2021-02-14 16:34

Here is a solution using the plyr package.

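myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))
library(plyr)
# Split by myFactor, apply the function to each piece, and recombine the rows
ddply(myDataFrame, "myFactor", function(x) {
    x$Median <- median(x$myData)  # median within this factor level
    # TRUE (value <= median) is labelled "Below", FALSE is labelled "Above"
    x$FactorLevelMedianSplit <- factor(x$myData <= x$Median, levels = c(TRUE, FALSE), labels = c("Below", "Above"))
    x
})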
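
For comparison, the same per-level split can probably be done in base R with ave(), which returns the group medians aligned with the original rows, so the data frame is not reordered. This is only a minimal sketch and not part of the answer above; set.seed() and the column name GroupMedian are assumptions added for illustration.

```r
# Reproducible test data (set.seed is an addition; the answer above does not fix a seed)
set.seed(1)
myDataFrame <- data.frame(myData = runif(15),
                          myFactor = rep(c("A", "B", "C"), 5))

# ave() applies FUN within each level of myFactor and returns a vector
# in the original row order. GroupMedian is a hypothetical column name.
myDataFrame$GroupMedian <- ave(myDataFrame$myData, myDataFrame$myFactor,
                               FUN = median)

# Values equal to the group median are labelled "Below", as in the plyr version
myDataFrame$FactorLevelMedianSplit <- factor(
  myDataFrame$myData <= myDataFrame$GroupMedian,
  levels = c(TRUE, FALSE),
  labels = c("Below", "Above")
)

# Cross-tabulate levels against labels to check the split
table(myDataFrame$myFactor, myDataFrame$FactorLevelMedianSplit)
```

With five distinct values per level, three rows in each level land at or below that level's median and two land above it.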

                                                                        
                                                        
            
            
              
                
孤独总比滥情好, 2021-02-14 16:42

Here is a hack-ish way. Hadley may come up with something more elegant:

To start, we simply concatenate the by output:

R> do.call(c,byOutput)
A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5 
 1  2  2  1  1  1  1  2  1  2  1  2  1  1  2 


What matters is that we get the factor levels 1 and 2 here, which we can use to index into a vector of labels and build a new factor:

R> c("Below","Above")[do.call(c,byOutput)]
 [1] "Below" "Above" "Above" "Below" "Below" "Below" "Below" "Above"
 [9] "Below" "Above" "Below" "Above" "Below" "Below" "Above"
R> as.factor(c("Below","Above")[do.call(c,byOutput)])
[1] Below Above Above Below Below Below Below Above Below Above 
[11] Below Above Below Below Above
Levels: Above Below


which we can then assign into the data.frame you wanted to modify:

R> myDataFrame$FactorLevelMedianSplit <- 
      as.factor(c("Below","Above")[do.call(c,byOutput)])


Update: Never mind, this is not quite enough. The concatenated by output is ordered by factor level (A A ... A B ... B C ... C), so myDataFrame would need to be sorted the same way before we add the new column. Left as an exercise...
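
One way to sidestep the reordering issue is split() followed by unsplit(), which puts per-level results back in the original row order. The sketch below is only an illustration under stated assumptions: byOutput is not shown in this thread, so the per-level labels are recomputed here; set.seed() and the helper name labelOneLevel are made up for the example; and labelling ties at the median as "Below" is also an assumption.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
myDataFrame <- data.frame(myData = runif(15),
                          myFactor = rep(c("A", "B", "C"), 5))

# Hypothetical helper: label one level's values relative to that level's median
# (ties at the median are labelled "Below", which is an assumption)
labelOneLevel <- function(x) ifelse(x <= median(x), "Below", "Above")

# split() groups the values by level; lapply() labels each group separately
perLevel <- lapply(split(myDataFrame$myData, myDataFrame$myFactor),
                   labelOneLevel)

# unsplit() reverses split(), restoring the original (interleaved) row order,
# so the data frame itself never has to be sorted
myDataFrame$FactorLevelMedianSplit <- factor(
  unsplit(perLevel, myDataFrame$myFactor),
  levels = c("Below", "Above")
)
```

Because unsplit() uses the same grouping vector that split() used, each label returns to the row its value came from.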
                                                                        
                                                        
            
            
              
                
旧时难觅i, 2021-02-14 16:47

You weren't looking for something like this, were you?

Course$grade2 <- ifelse(Course$grade >= median(Course$grade), 1, 0)

                                                                        
                                                        
            
            
              
                