How to do median splits within factor levels in R?

有刺的猬 2021-02-14 16:05

Here I make a new column to indicate whether myData is above or below its median

### MedianSplits based on Whole Data
#create some test data
myDataFrame=data.fra         


        
3 Answers
  • 2021-02-14 16:34

    Here is a solution using the plyr package.

    myDataFrame <- data.frame(myData=runif(15),myFactor=rep(c("A","B","C"),5))
    library(plyr)
    ddply(myDataFrame, "myFactor", function(x){
        x$Median <- median(x$myData)
        x$FactorLevelMedianSplit <- factor(x$myData <= x$Median, levels = c(TRUE, FALSE), labels = c("Below", "Above"))
        x
    })
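
For readers who would rather not depend on plyr, the same per-level split can be sketched in base R with split(), lapply() and unsplit(). This is only an illustrative alternative, not part of the answer above; the helper name addMedianSplit and the set.seed() call are assumptions added here. Unlike ddply(), which returns the rows grouped by myFactor, unsplit() puts the rows back in their original order.

```r
set.seed(42)  # assumption: seed added only so the example is reproducible
myDataFrame <- data.frame(myData = runif(15),
                          myFactor = rep(c("A", "B", "C"), 5))

# Hypothetical helper: label the rows of one factor level
# against that level's own median
addMedianSplit <- function(x) {
  x$Median <- median(x$myData)
  x$FactorLevelMedianSplit <- factor(x$myData <= x$Median,
                                     levels = c(TRUE, FALSE),
                                     labels = c("Below", "Above"))
  x
}

# Split by level, label each piece, then reassemble in the original row order
pieces <- lapply(split(myDataFrame, myDataFrame$myFactor), addMedianSplit)
splitResult <- unsplit(pieces, myDataFrame$myFactor)
```

As in the plyr version, a value equal to its level's median is labelled "Below" because the comparison uses `<=`.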
    
  • 2021-02-14 16:42

    Here is a hack-ish way. Hadley may come up with something more elegant:

    To start, we simply concatenate the by output:

     R> do.call(c,byOutput)
    A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5 
     1  2  2  1  1  1  1  2  1  2  1  2  1  1  2 
    

    What matters is that we get the factor levels 1 and 2 here, which we can use to index a new factor with those levels:

    R> c("Below","Above")[do.call(c,byOutput)]
     [1] "Below" "Above" "Above" "Below" "Below" "Below" "Below" "Above" 
     [9] "Below" "Above" "Below" "Above" "Below" "Below" "Above"
    R> as.factor(c("Below","Above")[do.call(c,byOutput)])
    [1] Below Above Above Below Below Below Below Above Below Above 
    [11] Below Above Below Below Above
    Levels: Above Below
    

    which we can then assign into the data.frame you wanted to modify:

    R> myDataFrame$FactorLevelMedianSplit <- 
          as.factor(c("Below","Above")[do.call(c,byOutput)])
    

    Update: Never mind, we'd need to reindex myDataFrame to be sorted A A ... A B ... B C ... C as well before we add the new column. Left as an exercise...
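
The ordering problem noted in the update can be sidestepped with base R's ave(), which computes a statistic within each factor level and returns it aligned to the original rows, so nothing has to be sorted first. The following is a minimal sketch under assumptions, not the approach of the answer above: it reuses the test data from the plyr answer, adds a set.seed() call for reproducibility, and introduces the variable name groupMedian purely for illustration.

```r
set.seed(1)  # assumption: seed added only so the example is reproducible
myDataFrame <- data.frame(myData = runif(15),
                          myFactor = rep(c("A", "B", "C"), 5))

# ave() applies FUN within each level of myFactor and returns the
# results in the original row order
groupMedian <- ave(myDataFrame$myData, myDataFrame$myFactor, FUN = median)

# Values at or below their level's median are labelled "Below"
myDataFrame$FactorLevelMedianSplit <- factor(
  ifelse(myDataFrame$myData <= groupMedian, "Below", "Above"),
  levels = c("Below", "Above"))
```

Because the new column is built directly from vectors that are already in row order, it can be assigned into myDataFrame without reindexing.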

  • 2021-02-14 16:47

    You weren't looking for something like this, were you?

    Course$grade2 <- ifelse(Course$grade >= median(Course$grade), 1, 0)
    