Splitting a continuous variable into groups with an equal number of elements: returning a numeric vector of bin values

不知归路 2021-01-27 08:36

I have a continuous variable that I want to split into bins, returning a numeric vector (of length equal to my original vector) whose values relate to the values of the bins. E

2 answers
  • 2021-01-27 09:06

    Maybe not very elegant, but it should be efficient. Try this function:

    myCut <- function(x, breaks, retValues = c("means", "highs", "lows")) {
        retValues <- match.arg(retValues)
        if (length(breaks) != 1) stop("breaks must be a single number")
        breaks <- as.integer(breaks)
        if (is.na(breaks) || breaks < 2) stop("breaks must be greater than or equal to 2")
        # breaks + 1 equally spaced cut points over range(x), i.e. equal-width bins
        intervals <- seq(min(x), max(x), length.out = breaks + 1)
        # bin index (1..breaks) per element; all.inside = TRUE keeps max(x) in the last bin
        bins <- findInterval(x, intervals, all.inside = TRUE)
        lows <- intervals[-(breaks + 1)]  # lower edge of each bin
        highs <- intervals[-1]            # upper edge of each bin
        if (retValues == "means") return(((lows + highs) / 2)[bins])  # bin midpoints
        if (retValues == "highs") return(highs[bins])
        lows[bins]
    }
    x = c(1,5,3,12,5,6,7)
    myCut(x,3)
    #[1]  2.833333  6.500000  2.833333 10.166667  6.500000  6.500000  6.500000
    myCut(x,3,"highs")
    #[1]  4.666667  8.333333  4.666667 12.000000  8.333333  8.333333  8.333333
    myCut(x,3,"lows")
    #[1] 1.000000 4.666667 1.000000 8.333333 4.666667 4.666667 4.666667
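myCut spaces its cut points evenly over the range of x, so its bins have equal width but not necessarily equal counts. If the bins should instead hold roughly the same number of elements, as the title asks, one possible variant is sketched below. It is not part of the original answer: the name myCutQuantile and its groups argument are hypothetical, and it assumes R's default quantile() (type 7) is acceptable for choosing cut points. It keeps myCut's three return options, with "means" still meaning bin midpoints.

```r
# Hypothetical equal-count variant of myCut: cut points are sample quantiles
# rather than equally spaced values, so each bin holds roughly the same
# number of elements. Return values mirror myCut (midpoints, highs, lows).
myCutQuantile <- function(x, groups, retValues = c("means", "highs", "lows")) {
  retValues <- match.arg(retValues)
  groups <- as.integer(groups)
  if (length(groups) != 1 || is.na(groups) || groups < 2)
    stop("groups must be a single number greater than or equal to 2")
  # groups + 1 quantile cut points (default type 7); duplicated cut points
  # caused by ties are dropped, which can leave fewer than `groups` bins
  cuts <- unique(quantile(x, probs = seq(0, 1, length.out = groups + 1),
                          names = FALSE))
  # bin index per element; all.inside = TRUE puts max(x) in the last bin
  bins <- findInterval(x, cuts, all.inside = TRUE)
  lows  <- cuts[-length(cuts)]  # lower edge of each bin
  highs <- cuts[-1]             # upper edge of each bin
  switch(retValues,
         means = ((lows + highs) / 2)[bins],
         highs = highs[bins],
         lows  = lows[bins])
}

x <- c(1, 5, 3, 12, 5, 6, 7)
myCutQuantile(x, 3, "lows")
#> [1] 1 5 1 6 5 6 6
```

For this x the quantile cut points are 1, 5, 6 and 12, giving bins of 2, 2 and 3 elements. Ties in x can make quantiles coincide; unique() then merges them and fewer bins are returned.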
    
  • 2021-01-27 09:16

    Use ave with cut2 (from the Hmisc package) like this:

    Given:

    library(Hmisc)  # provides cut2
    x = c(1,5,3,12,5,6,7)
    

    Mean:

    ave(x,cut2(x,g = 3), FUN = mean)
    [1] 3.5 3.5 3.5 9.5 3.5 6.0 9.5
    

    Min:

    ave(x,cut2(x,g = 3), FUN = min)
    [1] 1 1 1 7 1 6 7
    

    Max:

    ave(x,cut2(x,g = 3), FUN = max)
    [1]  5  5  5 12  5  6 12
    

    Or standard deviation:

    ave(x,cut2(x,g = 3), FUN = sd)
    [1] 1.914854 1.914854 1.914854 3.535534 1.914854       NA 3.535534
    

    Note the NA result: the standard deviation is undefined (NA) for an interval that contains only one data point.
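The NA comes from sd() itself, which divides by n - 1. Below is a minimal base-R check that needs no Hmisc. The group labels g are written by hand for illustration (hypothetical); they are only assumed to match the grouping behind the cut2 output above, which they reproduce for this x.

```r
# sd() uses the n - 1 denominator, so a single observation gives NA
sd(6)

x <- c(1, 5, 3, 12, 5, 6, 7)
# hand-written group labels (hypothetical): {1, 5, 3, 5}, {6}, {12, 7}
g <- c(1, 1, 1, 3, 1, 2, 3)
# ave() applies sd within each group and repeats the result per element;
# group 2 has a single member, hence the NA in position 6
ave(x, g, FUN = sd)
```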

    Hope this is what you need.

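    NOTE:
    The parameter g in cut2 is the number of quantile groups. The groups may not contain the same number of data points (here, for example, 7 points cannot be split evenly into 3 groups), and the intervals generally do not have the same length.
    By contrast, cut with a single number for breaks splits the range of x into that many intervals of equal length.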
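To make the contrast in the note concrete, here is a base-R sketch (no Hmisc needed) that builds both kinds of groups for the same x. Feeding quantile() cut points to cut() is used here as an approximation of cut2(x, g = 3); the two are not guaranteed to agree on every input.

```r
x <- c(1, 5, 3, 12, 5, 6, 7)

# cut() with a single number: 3 intervals of equal length over range(x)
equalWidth <- cut(x, breaks = 3)
table(equalWidth)

# quantile-based cut points: intervals of unequal length, aiming at
# roughly equal counts (an approximation of cut2's g argument)
probs <- seq(0, 1, length.out = 4)
equalCount <- cut(x, breaks = unique(quantile(x, probs)), include.lowest = TRUE)
table(equalCount)

# either factor can be passed to ave() to get one value per element
ave(x, equalWidth, FUN = mean)
ave(x, equalCount, FUN = mean)
```

For this x the equal-width groups hold 2, 4 and 1 points, while the quantile-based groups hold 4, 1 and 2 points; the tie at 5 keeps the counts unequal. The quantile-based version gives the same per-element means as the cut2 output shown earlier (3.5 3.5 3.5 9.5 3.5 6.0 9.5).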
