R (data.table) group data by custom range (for example, -18, 18-25, …, 65+)

前端 未结 2 1495
太阳男子
太阳男子 2021-01-12 03:08

I can\'t find a solution in R (using data.table) to group data by a custom range (for example, -18, 18-25, ..., 65+) not by a single value.

What I\'m using right now

相关标签:
2条回答
  • 2021-01-12 03:36

    @jdharrison is right: cut(...) is the way to go.

    library(data.table)
    # create sample - you have this already
    set.seed(1)   # for reproducibility
    DT <- data.table(age=sample(15:70,1000,replace=TRUE),
                     value=rpois(1000,10))
    
    # you start here...
    breaks <- c(0,18,25,35,45,65,Inf)
    DT[,list(mean=mean(value)),by=list(age=cut(age,breaks=breaks))][order(age)]
    #         age      mean
    # 1:   (0,18] 10.000000
    # 2:  (18,25]  9.579365
    # 3:  (25,35] 10.158192
    # 4:  (35,45]  9.775510
    # 5:  (45,65]  9.969697
    # 6: (65,Inf] 10.141414
    
    0 讨论(0)
  • 2021-01-12 04:02

    Example with numeric variable which makes following:

              test         BucketName
    1    615.59148      01. 0 - 5,000
    2   1135.42357      01. 0 - 5,000
    3   5302.24208 02. 5,000 - 10,000
    4   3794.23109      01. 0 - 5,000
    5   2773.70667      01. 0 - 5,000
    ...
    

    And code for that

    generateLabelsForPivot = function(breaksVector)
    {
    
        startValue = min(breaksVector)
        lastValue  = max(breaksVector)
    
        lengthOfBreaks = length(breaksVector)
        orders         = seq(1, lengthOfBreaks-1, 1)
        startingPoints = c(breaksVector[-length(breaksVector)])
        finishPoints   = c(breaksVector[-1])
    
        addingZeros = function(X)
        {
            prefix = ""
    
            if(nchar(X) == 1)
            {
                prefix = "0"
            } else {
                prefix = ""
            }
    
            return(paste(prefix, X, ". ", sep = ""))
        }
    
        orderPrefixes = sapply(orders, addingZeros)
        startingPoints.pretty = prettyNum(startingPoints, scientific=FALSE, big.mark=",", preserve.width = "none")
        finishPoints.pretty   = prettyNum(finishPoints, scientific=FALSE, big.mark=",", preserve.width = "none")
        labels = paste(orderPrefixes, startingPoints.pretty, " - ", finishPoints.pretty, sep = "")
        return(labels)
    }
    
    
    dataFrame = data.frame(test = runif(100, 0, 100*100))
    
    GCV_breaks = c(0, 5000, 10000, 20000, 30000, 1000000) 
    GCV_labels = generateLabelsForPivot(GCV_breaks) 
    GCV_labels 
    GCV_buckets = cut(dataFrame$test, breaks = GCV_breaks, labels = GCV_labels) 
    
    dataFrame$BucketName = GCV_buckets
    
    0 讨论(0)
提交回复
热议问题