Using CUT and Quartile to generate breaks in R function

给你一囗甜甜゛ 提交于 2019-11-27 13:59:46

Try the following:

set.seed(700)

clientID <- round(runif(200,min=2000, max=3000),0)
orders <- round(runif(200,min=1, max=50),0)

df <- df <- data.frame(cbind(clientID,orders))

ApplyQuintiles <- function(x) {
  cut(x, breaks=c(quantile(df$orders, probs = seq(0, 1, by = 0.20))), 
      labels=c("0-20","20-40","40-60","60-80","80-100"), include.lowest=TRUE)
}
df$Quintile <- sapply(df$orders, ApplyQuintiles)
table(df$Quintile)

0-20  20-40  40-60  60-80 80-100 
  40     41     39     40     40 

I included include.lowest=TRUE in your cut function, which seems to make it work. See ?cut for more details.

There is also cut2 in the venerable Hmisc package. It does quantile cuts.

From the help:

Function like cut but left endpoints are inclusive and labels are of the form [lower, upper), except that last interval is [lower,upper]. If cuts are given, will by default make sure that cuts include entire range of x. Also, if cuts are not given, will cut x into quantile groups (g given) or groups with a given minimum number of observations (m). Whereas cut creates a category object, cut2 creates a factor object.

You can very easily accomplish this automatically with the content method in the bin function in the OneR package:

library(OneR)
set.seed(700)

clientID <- round(runif(200, min = 2000, max = 3000), 0)
orders <- round(runif(200, min = 1, max = 50), 0)
df <- data.frame(cbind(clientID, orders))

df$Quintiles <- bin(df$orders, method = "content")
table(df$Quintile)
## 
## (0.952,9.8]    (9.8,19]   (19,31.4] (31.4,38.2]   (38.2,49] 
##          40          41          39          40          40

(Full disclosure: I am the author of this package)

I use a similar function for my data and I am concerned because my quintile bins have different numbers of observation: is that OK? Thanks!

jobs02.vq <- cut(meaneduc02v, breaks=c(quantile(meaneduc02v,  probs = seq(0,        1, by=0.20), 
                          na.rm=TRUE, names=TRUE, include.lowest=TRUE, right = TRUE, 
                          labels=c("1","2","3","4","5")))) # makes quintiles

And the output I get is:

 table(jobs02.vq, useNA='ifany')
 jobs02.vq
 [1.00,2.00) [2.00,2.51) [2.51,3.34) [3.34,4.45) [4.45,5.33]        <NA> 
     82          54          69          64          67         123 

cut2 from Hmisc does de job (parameter g defines the number of quantile groups)

set.seed(700)

clientID <- round(runif(200,min=2000, max=3000),0)
orders <- round(runif(200,min=1, max=50),0)

df <- data.frame(cbind(clientID,orders))

library(Hmisc)
df$Quintile <- cut2(df$orders, g =5)
levels(df$Quintile) <-  c("0-20", "20-40", "40-60", "60-80", "80-100")

table(df$Quintile)
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!