Using cut2 from Hmisc to calculate cuts for different number of groups

Submitted by 懵懂的女人 on 2019-12-10 17:22:18

Question


I was trying to calculate equal quantile cuts for a vector by using cut2 from Hmisc.

library(Hmisc)
c <- c(-4.18304,-3.18343,-2.93237,-2.82836,-2.13478,-2.01892,-1.88773,
       -1.83124,-1.74953,-1.74858,-0.63265,-0.59626,-0.5681)

cut2(c, g=3, onlycuts=TRUE)

[1] -4.18304 -2.01892 -1.74858 -0.56810

But I was expecting the following result (33%, 33%, 33%):

[1] -4.18304 -2.13478 -1.74858 -0.56810

Should I still use cut2 or try something different? How can I make it work? Thanks for your advice.


Answer 1:


You are seeing the cutpoints, but you want the tabular counts, and you want them as fractions of the total, so do this instead:

> prop.table(table(cut2(c, g=3) ) )

[-4.18,-2.019) [-2.02,-1.749) [-1.75,-0.568] 
     0.3846154      0.3076923      0.3076923 

(Obviously you cannot expect cut2 to create an exact split when the number of elements is not evenly divisible by 3.)
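The divisibility point can be checked with plain arithmetic. The following is a minimal base R sketch, not part of cut2 itself; the names n, g, base_size, remainder and sizes are illustrative. It computes the most even split possible for 13 values in 3 groups, and the resulting proportions agree with the table above. Which group receives the extra element is up to the function doing the cutting; here it is arbitrarily given to the first group.

```r
# Most even way to split n elements into g groups (illustrative sketch)
n <- 13L
g <- 3L

base_size <- n %/% g   # every group gets at least this many elements
remainder <- n %% g    # this many groups must take one extra element

# Assumption: the extra elements go to the first groups
sizes <- rep(base_size, g) + (seq_len(g) <= remainder)
proportions <- sizes / n

print(sizes)
print(round(proportions, 7))
```

With n = 12 the remainder is zero, so every group can hold exactly a third of the values.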




Answer 2:


It seems that there were accidentally thirteen values in the original data set, instead of twelve. Thirteen values cannot be equally divided into three quantile groups (as mentioned by BondedDust). Here is the original problem, except that one selected data value (-1.74953) is excluded, making it twelve values. This gives the result originally expected:

library(Hmisc)

c<-c(-4.18304,-3.18343,-2.93237,-2.82836,-2.13478,-2.01892,-1.88773,-1.83124,-1.74858,-0.63265,-0.59626,-0.5681)

cut2(c, g=3,onlycuts=TRUE)
#[1] -4.18304 -2.13478 -1.74858 -0.56810
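As a rough cross-check that does not need Hmisc, the twelve values can be binned in base R using the cutpoints the question expected. This is only a sketch: base cut() stands in for cut2 here, on the assumption that left-closed intervals with a closed final interval mimic cut2's grouping, and the names x12, expected_cuts, groups and counts are illustrative.

```r
# The twelve values (the original data without -1.74953)
x12 <- c(-4.18304, -3.18343, -2.93237, -2.82836, -2.13478, -2.01892,
         -1.88773, -1.83124, -1.74858, -0.63265, -0.59626, -0.5681)

# Cutpoints the question expected for three equal groups
expected_cuts <- c(-4.18304, -2.13478, -1.74858, -0.5681)

# Left-closed intervals [a, b), with the last interval closed on both ends
groups <- cut(x12, breaks = expected_cuts, right = FALSE, include.lowest = TRUE)
counts <- table(groups)

print(counts)
```

Each of the three intervals should contain four values, which is the even split the question was after.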


To make it clearer to anyone not familiar with cut2 from the Hmisc package (like me as of this morning), here's a similar problem, except that we'll use the integers 1 through 12 (assigned to the vector dozen_values).

library(Hmisc)

dozen_values <-1:12

quantile_groups <- cut2(dozen_values,g=3)

levels(quantile_groups)
## [1] "[1, 5)" "[5, 9)" "[9,12]"

cutpoints <- cut2(dozen_values, g=3, onlycuts=TRUE)

cutpoints
## [1]  1  5  9 12

# Show which values belong to which quantile group, using a data frame
quantile_DF <- data.frame(dozen_values, quantile_groups)
names(quantile_DF) <- c("value", "quantile_group")

quantile_DF
##    value quantile_group
## 1      1         [1, 5)
## 2      2         [1, 5)
## 3      3         [1, 5)
## 4      4         [1, 5)
## 5      5         [5, 9)
## 6      6         [5, 9)
## 7      7         [5, 9)
## 8      8         [5, 9)
## 9      9         [9,12]
## 10    10         [9,12]
## 11    11         [9,12]
## 12    12         [9,12]

Notice that the first quantile group includes everything up to, but not including, 5 (i.e. 1 through 4, in this case). The second quantile group contains 5 up to, but not including, 9 (i.e. 5 through 8, in this case). The third (last) quantile group contains 9 through 12, which includes the last value 12. Unlike the other quantile groups, the third quantile group includes its upper endpoint.

Anyway, you can see that the "cutpoints" 1, 5, 9, and 12 describe the start and end points of the quantile groups in the most concise way, but the output is hard to interpret without reading the relevant documentation (the original answer linked to a single-page Inside-R reference instead of the almost 400-page PDF manual).

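If the parenthesis versus square-bracket notation is unfamiliar: a square bracket means the endpoint is included in the interval and a parenthesis means it is excluded, so [1, 5) contains 1 but not 5.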
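The membership rules for the integers 1 through 12 can also be reproduced from the cutpoints alone. The sketch below uses base R's findInterval() rather than cut2, so it illustrates the interval semantics and is not a description of how cut2 works internally; the names cutpoints, values and group_index are illustrative.

```r
# Cutpoints reported by cut2 for the integers 1 through 12
cutpoints <- c(1, 5, 9, 12)
values <- 1:12

# findInterval() uses left-closed intervals [a, b);
# rightmost.closed = TRUE puts the maximum (12) in the last group
group_index <- findInterval(values, cutpoints, rightmost.closed = TRUE)

# List the values that fall in each group
print(split(values, group_index))
```

Group 1 should hold 1 through 4, group 2 should hold 5 through 8, and group 3 should hold 9 through 12, including the final value.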



Source: https://stackoverflow.com/questions/16349154/using-cut2-from-hmisc-to-calculate-cuts-for-different-number-of-groups
