Divide a range of values into bins of equal length: cut vs cut2

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-18 13:06:49

Question


I'm using the cut function to split my data into equal-width bins. It does the job, but I'm not happy with the way it returns the values: what I need is the center of each bin, not its lower and upper ends.
I've also tried cut2 from Hmisc. This gives me the center of each bin, but it divides the range of the data into bins that contain the same number of observations rather than bins of the same length.

Does anyone have a solution to this?


Answer 1:


It's not too hard to make the breaks and labels yourself, with something like this. Since the midpoint is a single number, I don't return a factor with labels but a numeric vector instead.

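# Note: this function shares its name with Hmisc::cut2; rename it if Hmisc is also attached.
cut2 <- function(x, breaks) {
  r <- range(x)
  # 2*breaks + 1 equally spaced points: bin edges alternate with bin centers
  b <- seq(r[1], r[2], length.out = 2 * breaks + 1)
  brk <- b[0:breaks * 2 + 1]   # odd positions: the breaks + 1 bin edges
  mid <- b[1:breaks * 2]       # even positions: the bin centers
  # include.lowest = TRUE keeps min(x) in the first bin
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  mid[k]
}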

There's probably a better way to get the bin breaks and midpoints; I didn't think about it very hard.
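One possible tidier route, sketched here on the assumption that equal-width bins over range(x) are what is wanted: build the n + 1 bin edges directly, take each center as the average of two adjacent edges, and look up each value's bin with findInterval. The name bin_centers and the demo vector are hypothetical and not part of the original answer.

```r
# Hypothetical helper: map each value to the center of its equal-width bin.
bin_centers <- function(x, n_bins) {
  r <- range(x, na.rm = TRUE)
  brk <- seq(r[1], r[2], length.out = n_bins + 1)  # n_bins + 1 bin edges
  mid <- (head(brk, -1) + tail(brk, -1)) / 2       # center of each bin
  # left.open = TRUE gives right-closed bins (a, b], like cut();
  # all.inside = TRUE keeps min(x) in the first bin and max(x) in the last.
  idx <- findInterval(x, brk, left.open = TRUE, all.inside = TRUE)
  mid[idx]
}

x_demo <- c(0, 3, 7, 10, 14, 20)   # illustrative data
centers <- bin_centers(x_demo, 3)
print(centers)
```

Because the centers come straight from the edges, there is no need for the interleaved sequence or for nudging the first break.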

Note that this answer is different from Joshua's: his gives the median of the data in each bin, while this gives the center of each bin.

> head(cut2(x,3))
[1] 16.666667  3.333333 16.666667  3.333333 16.666667 16.666667
> head(ave(x, cut(x,3), FUN=median))
[1] 18  2 18  2 18 18



Answer 2:


Use ave like so:

set.seed(21)
x <- sample(0:20, 100, replace=TRUE)
xCenter <- ave(x, cut(x,3), FUN=median)
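To see what this does, here is a small sketch on a hand-made vector (the values are illustrative, not from the original answer). cut(x, 3) assigns each value to one of three equal-width intervals, and ave then replaces each value with the median of its group, so the result is the median of the data in each bin rather than the geometric center of the bin.

```r
x_demo <- c(1, 2, 3, 10, 11, 19, 20)     # illustrative data
grp <- cut(x_demo, 3)                    # factor with three equal-width intervals
print(table(grp))                        # counts per bin need not be equal
x_med <- ave(x_demo, grp, FUN = median)  # per-bin median, aligned with x_demo
print(x_med)                             # 2 2 2 10.5 10.5 19.5 19.5
```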



Answer 3:


We can use smart_cut from package cutr:

devtools::install_github("moodymudskipper/cutr")
library(cutr)

Using @Joshua's sample data:

Median by interval (same output as @Joshua's, except that it's an ordered factor):

smart_cut(x,3, "n_intervals", labels= ~ median(.))
# [1] 18 2  18 2  18 18 ...
# Levels: 2 < 11 < 18

Center of each interval (same output as @Aaron's, except that it's an ordered factor):

smart_cut(x,3, "n_intervals", labels= ~ mean(.y))
# [1] 16.67 3.333 16.67 3.333 16.67 16.67 ...
# Levels: 3.333 < 10 < 16.67

Mean of the values in each interval:

smart_cut(x,3, "n_intervals", labels= ~ mean(.))
# [1] 17.48 2.571 17.48 2.571 17.48 17.48 ...
# Levels: 2.571 < 11.06 < 17.48

labels can be a character vector, just as in base::cut.default, but it can also be, as it is here, a function of two parameters: the first is the values contained in the bin, and the second is the bin's cut points.
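For readers without cutr, the idea of a label function that receives both the bin's values and its cut points can be approximated in base R. This is only a sketch: label_bins is a hypothetical helper, not part of cutr, it assumes equal-width bins over range(x), and it returns a numeric vector rather than an ordered factor.

```r
# Hypothetical base-R analogue: f(values_in_bin, cut_points_of_bin) -> one number
label_bins <- function(x, n_bins, f) {
  r <- range(x)
  brk <- seq(r[1], r[2], length.out = n_bins + 1)
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  out <- rep(NA_real_, length(x))
  for (i in seq_len(n_bins)) {
    in_bin <- which(k == i)
    if (length(in_bin) > 0) {
      # pass the bin's values and its two cut points to the label function
      out[in_bin] <- f(x[in_bin], brk[c(i, i + 1)])
    }
  }
  out
}

x_demo <- c(0, 2, 4, 9, 12, 18, 20)                        # illustrative data
by_median <- label_bins(x_demo, 3, function(v, b) median(v))  # median of values
by_center <- label_bins(x_demo, 3, function(v, b) mean(b))    # center of the bin
by_mean   <- label_bins(x_demo, 3, function(v, b) mean(v))    # mean of values
print(rbind(by_median, by_center, by_mean))
```

The three calls mirror the three smart_cut examples above: a function of the values alone gives a data summary per bin, while a function of the cut points gives a property of the bin itself.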

More on cutr and smart_cut can be found in the package's GitHub repository, moodymudskipper/cutr.



Source: https://stackoverflow.com/questions/5915916/divide-a-range-of-values-in-bins-of-equal-length-cut-vs-cut2
