divide a range of values in bins of equal length: cut vs cut2

匆匆过客 提交于 2019-11-30 08:39:29

It's not too hard to make the breaks and labels yourself, with something like this. Here since the midpoint is a single number, I don't actually return a factor with labels but instead a numeric vector.

cut2 <- function(x, breaks) {
  r <- range(x)
  b <- seq(r[1], r[2], length=2*breaks+1)
  brk <- b[0:breaks*2+1]
  mid <- b[1:breaks*2]
  brk[1] <- brk[1]-0.01
  k <- cut(x, breaks=brk, labels=FALSE)
  mid[k]
}

There's probably a better way to get the bin breaks and midpoints; I didn't think about it very hard.

Note that this answer is different than Joshua's; his gives the median of the data in each bins while this gives the center of each bin.

> head(cut2(x,3))
[1] 16.666667  3.333333 16.666667  3.333333 16.666667 16.666667
> head(ave(x, cut(x,3), FUN=median))
[1] 18  2 18  2 18 18

Use ave like so:

set.seed(21)
x <- sample(0:20, 100, replace=TRUE)
xCenter <- ave(x, cut(x,3), FUN=median)

We can use smart_cut from package cutr:

devtools::install_github("moodymudskipper/cutr")
library(cutr)

Using @Joshua's sample data:

median by interval (same output as @Joshua except it's an ordered factor) :

smart_cut(x,3, "n_intervals", labels= ~ median(.))
# [1] 18 2  18 2  18 18 ...
# Levels: 2 < 11 < 18

center of each interval (same output as @Aaron except it's an ordered factor) :

smart_cut(x,3, "n_intervals", labels= ~ mean(.y))
# [1] 16.67 3.333 16.67 3.333 16.67 16.67 ...
# Levels: 3.333 < 10 < 16.67

mean of values by interval :

smart_cut(x,3, "n_intervals", labels= ~ mean(.))
# [1] 17.48 2.571 17.48 2.571 17.48 17.48 ...
# Levels: 2.571 < 11.06 < 17.48

labels can be a character vector just like in base::cut.default, but it can also be, as it is here, a function of 2 parameters, the first being the values contained in the bin, and the second the cut points of the bin.

more on cutr and smart_cut

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!