I'm using the cut function to split my data into equal bins. It does the job, but I'm not happy with the way it returns the values. What I need is the center of the bin, not t
It's not too hard to make the breaks and labels yourself, with something like this. Since the midpoint is a single number, I don't return a factor with labels here, but a numeric vector instead.
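cut2 <- function(x, breaks) {
  r <- range(x)
  # 2*breaks+1 equally spaced points: odd positions are bin edges, even positions are midpoints
  b <- seq(r[1], r[2], length.out = 2 * breaks + 1)
  brk <- b[0:breaks * 2 + 1]  # bin edges
  mid <- b[1:breaks * 2]      # bin centers
  brk[1] <- brk[1] - 0.01     # nudge the lowest edge down so min(x) falls in the first bin
  k <- cut(x, breaks = brk, labels = FALSE)  # integer bin index of each value
  mid[k]
}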
There's probably a better way to get the bin breaks and midpoints; I didn't think about it very hard.
Note that this answer is different from Joshua's; his gives the median of the data in each bin, while this gives the center of each bin.
> head(cut2(x,3))
[1] 16.666667 3.333333 16.666667 3.333333 16.666667 16.666667
> head(ave(x, cut(x,3), FUN=median))
[1] 18 2 18 2 18 18
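One possible way to tidy up the break and midpoint computation in cut2 is sketched below. It builds only the bin edges, takes each midpoint as the average of two adjacent edges, and uses include.lowest = TRUE in cut instead of shifting the first edge by 0.01. The name cut_mid and its argument n_bins are made up for this illustration; treat it as a sketch rather than a drop-in replacement.

```r
# Hypothetical helper (the name cut_mid is ours, not from the answers here).
cut_mid <- function(x, n_bins) {
  r <- range(x)
  brk <- seq(r[1], r[2], length.out = n_bins + 1)  # n_bins + 1 equally spaced edges
  mid <- (brk[-length(brk)] + brk[-1]) / 2         # center of each bin
  # include.lowest = TRUE keeps min(x) in the first bin without moving the edge
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  mid[k]
}

set.seed(21)
x <- sample(0:20, 100, replace = TRUE)
head(cut_mid(x, 3))
```

Like cut2, this assumes x has at least two distinct values; otherwise the breaks are not unique and cut stops with an error.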
Use ave like so:
set.seed(21)
x <- sample(0:20, 100, replace=TRUE)
xCenter <- ave(x, cut(x,3), FUN=median)
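To see what ave is doing in that call: it splits x by the grouping factor, applies FUN within each group, and puts the result back at each element's original position, so the output has the same length as x. A small sketch with made-up values (x_small and bins are illustrative names, not part of the question):

```r
# Six values that fall into two obvious clusters
x_small <- c(1, 2, 3, 10, 11, 12)

# cut() with a single number asks for that many equal-width bins
bins <- cut(x_small, 2)

# Each value is replaced by the median of the values sharing its bin
centers <- ave(x_small, bins, FUN = median)
centers
# [1]  2  2  2 11 11 11
```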
We can use smart_cut from the package cutr:
devtools::install_github("moodymudskipper/cutr")
library(cutr)
Using @Joshua's sample data:
Median by interval (same output as @Joshua's, except that it's an ordered factor):
smart_cut(x,3, "n_intervals", labels= ~ median(.))
# [1] 18 2 18 2 18 18 ...
# Levels: 2 < 11 < 18
Center of each interval (same output as @Aaron's, except that it's an ordered factor):
smart_cut(x,3, "n_intervals", labels= ~ mean(.y))
# [1] 16.67 3.333 16.67 3.333 16.67 16.67 ...
# Levels: 3.333 < 10 < 16.67
Mean of the values in each interval:
smart_cut(x,3, "n_intervals", labels= ~ mean(.))
# [1] 17.48 2.571 17.48 2.571 17.48 17.48 ...
# Levels: 2.571 < 11.06 < 17.48
labels can be a character vector, just like in base::cut.default, but it can also be, as it is here, a function of 2 parameters: the first is the values contained in the bin, and the second is the cut points of the bin.
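For readers without cutr installed, the idea of a two-argument labelling function can be imitated in base R. The sketch below is not how cutr is implemented. label_bins is a hypothetical helper that uses equal-width bins and returns a plain numeric vector rather than an ordered factor.

```r
# Hypothetical base-R helper; not part of cutr.
label_bins <- function(x, n_bins, f) {
  r <- range(x)
  brk <- seq(r[1], r[2], length.out = n_bins + 1)  # equal-width bin edges
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  # One label per bin: f receives the values in the bin and the bin's two cut points
  lab <- vapply(seq_len(n_bins),
                function(i) f(x[k == i], brk[i + 0:1]),
                numeric(1))
  lab[k]  # map each value to the label of its bin
}

x_demo <- c(0, 5, 10, 15, 20)
label_bins(x_demo, 2, function(v, cuts) median(v))   # median of the values in each bin
label_bins(x_demo, 2, function(v, cuts) mean(cuts))  # center of each bin
label_bins(x_demo, 2, function(v, cuts) mean(v))     # mean of the values in each bin
```

Passing both the values and the cut points to f lets a single helper cover all three variants shown above.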
more on cutr and smart_cut