Question
Basically, I have a single-column data set of 53 values. What I am trying to achieve is binning them into sets based on a 400-point difference, ranging from ~500 to 4500. You can be vague if needed and just name a function for doing so; I can work out the rest.
Answer 1:
A dplyr option:
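library(dplyr)

# Example data: 1000 uniform values between 400 and 5000
df_test <- data.frame(x = runif(1000, 400, 5000),
                      y = rep("A", 1000))

# Adjacent ranges share an endpoint so that non-integer values such as 800.5
# are not left unbinned (NA); case_when() uses the first condition that matches.
df_test <- df_test %>%
  mutate(bins = case_when(between(x, 400, 800)   ~ "Set 1",
                          between(x, 800, 1600)  ~ "Set 2",
                          between(x, 1600, 5000) ~ "Set 3"))
head(df_test)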
          x y  bins
1 1687.2854 A Set 3
2 3454.1035 A Set 3
3 4979.5434 A Set 3
4  796.6475 A Set 1
5 3665.7444 A Set 3
6 3083.8969 A Set 3
You can of course adjust the between() ranges as you see fit.
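With many equal-width bins, writing one range per bin gets tedious. A minimal sketch in base R, assuming bins of width 400 starting at 500 as in the question: the bin index can be computed arithmetically. The vector x and the names lower, width, and bin_index are made up for illustration. Note that these bins are closed on the left, so 900 falls into the second set.

```r
# Hypothetical example values; the real data would be the 53-value column
x <- c(500, 650.5, 899.9, 900, 1687.3, 4499)

lower <- 500   # assumed lower edge of the first bin
width <- 400   # assumed bin width

# Integer division gives 0 for [500, 900), 1 for [900, 1300), and so on
bin_index <- (x - lower) %/% width + 1
bins <- paste("Set", bin_index)

data.frame(x, bins)
```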
Answer 2:
Here's a base R approach that uses cut() with breaks defined by a predetermined seq().
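set.seed(1)
data <- runif(n = 53, 500, 4500)
# Breaks at 500, 900, ..., 4500, with -Inf and Inf catching anything outside that range
groups <- as.integer(cut(data, c(-Inf, seq(500, 4500, by = 400), Inf)))
data.frame(data, groups)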
       data groups
1  1562.035      4
2  1988.496      5
3  2791.413      7
4  4132.831     11
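To check how many values land in each group, the result of cut() can be kept as a factor and tabulated. This is only a sketch reusing the same breaks; the names breaks and counts are introduced here for illustration.

```r
set.seed(1)
data <- runif(n = 53, 500, 4500)

# Same breaks as above: 500, 900, ..., 4500, plus open-ended outer intervals
breaks <- c(-Inf, seq(500, 4500, by = 400), Inf)

# Keep the factor (no as.integer) so the interval labels stay visible;
# dig.lab = 4 allows up to 4 significant digits in those labels
groups <- cut(data, breaks, dig.lab = 4)

# Number of values per interval; empty intervals are reported as 0
counts <- table(groups)
counts
```

Group 1 (below 500) and group 12 (above 4500) should come out empty here, since the simulated values all lie between 500 and 4500.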
Answer 3:
Hi, I would do it like this:
data$group <- cut(data$value,
                  breaks = seq(0, 4500, 500),
                  labels = paste("Group", LETTERS[1:9], sep = "_"))
Or, if you prefer a more basic style of R, use [ ] subsetting:
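under_500 <- data[data$value < 500, ]
# Use comparisons rather than %in% 501:900, which would only match whole numbers
over500_under900 <- data[data$value >= 500 & data$value <= 900, ]
## etc..
over4000 <- data[data$value > 4000, ]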
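If one object per range gets unwieldy, split() can produce all the subsets at once from the grouping factor. A rough sketch, assuming a data frame with a value column as above; the simulated data and the name subsets are hypothetical.

```r
set.seed(1)
# Hypothetical stand-in for the real 53-value column
data <- data.frame(value = runif(53, 500, 4500))

data$group <- cut(data$value,
                  breaks = seq(0, 4500, 500),
                  labels = paste("Group", LETTERS[1:9], sep = "_"))

# One data frame per group, returned as a named list;
# groups with no values are kept as zero-row data frames
subsets <- split(data, data$group)
names(subsets)
```

A single group can then be pulled out by name, e.g. subsets[["Group_C"]].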
Source: https://stackoverflow.com/questions/61032294/how-group-values-into-smaller-sets-of-values-in-a-data-set