Question
I would like to write a function that creates a binning variable based on some raw data. Specifically, I have a dataset with the age of each respondent, and I would like to write a function that classifies each person into an age group, where the age groups are a parameter of that function.
This is what I started with:
data <- data.frame(age = 18:100)
foo <- function(data, brackets = list(18:24, 25:34, 35:59)) {
  require(tidyverse)
  tmp <- data %>%
    drop_na(age) %>%
    mutate(age_bracket = case_when(
      age %in% brackets[[1]] ~ paste(brackets[[1]][1], "to", brackets[[1]][length(brackets[[1]])]),
      age %in% brackets[[2]] ~ paste(brackets[[2]][1], "to", brackets[[2]][length(brackets[[2]])]),
      age %in% brackets[[3]] ~ paste(brackets[[3]][1], "to", brackets[[3]][length(brackets[[3]])])))
  print(tmp)
}
As is obvious, the case_when part is very inflexible as I have to specify ahead of time the number of brackets. It is also quite lengthy. I would like to write some sort of loop that looks at the number of elements in the brackets argument and creates these brackets accordingly. So if I wanted to add a 60:Inf age group, the function should add another age group.
After searching online, I found that some use defused expressions (e.g. quos). I am quite unfamiliar with those, so I struggle to use them for my purpose.
Answer 1:
I think you are looking for the cut function. The following does the job:
data <- data.frame(age = 18:100)
data$age_bracket <- cut(data$age, breaks = c(0, 18, 25, 35, 60, Inf))
unique(data$age_bracket)
# [1] (0,18] (18,25] (25,35] (35,60] (60,Inf]
# Levels: (0,18] (18,25] (25,35] (35,60] (60,Inf]
You can also define labels if you don't like the default bracket labels. The advantage of using cut rather than a hand-coded solution is that it returns a factor whose levels follow the bracket order, so the usual operations (e.g. ordering) work directly on the output of cut.
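To tie this back to the function in the question, here is a minimal sketch of how cut could be wrapped so that the brackets are a parameter. The function name add_age_bracket, the breaks argument, and the "60+" label for an open-ended last bracket are my own illustrative choices, not something from the question. It assumes integer ages and uses right = FALSE so that the intervals are closed on the left (18 to 24, 25 to 34, ...), matching the brackets the question asks for rather than the (18,25] style shown above.

```r
# Hypothetical helper: the name and arguments are illustrative assumptions.
add_age_bracket <- function(data, breaks = c(18, 25, 35, 60, Inf)) {
  n <- length(breaks)
  lower <- breaks[-n]        # inclusive lower bound of each bracket
  upper <- breaks[-1] - 1    # inclusive upper bound (assumes integer ages)

  # Labels such as "18 to 24"; an open-ended last bracket becomes "60+"
  labels <- ifelse(is.finite(upper),
                   paste(lower, "to", upper),
                   paste0(lower, "+"))

  # Drop rows with missing age, as drop_na(age) did in the question
  data <- data[!is.na(data$age), , drop = FALSE]

  # right = FALSE gives left-closed intervals: [18,25), [25,35), ...
  data$age_bracket <- cut(data$age, breaks = breaks, labels = labels,
                          right = FALSE)
  data
}

data <- data.frame(age = 18:100)
result <- add_age_bracket(data)
levels(result$age_bracket)
table(result$age_bracket)
```

Adding or removing a bracket then only means changing the breaks vector, e.g. add_age_bracket(data, breaks = c(18, 30, 50, Inf)). Ages that fall outside the supplied breaks (for example anyone under 18 with the defaults) get NA, which may or may not be what you want.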
Source: https://stackoverflow.com/questions/60894989/how-to-optimize-case-when-in-a-function