Question
I'm trying to use the cut() function in R to group continuous variables into buckets, like this:
as.character(cut(ORIG_AMT,
    breaks = c(-Inf, 0, 25000, 50000, 75000, 100000, 125000, 150000, 175000, 200000, 250000, 300000, 350000, 418000, Inf),
    labels = c('Missing', '[0-25k)', '[25k-50k)', '[50k-75k)', '[75k-100k)', '[100k-125k)', '[125k-150k)', '[150k-175k)', '[175k-200k)', '[200k-250k)', '[250k-300k)', '[300k-350k)', '[350k-418k)', '[418k+)'),
    right = FALSE, ordered_result = TRUE))
However, missing values (NA) are being omitted from the result. I can't find anything online that addresses this issue. Ideally, the missing values would all be grouped into the 'Missing' bucket.
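For illustration, here is a minimal sketch of one way the NA values could be moved into a 'Missing' bucket. It relies on cut() returning NA for NA input, so no interval label (including one attached to the -Inf to 0 range) is ever assigned to a missing value. The sample ORIG_AMT vector and the shortened break list are made up for this example; addNA() and levels() are base R functions.

```r
# Hypothetical sample data; these ORIG_AMT values are made up for illustration.
ORIG_AMT <- c(10000, NA, 60000, 500000, NA)

# A shortened set of breaks and labels, to keep the sketch small.
bucket <- cut(ORIG_AMT,
              breaks = c(0, 25000, 50000, 75000, Inf),
              labels = c('[0-25k)', '[25k-50k)', '[50k-75k)', '[75k+)'),
              right = FALSE)

# cut() leaves NA inputs as NA. addNA() turns NA into an explicit
# factor level, appended after the existing levels.
bucket <- addNA(bucket)

# Rename that NA level so it shows up as 'Missing'.
levels(bucket)[is.na(levels(bucket))] <- 'Missing'

table(bucket)
```

Note that in this sketch 'Missing' ends up as the last level rather than the first, which may matter if the factor is ordered.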
Ultimately, I want to take weighted averages across these buckets. If there's a better way to approach this problem than with cut() and xtabs(), I'm open to it!
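As a rough sketch of the weighted-average step, assuming the buckets are already stored in a column: the data frame loans and its columns rate (the quantity being averaged) and wt (the weight) are hypothetical names invented for this example, while split() and weighted.mean() are base R functions.

```r
# Hypothetical data: 'rate' is the quantity to average, 'wt' is its weight.
loans <- data.frame(
  bucket = c('Missing', '[0-25k)', '[0-25k)', '[25k-50k)', 'Missing'),
  rate   = c(4.0, 5.0, 3.0, 6.0, 2.0),
  wt     = c(1, 1, 3, 2, 1)
)

# Split the rows by bucket, then compute a weighted mean within each group.
weighted_avg <- sapply(split(loans, loans$bucket),
                       function(g) weighted.mean(g$rate, g$wt))

weighted_avg
```

The result is a named numeric vector with one weighted average per bucket, and the 'Missing' bucket is treated like any other group.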
Source: https://stackoverflow.com/questions/31010248/how-does-the-cut-function-address-null-missing-values