Question
Hope my title makes sense. I have a data frame with a numeric column, and I would like to use that column to create a new column in which each numeric value is mapped to a bucket according to its value. Below is some test data, followed by the rough-around-the-edges nested ifelse() approach I am currently using. I am hoping to find a better way that doesn't involve nested ifelse() calls, because this approach doesn't scale well to many buckets:
mydf = data.frame(strings = letters[1:10],
                  numerics = c(0.2, 0.4, 1.3, 5.2, 3.3, 2.1, 7.3, 1.1, 4.3, 8.3),
                  stringsAsFactors = FALSE)
That is my test data frame. Here is my current nested ifelse() approach:
mydf$buckets = ifelse(mydf$numerics < 2, 0,
                 ifelse(mydf$numerics < 4, 1,
                   ifelse(mydf$numerics < 5, 2,
                     ifelse(mydf$numerics < 7, 3, 4))))
The code above maps the values in the numerics column as follows:
- values below 2 go to 0
- values of at least 2 but below 4 go to 1
- values of at least 4 but below 5 go to 2
- values of at least 5 but below 7 go to 3
- values of 7 or more go to 4
Any help with this is appreciated! Thanks,
Answer 1:
I really like using dplyr's case_when() in this sort of situation, as @tictocchoc already mentioned in the comments:
suppressPackageStartupMessages(library(tidyverse))
mydf = data.frame(strings = letters[1:10],
                  numerics = c(0.2, 0.4, 1.3, 5.2, 3.3, 2.1, 7.3, 1.1, 4.3, 8.3),
                  stringsAsFactors = FALSE)
mydf %>%
  mutate(buckets = case_when(
    numerics < 2 ~ 0,
    numerics < 4 ~ 1,
    numerics < 5 ~ 2,
    numerics < 7 ~ 3,
    numerics >= 7 ~ 4
  ))
#> strings numerics buckets
#> 1 a 0.2 0
#> 2 b 0.4 0
#> 3 c 1.3 0
#> 4 d 5.2 3
#> 5 e 3.3 1
#> 6 f 2.1 1
#> 7 g 7.3 4
#> 8 h 1.1 0
#> 9 i 4.3 2
#> 10 j 8.3 4
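case_when() evaluates its conditions in order and takes the first one that is TRUE, which is why `numerics < 4 ~ 1` only ever sees values that already failed `numerics < 2`. A minimal sketch, assuming dplyr is installed, that probes the exact bucket edges (the `probe` vector is made up for illustration and is not from the question):

```r
library(dplyr)

# Hypothetical probe values: one value just below 2, each bucket edge, and an NA
probe <- c(1.99, 2, 4, 5, 7, NA)

# Conditions are checked top to bottom; the first TRUE one decides the bucket.
# An NA condition counts as "no match", so an NA input matches nothing.
buckets <- case_when(
  probe < 2  ~ 0,
  probe < 4  ~ 1,
  probe < 5  ~ 2,
  probe < 7  ~ 3,
  probe >= 7 ~ 4
)

print(buckets)
```

A value sitting exactly on an edge (2, 4, 5 or 7) lands in the higher bucket, and NA stays NA because no condition matches it. Replacing the last line with `TRUE ~ 4` would also send NA to bucket 4.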
Answer 2:
Try using the findInterval() function in base R:
findInterval(mydf$numerics,c(2,4,5,7))
[1] 0 0 0 3 1 1 4 0 2 4
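findInterval() returns, for each value, how many of the (sorted) breaks are less than or equal to it, so a value exactly equal to a break goes to the higher bucket, the same as the case_when() conditions in Answer 1. A rough base R sketch that stores the result as a new column and cross-checks it against cut(), which produces the same left-closed intervals when called with `right = FALSE`; the `breaks` variable and the `buckets_cut` column are my own names, not part of the original answer:

```r
mydf <- data.frame(strings = letters[1:10],
                   numerics = c(0.2, 0.4, 1.3, 5.2, 3.3, 2.1, 7.3, 1.1, 4.3, 8.3),
                   stringsAsFactors = FALSE)

# Upper edges of buckets 0-3; anything >= the last edge ends up in bucket 4
breaks <- c(2, 4, 5, 7)

# Number of breaks <= each value, which is exactly the bucket index
mydf$buckets <- findInterval(mydf$numerics, breaks)

# cut() with right = FALSE builds [-Inf, 2), [2, 4), ..., [7, Inf);
# labels = FALSE returns integer codes starting at 1, so subtract 1
mydf$buckets_cut <- cut(mydf$numerics, breaks = c(-Inf, breaks, Inf),
                        right = FALSE, labels = FALSE) - 1

print(mydf)
```

Adding a bucket then means adding one number to `breaks`. If the buckets should map to values other than 0, 1, 2, ..., you can index a lookup vector with the result, e.g. `c(10, 20, 30, 40, 50)[findInterval(x, breaks) + 1]`.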
Source: https://stackoverflow.com/questions/46046236/map-numerics-to-categorical-values-in-r-based-on-different-ranges-for-the-numer