Map numerics to categorical values in R, based on different ranges for the numerics [duplicate]

自作多情 提交于 2019-12-24 02:38:19

问题


Hope my title makes sense. I have a dataframe with a column of numeric values, and I would like to use this column to create a new column whereby the numeric values are 'mapped' to different buckets based on their values. Below is some test data, as well as a rough-around-the-edges nested ifelse() approach that I am currently using to solve this problem. I am hoping to code this in a better way that doesn't involve nested ifelse() statements, since this approach doesn't scale well for many buckets:

mydf = data.frame(strings = letters[1:10], 
              numerics = c(0.2, 0.4, 1.3, 5.2, 3.3, 2.1, 7.3, 1.1, 4.3, 8.3),
              stringsAsFactors = FALSE)

Here is my test dataframe, and here is my nested ifelse() approach to solving my problem:

mydf$buckets = ifelse(mydf$numerics <= 2, 0, 
                   ifelse(mydf$numerics <= 4, 1, 
                       ifelse(mydf$numerics <= 5, 2, 
                            ifelse(mydf$numerics <= 7, 3, 4))))

What the above code does is maps values in the numeric column as follows:

  • all values <2 go to 0
  • all values <4 go to 1
  • all values <5 go to 2
  • all values <7 go to 3
  • all values >= 7 to go 4

this approach doesn't scale well for more than a small number of buckets. any help with this is appreciated! Thanks,


回答1:


I really like using case_when in this sort of situation as already mentioned by @tictocchoc in the comments:

suppressPackageStartupMessages(library(tidyverse))

mydf = data.frame(strings = letters[1:10], 
                  numerics = c(0.2, 0.4, 1.3, 5.2, 3.3, 2.1, 7.3, 1.1, 4.3, 8.3),
                  stringsAsFactors = FALSE)

mydf %>%
  mutate(buckets = case_when(
    numerics < 2 ~0,
    numerics < 4 ~1,
    numerics < 5 ~2,    
    numerics < 7 ~3,
    numerics >= 7 ~4
  ))
#>    strings numerics buckets
#> 1        a      0.2       0
#> 2        b      0.4       0
#> 3        c      1.3       0
#> 4        d      5.2       3
#> 5        e      3.3       1
#> 6        f      2.1       1
#> 7        g      7.3       4
#> 8        h      1.1       0
#> 9        i      4.3       2
#> 10       j      8.3       4



回答2:


try using the findInterval function in base R:

 findInterval(mydf$numerics,c(2,4,5,7))
   [1] 0 0 0 3 1 1 4 0 2 4


来源:https://stackoverflow.com/questions/46046236/map-numerics-to-categorical-values-in-r-based-on-different-ranges-for-the-numer

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!