Nested if-else loops in R

一生所求 2021-01-19 03:28

I have a data frame named "crimes" which contains a "pre_rate" column that denotes the crime rate before a certain law is implemented. I would like to put each rate in a

5 Answers
  • 2021-01-19 04:13

    If your intervals have no gaps between them and you just want an index, you can use .bincode:

    crimes$rate_category <- .bincode(crimes$pre_rate,
                                     breaks = c(-Inf, 1, 2, 3, 4, Inf))
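
    As a minimal sketch of how those breaks map values to indices, the lines below run .bincode on the five sample rates used in the other answers. The cut() call and its labels are an illustrative assumption, added only to show that the same breaks can return named categories in base R.

    ```r
    # sample rates taken from the other answers in this thread
    pre_rate <- c(0.27, 1.91, 2.81, 3.21, 4.80)
    brks <- c(-Inf, 1, 2, 3, 4, Inf)

    # .bincode returns the index of the interval (lower, upper] holding each value
    idx <- .bincode(pre_rate, breaks = brks)
    idx
    #> [1] 1 2 3 4 5

    # cut() takes the same breaks and can attach labels (labels are made up here)
    lab <- cut(pre_rate, breaks = brks,
               labels = c("foo", "bar", "foobar", "baz", "foobie"))
    as.character(lab)
    #> [1] "foo"    "bar"    "foobar" "baz"    "foobie"
    ```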
    

    If you want specific values for each interval, you can use a rolling join via the data.table package:

    library(magrittr)
    library(data.table)
    
    rate_category_by_pre_rate <- 
      data.table(rate_category = c("foo", "bar", "foobar", "baz", "foobie"),
                 pre_rate = c(1, 2, 3, 4, 11)) %>%
      setkey(pre_rate)
    
    crimes %>%
      as.data.table %>%
      setkey(pre_rate) %>%
      rate_category_by_pre_rate[., roll = -Inf]
    
    #>    rate_category pre_rate
    #> 1:           foo     0.27
    #> 2:           bar     1.91
    #> 3:        foobar     2.81
    #> 4:           baz     3.21
    #> 5:        foobie     4.80
    

    However, in your case, you may only need ceiling (i.e. round up the value of pre_rate and cap it at 5):

    crimes$rate_category <- pmin(ceiling(crimes$pre_rate), 5)
    
    #>   pre_rate rate_category
    #> 1     0.27             1
    #> 2     1.91             2
    #> 3     2.81             3
    #> 4     3.21             4
    #> 5     4.80             5
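
    One caveat, sketched below under the assumption that rates could be 0 or larger than 5: the cap handles large values, but a rate of exactly 0 rounds to category 0, so this shortcut is only safe when every rate is strictly positive. The two extra values are made up for illustration.

    ```r
    # sample rates plus two made-up edge cases: a large rate and a zero rate
    pre_rate <- c(0.27, 1.91, 2.81, 3.21, 4.80, 7.5, 0)

    # round up, then cap at 5
    rate_category <- pmin(ceiling(pre_rate), 5)
    rate_category
    #> [1] 1 2 3 4 5 5 0
    ```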
    
  • 2021-01-19 04:14

    You can use an algebraic approach to solve your problem; it should be faster than your ifelse:

    pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80) 
    crimes = data.frame(pre_rate)   
    crimes$rate = (pre_rate > 0.26 & pre_rate < 0.87)*1 + 
      (pre_rate > 1.04 & pre_rate < 1.94)* 2 + 
      (pre_rate > 2.03 & pre_rate < 2.96)* 3 + 
      (pre_rate > 3.10 & pre_rate < 3.82)* 4 + 
      (pre_rate > 4.20 & pre_rate < 11.00)* 5
    

    The idea is that each comparison yields TRUE or FALSE, which R treats as 1 or 0 in arithmetic, so multiplying by the category number and summing gives the category. The only difference is that a value matching no interval gets 0 rather than NA, which you can of course change. Also, use "&" rather than "&&" when you want an element-by-element (vectorized) result, as mentioned in the comments.

    Output:

    #> crimes
    # pre_rate rate
    #1     0.27    1
    #2     1.91    2
    #3     2.81    3
    #4     3.21    4
    #5     4.80    5
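
    The true/false arithmetic described above can be checked on a few values. This is only a sketch: the vector x and the choice to turn 0 into NA are illustrative assumptions, and only the first two intervals are used to keep it short.

    ```r
    # three made-up rates; 0.95 falls in the gap between the first two intervals
    x <- c(0.27, 1.91, 0.95)

    # "&" compares element by element and returns one logical per value
    in_first <- x > 0.26 & x < 0.87
    in_first
    #> [1]  TRUE FALSE FALSE

    # logicals act as 1/0 in arithmetic, so each term contributes its category
    rate <- (x > 0.26 & x < 0.87) * 1 +
      (x > 1.04 & x < 1.94) * 2

    # optional: report unmatched values as NA instead of 0
    rate[rate == 0] <- NA
    rate
    #> [1]  1  2 NA
    ```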
    
  • 2021-01-19 04:17

    Instead of nesting ifelse statements, I would recommend case_when, which is a bit easier to read and follow. But as @Marius mentioned, your actual problem is using && where you need &.

    library(tidyverse)
    crimes <- data.frame(pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80))
    
    crimes %>% 
      mutate(rate_category = case_when(pre_rate > 0.26 & pre_rate < 0.87 ~ 1,
                                       pre_rate > 1.04 & pre_rate < 1.94 ~ 2,
                                       pre_rate > 2.03 & pre_rate < 2.96 ~ 3,
                                       pre_rate > 3.10 & pre_rate < 3.82 ~ 4,
                                       pre_rate > 4.20 & pre_rate < 11.00 ~ 5))
    
  • 2021-01-19 04:17

    Instead of multiple nested ifelse() calls, a non-equi join combined with an update on join can be used:

    # OP's sample data set with one out-of-bounds value appended
    crimes = data.frame(pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80, 1.0))   
    
    library(data.table)
    # specify categories, lower, and upper bounds
    bounds <- data.table(
      cat = 1:5,
      lower = c(0.26, 1.04, 2.03, 3.10, 4.2),
      upper = c(0.87, 1.94, 2.96, 3.82, 11)
    )
    # non-equi join and update on join
    setDT(crimes)[bounds, on = .(pre_rate > lower, pre_rate < upper), rate_category := cat][]
    
       pre_rate rate_category
    1:     0.27             1
    2:     1.91             2
    3:     2.81             3
    4:     3.21             4
    5:     4.80             5
    6:     1.00            NA
    

    Note that pre_rate values which fall outside all of the given intervals automatically get an NA rate_category.

  • 2021-01-19 04:35

    Why not define your lower bounds and upper bounds in two vectors then rely on indexing? Using this method, there is no need to write pre_rate > num1 & pre_rate < num2 multiple times.

    lowB <- c(0.26, 1.04, 2.03, 3.10, 4.2)
    uppB <- c(0.87, 1.94, 2.96, 3.82, 11)
    
    myCategory <- 1:5 ## this can be whatever categories you'd like
    
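    # Look up each rate on its own, so the result does not depend on row order
    # or on crimes having exactly as many rows as there are intervals.
    crimes$rate_category <- sapply(crimes$pre_rate, function(x) {
      hit <- which(x > lowB & x < uppB)   # index of the interval containing x
      if (length(hit) == 1) myCategory[hit] else NA   # NA when x falls in a gap
    })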
    