Create column with grouped values based on another column

旧巷少年郎 2020-11-30 08:24

I'm sure this has been asked before, but I don't know what to search for, so I apologise in advance.

Let's say that I have the following data frame:

3 Answers
  • 2020-11-30 08:50

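    The ifelse() calls need to be nested inside one another, and the result needs a column name. Try this:

    library(dplyr)

    # Each ifelse() sits in the "no" branch of the previous one
    grades %>%
      mutate(x = ifelse(b >= 90, "excellent",
                 ifelse(b >= 80, "very_good",
                 ifelse(b >= 70, "fair",
                 ifelse(b >= 60, "poor", "fail")))))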
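
    To see how the nesting resolves, here is a minimal base R sketch on a plain vector. The scores are hypothetical values picked to sit on and around the cut-offs; they are not taken from the question's data.

    ```r
    # Hypothetical scores on and around the grade boundaries
    b <- c(95, 90, 85, 70, 61, 59)

    # Each ifelse() only handles values that failed every earlier test,
    # so a lower bound per level is enough.
    grade <- ifelse(b >= 90, "excellent",
             ifelse(b >= 80, "very_good",
             ifelse(b >= 70, "fair",
             ifelse(b >= 60, "poor", "fail"))))

    print(data.frame(b, grade))
    ```

    Because the tests run from the highest threshold down, the upper bounds (such as b < 90) are implied and can be left out.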
    
  • 2020-11-30 08:54

    Define vectors with the levels and labels and then use cut on the b column:

    levels <- c(-Inf, 60, 70, 80, 90, Inf)
    labels <- c("Fail", "Poor", "fair", "very good", "excellent")
    grades %>% mutate(x = cut(b, levels, labels = labels))
        a   b         x
    1   1  66      Poor
    2   2  78      fair
    3   3  97 excellent
    4   4  46      Fail
    5   5  89 very good
    6   6  57      Fail
    7   7  80      fair
    8   8  98 excellent
    9   9 100 excellent
    10 10  93 excellent
    11 11  59      Fail
    12 12  51      Fail
    13 13  69      Poor
    14 14  75      fair
    15 15  72      fair
    16 16  48      Fail
    17 17  74      fair
    18 18  54      Fail
    19 19  62      Poor
    20 20  64      Poor
    21 21  88 very good
    22 22  70      Poor
    23 23  85 very good
    24 24  58      Fail
    25 25  95 excellent
    26 26  56      Fail
    27 27  65      Poor
    28 28  68      Poor
    29 29  91 excellent
    30 30  76      fair
    31 31  82 very good
    32 32  55      Fail
    33 33  96 excellent
    34 34  83 very good
    35 35  61      Poor
    36 36  60      Fail
    37 37  77      fair
    38 38  47      Fail
    39 39  73      fair
    40 40  71      fair
    

    Or using data.table:

    library(data.table)
    setDT(grades)[, x := cut(b, levels, labels)]
    

    Or simply in base R:

    grades$x <- cut(grades$b, levels, labels)
    

    Note

    On a closer look at your initial approach, you need to pass right = FALSE to cut, because, for example, 90 points should be "excellent", not "very good". The right argument controls which side of each interval is closed. The default closes intervals on the right, which differs slightly from the OP's initial approach. In dplyr, it would then be:

    grades %>% mutate(x = cut(b, levels, labels, right = FALSE))
    

    and accordingly in the other options.
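
    As a quick check of the note above, the following sketch runs cut on the boundary scores only, once with the default and once with right = FALSE. The names breaks and boundary are chosen here for illustration; the break points and labels are the ones defined earlier in this answer.

    ```r
    breaks <- c(-Inf, 60, 70, 80, 90, Inf)
    labels <- c("Fail", "Poor", "fair", "very good", "excellent")

    # Scores that sit exactly on a break point
    boundary <- c(60, 70, 80, 90)

    # Default (right = TRUE): intervals are (lo, hi], so 90 falls in (80, 90]
    closed_right <- cut(boundary, breaks, labels = labels)

    # right = FALSE: intervals are [lo, hi), so 90 falls in [90, Inf)
    closed_left <- cut(boundary, breaks, labels = labels, right = FALSE)

    print(data.frame(boundary, closed_right, closed_left))
    ```

    Only scores that land exactly on a break point change category between the two calls; every other score is graded the same either way.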

  • 2020-11-30 09:03

    # Create a new character column, then fill in the grades.
    # Test the numeric column b, not c: once c holds text,
    # comparisons on c would be string comparisons.
    grades$c <- NA_character_
    grades$c[grades$b >= 90] <- "excellent"
    grades$c[grades$b < 90 & grades$b >= 80] <- "very good"
    grades$c[grades$b < 80 & grades$b >= 70] <- "fair"
    grades$c[grades$b < 70 & grades$b >= 60] <- "poor"
    grades$c[grades$b < 60] <- "fail"
    