Using ifelse to transform column in R

-上瘾入骨i 2021-01-26 18:40

I have a dataframe with a column of numbers.

In a separate column, I want to print whether the number is "less than 10", "between 10 and 20" or "between 20 and 30".

4 Answers
  • 2021-01-26 19:16

    You could use cut() from base R, but be aware that it makes the words variable a factor. You just need to set the appropriate intervals (which is why I used 9.5, 20.5, 30.5 etc.: putting the boundaries between whole numbers makes it unambiguous which interval each value falls into). By the way, in your example 20 would have to be recoded both to "between 10 and 20" and to "between 20 and 30", which won't work.

    data$words <- cut(data$number, c(0,9.5,20.5,30.5,40), c("less than 10", "between 10 and 20", "between 20 and 30", "other"))
    data
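A small sketch of how the breaks above behave at the edges, using hypothetical sample values since the question's data frame was not shown. With cut()'s default right = TRUE, the first interval is (0, 9.5], so a value of exactly 0 becomes NA, and anything above the last break (40) is also NA rather than "other". Using -Inf and Inf as the outer breaks is one way to cover every value; note that negative numbers would then also be labelled "less than 10".

```r
# Hypothetical sample data; the question's actual data frame was not shown
data <- data.frame(number = c(0, 5, 10, 20, 21, 30, 35, 45))

labs <- c("less than 10", "between 10 and 20", "between 20 and 30", "other")

# Breaks from the answer: intervals (0, 9.5], (9.5, 20.5], (20.5, 30.5], (30.5, 40]
words_orig <- cut(data$number, c(0, 9.5, 20.5, 30.5, 40), labs)
print(words_orig)  # 0 and 45 fall outside every interval, so they are NA

# Open-ended outer breaks: every number gets a label (negatives count as "less than 10")
data$words <- cut(data$number, c(-Inf, 9.5, 20.5, 30.5, Inf), labs)
print(data)
```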
    
  • 2021-01-26 19:24

    The main problem was that you need to reference the variable in each inequality test. To make this more readable, I wrapped everything in a with(data, ...) call. Another problem with your code was the use of && instead of &. The former works on single values only, while the latter compares two vectors element by element.

    data$words<-
      with(data,
           ifelse(number >= 0 & number <= 9, "less than 10",
           ifelse(number >= 10 & number <= 20, "between 10 and 20",
           ifelse(number >= 20 & number <= 30, "between 20 and 30", "other"))))
    

    I also think this is more readable than a tidyverse solution, since it doesn't introduce any new syntax. It is easier to debug, too.
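The nested ifelse() above labels whole numbers as intended, but a value such as 9.5 passes neither number <= 9 nor number >= 10 and ends up as "other". A minimal variant, assuming the data may contain non-integers, uses half-open intervals (>= lower, < upper) so the categories neither overlap nor leave gaps. The sample values are hypothetical; note that 30 is then "other", as in the tidyverse answer.

```r
# Hypothetical sample values, including non-integers and one negative number
data <- data.frame(number = c(-1, 3, 9.5, 10, 19.9, 20, 29, 30, 41))

# & compares element by element, so each test returns one TRUE/FALSE per row
data$words <- with(data,
  ifelse(number >= 0  & number < 10, "less than 10",
  ifelse(number >= 10 & number < 20, "between 10 and 20",
  ifelse(number >= 20 & number < 30, "between 20 and 30", "other"))))

print(data)
```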

  • 2021-01-26 19:28

    library(tidyverse)
    data <- data.frame(number = 1:40)
    data %>%
      mutate(word = case_when(
        number >= 0 & number < 10 ~ "less than 10",
        number >= 10 & number < 20 ~ "between 10 and 20",
        number >= 20 & number < 30 ~ "between 20 and 30",
        TRUE ~ "Other"
      ))
       number              word
    1       1      less than 10
    2       2      less than 10
    3       3      less than 10
    4       4      less than 10
    5       5      less than 10
    6       6      less than 10
    7       7      less than 10
    8       8      less than 10
    9       9      less than 10
    10     10 between 10 and 20
    11     11 between 10 and 20
    12     12 between 10 and 20
    13     13 between 10 and 20
    14     14 between 10 and 20
    15     15 between 10 and 20
    16     16 between 10 and 20
    17     17 between 10 and 20
    18     18 between 10 and 20
    19     19 between 10 and 20
    20     20 between 20 and 30
    21     21 between 20 and 30
    22     22 between 20 and 30
    23     23 between 20 and 30
    24     24 between 20 and 30
    25     25 between 20 and 30
    26     26 between 20 and 30
    27     27 between 20 and 30
    28     28 between 20 and 30
    29     29 between 20 and 30
    30     30             Other
    31     31             Other
    32     32             Other
    33     33             Other
    34     34             Other
    35     35             Other
    36     36             Other
    37     37             Other
    38     38             Other
    39     39             Other
    40     40             Other
    
  • 2021-01-26 19:32

    Do you need it to be all in one statement?

    There are a few syntax mistakes in your code, but one possible solution is something like this:

    data$text <- "other"
    data$text[data$number >=0 & data$number < 10] <- "less than 10"
    data$text[data$number >=10 & data$number < 20] <- "between 10 and 20"
    data$text[data$number >=20 & data$number < 30] <- "between 20 and 30"
    

    I created a new column because if I replaced the values in the 'number' column with text, the entire column would be coerced to character type. Later comparisons such as data$number < 10 would then compare strings rather than numbers, which gives unexpected results (for example, "9" < "10" is FALSE).
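A short sketch of the pitfall described above, using a hypothetical vector x: once a single text label is written into a numeric vector, R coerces the whole vector to character, and later comparisons with 10 are made between strings. Under string ordering "5" and "9" both sort after "10", so neither would be relabelled by a later x < 10 test.

```r
x <- c(5, 9, 15)

# Writing a label into a numeric vector coerces the whole vector to character
x[x >= 10 & x < 20] <- "between 10 and 20"
print(x)  # "5" "9" "between 10 and 20"

# 10 is now compared as the string "10", so "5" < "10" and "9" < "10" are FALSE
print(x < 10)

# Keeping the numbers and writing labels into a separate vector avoids the problem
num  <- c(5, 9, 15)
text <- rep("other", length(num))
text[num >= 0 & num < 10]  <- "less than 10"
text[num >= 10 & num < 20] <- "between 10 and 20"
print(text)
```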

    You also have some overlap in your categories. Consider making each upper bound strictly less than (<) rather than less than or equal to: for example, 20 is both >= 20 and <= 20, so it falls into both the "between 10 and 20" and "between 20 and 30" categories.

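    If you want a one-liner, you can use the cut() function. Set right = FALSE so that each interval includes its lower bound and excludes its upper bound; with the default right = TRUE, 10 would be labelled "less than 10" and 0 would become NA:

    cut(data$number, breaks = c(0, 10, 20, 30, Inf), right = FALSE,
        labels = c("less than 10", "between 10 and 20", "between 20 and 30", "other"))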
    

    This turns the numeric vector into a factor; wrap the call in as.character() if you need plain strings.
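A sketch comparing cut()'s default right = TRUE with right = FALSE for the breaks c(0, 10, 20, 30, Inf), on hypothetical sample values. With the default, intervals are closed on the right, so 10 is labelled "less than 10" and 0 becomes NA; right = FALSE makes each interval [lower, upper). The last step converts the factor to plain strings.

```r
# Hypothetical sample values
data <- data.frame(number = c(0, 9, 10, 20, 29.5, 30, 100))
labs <- c("less than 10", "between 10 and 20", "between 20 and 30", "other")
brks <- c(0, 10, 20, 30, Inf)

# Default right = TRUE: intervals (0,10], (10,20], ... so 0 -> NA and 10 -> "less than 10"
closed_right <- cut(data$number, breaks = brks, labels = labs)
print(closed_right)

# right = FALSE: intervals [0,10), [10,20), [20,30), [30,Inf)
data$text <- cut(data$number, breaks = brks, labels = labs, right = FALSE)

# cut() returns a factor; convert it if plain strings are needed
data$text <- as.character(data$text)
print(data)
```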
