I have a data frame with a column of numbers.
In a separate column, I want to print whether the number is "less than 10", "between 10 and 20" or "between 20 and 30".
You could use cut from base R, but be aware that it makes the words variable a factor. You just need to set the appropriate intervals (which is why I used 30.5 etc. for readability). BTW, in your example 20 would be recoded both to "between 10 and 20" and to "between 20 and 30", which won't work.
data$words <- cut(data$number, c(0,9.5,20.5,30.5,40), c("less than 10", "between 10 and 20", "between 20 and 30", "other"))
data
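If the factor is not wanted, it can be converted back to plain strings afterwards. A minimal sketch, assuming a small made-up data frame (the sample values below are hypothetical, not from the question):

```r
# Hypothetical sample data, not from the original post
data <- data.frame(number = c(5, 10, 20, 25, 35))

data$words <- cut(
  data$number,
  breaks = c(0, 9.5, 20.5, 30.5, 40),
  labels = c("less than 10", "between 10 and 20", "between 20 and 30", "other")
)

class(data$words)    # "factor"
levels(data$words)   # the four labels, in break order

# Convert to plain character strings if a factor is not wanted
data$words <- as.character(data$words)
class(data$words)    # "character"
```

Note that with these breaks 20 ends up in "between 10 and 20", because the second interval runs from 9.5 up to and including 20.5.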
The main problem was that you need to reference the variable in each inequality test. To make this more readable, I wrapped everything in a with(data... call. Another problem with your code was the use of && instead of &. The former is for single values only, while the latter compares each element of two vectors.
data$words <-
  with(data,
       ifelse(number >= 0 & number <= 9, "less than 10",
       ifelse(number >= 10 & number <= 20, "between 10 and 20",
       ifelse(number >= 20 & number <= 30, "between 20 and 30", "other"))))
I also think this is a lot more readable than the tidyverse approach, and it does not introduce new syntax. It is easier to debug, too.
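To illustrate the & versus && point, here is a small sketch with made-up values (the vector below is hypothetical). As far as I know, recent versions of R (4.3 and later) raise an error when && is given a vector longer than one, while older versions only looked at the first element.

```r
number <- c(5, 15, 25)   # hypothetical values

# & works element by element and returns one result per element
elementwise <- number >= 10 & number <= 20
elementwise
# [1] FALSE  TRUE FALSE

# && expects single values, which is what an if() condition needs
x <- number[2]
single <- if (x >= 10 && x <= 20) "between 10 and 20" else "other"
single
# [1] "between 10 and 20"
```

Since ifelse() needs one TRUE/FALSE per row, & is the operator to use inside it.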
library(tidyverse)
data <- data.frame(number = 1:40)
data %>%
  mutate(word = case_when(
    number >= 0 & number < 10 ~ "less than 10",
    number >= 10 & number < 20 ~ "between 10 and 20",
    number >= 20 & number < 30 ~ "between 20 and 30",
    TRUE ~ "Other"  # everything that matched none of the conditions above
  ))
number word
1 1 less than 10
2 2 less than 10
3 3 less than 10
4 4 less than 10
5 5 less than 10
6 6 less than 10
7 7 less than 10
8 8 less than 10
9 9 less than 10
10 10 between 10 and 20
11 11 between 10 and 20
12 12 between 10 and 20
13 13 between 10 and 20
14 14 between 10 and 20
15 15 between 10 and 20
16 16 between 10 and 20
17 17 between 10 and 20
18 18 between 10 and 20
19 19 between 10 and 20
20 20 between 20 and 30
21 21 between 20 and 30
22 22 between 20 and 30
23 23 between 20 and 30
24 24 between 20 and 30
25 25 between 20 and 30
26 26 between 20 and 30
27 27 between 20 and 30
28 28 between 20 and 30
29 29 between 20 and 30
30 30 Other
31 31 Other
32 32 Other
33 33 Other
34 34 Other
35 35 Other
36 36 Other
37 37 Other
38 38 Other
39 39 Other
40 40 Other
Do you need it to be all in one statement?
There are a few syntactical mistakes in your code, but a possible solution would be to do something like this:
data$text <- "other"
data$text[data$number >=0 & data$number < 10] <- "less than 10"
data$text[data$number >=10 & data$number < 20] <- "between 10 and 20"
data$text[data$number >=20 & data$number < 30] <- "between 20 and 30"
I created a new column because if I were to replace the values in the 'number' column with text, the entire column would be coerced to character type and it might cause unexpected behaviour with the inequality operators.
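A quick sketch of that coercion, using a made-up vector in place of the 'number' column (the values are hypothetical):

```r
x <- c(5, 15, 25)        # hypothetical numeric column
x[1] <- "less than 10"   # assigning text coerces the whole vector
class(x)                 # "character": the other values became "15" and "25"

# Comparisons on character data are no longer numeric
"9" < "10"               # FALSE, strings are compared character by character
9 < 10                   # TRUE
```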
You also have some overlap in your categories. Consider changing your upper bound to strictly less than (for example, 20 is both >=20 and <=20, so it falls into both the "between 10 and 20" and the "between 20 and 30" categories).
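The overlap is easy to check on a few boundary values. A minimal sketch, assuming a hypothetical vector of test values:

```r
number <- c(10, 20, 30)   # hypothetical boundary values

# Inclusive bounds on both sides: 20 satisfies both tests
in_10_20 <- number >= 10 & number <= 20
in_20_30 <- number >= 20 & number <= 30
overlap <- number[in_10_20 & in_20_30]
overlap
# [1] 20

# Half-open intervals [10, 20) and [20, 30): no value matches both
any_overlap <- any((number >= 10 & number < 20) & (number >= 20 & number < 30))
any_overlap
# [1] FALSE
```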
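If you want a one-liner, you can use the cut() function. Set right=FALSE so that the intervals are closed on the left, which matches the >= and < tests above (by default 10 would be labelled "less than 10"):
cut(data$number, breaks=c(0,10,20,30,Inf), right=FALSE,
    labels=c("less than 10", "between 10 and 20", "between 20 and 30", "other"))
This turns the numeric vector into a factor.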
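Boundary handling in cut() depends on its right argument, so it may be worth checking values such as 10, 20 and 30 explicitly. A small sketch with hypothetical test values:

```r
x <- c(0, 9, 10, 20, 30)   # hypothetical boundary values
breaks <- c(0, 10, 20, 30, Inf)
labels <- c("less than 10", "between 10 and 20", "between 20 and 30", "other")

# Default right = TRUE: intervals are (0,10], (10,20], (20,30], (30,Inf]
right_closed <- as.character(cut(x, breaks = breaks, labels = labels))
right_closed
# 0 -> NA, 10 -> "less than 10", 20 -> "between 10 and 20", 30 -> "between 20 and 30"

# right = FALSE: intervals are [0,10), [10,20), [20,30), [30,Inf)
left_closed <- as.character(cut(x, breaks = breaks, labels = labels, right = FALSE))
left_closed
# 0 -> "less than 10", 10 -> "between 10 and 20", 20 -> "between 20 and 30", 30 -> "other"
```

With the default, 0 falls outside the first interval and becomes NA unless include.lowest = TRUE is also set.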