Question
I was trying to convert a column of integers into categorical ranges, but something happened that I did not understand. Although I fixed it and got what I wanted, I still don't understand why it happened.
The variable holds integers from 0 to 12. The following code left 10, 11 and 12 out of the 5+ category.
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==0]<-"0"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==1]<-"1"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==2]<-"2"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==3]<-"3"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==4]<-"4"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain>=5]<-"5+"
py2$Daily.Whole.Grain<-as.factor(py2$Daily.Whole.Grain)
But when I change the order of the conversion, it includes 10, 11 and 12.
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain>=5]<-"5+"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==0]<-"0"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==1]<-"1"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==2]<-"2"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==3]<-"3"
py2$Daily.Whole.Grain[py2$Daily.Whole.Grain==4]<-"4"
Can anyone explain why it leaves the double-digit integers out? Thanks very much.
Answer 1:
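As @CathG mentioned, the problem is due to converting the column from numeric class to character. The first assignment (<-"0") coerces the whole column to character, so the later test >=5 compares strings rather than numbers, and "10", "11" and "12" sort before "5". When the >=5 line runs first, the column is still numeric and the comparison behaves as intended. Here is perhaps a better solution using the cut function, which gives you a factor based on cut-points of a variable: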
py2 <- data.frame(Daily.Whole.Grain = 1:10)
py2$Daily.Whole.Grain1 <- cut(py2$Daily.Whole.Grain,
                              breaks = c(1:5, Inf), right = FALSE, labels = c(1:4, "5+"))
py2
   Daily.Whole.Grain Daily.Whole.Grain1
1                  1                  1
2                  2                  2
3                  3                  3
4                  4                  4
5                  5                 5+
6                  6                 5+
7                  7                 5+
8                  8                 5+
9                  9                 5+
10                10                 5+
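To make the cause concrete, here is a small sketch that reproduces the behaviour on a plain vector and then applies cut to the full 0 to 12 range from the question. The vector names x, y and grain are made up for illustration, and starting the breaks at 0 is my assumption about how the asker would want the 0 values handled (with breaks starting at 1, as above, a 0 would come out as NA).

```r
# Hypothetical sample data covering the asker's range 0 to 12
x <- 0:12

# Assigning a string into a numeric vector coerces the whole vector to character
y <- x
y[y == 0] <- "0"
class(y)

# From here on, y >= 5 is a string comparison: 5 is coerced to "5",
# and "10", "11", "12" all start with "1", which sorts before "5"
y[y >= 5] <- "5+"
unique(y)

# cut() works on the original numeric values, so ordering is not an issue.
# Intervals are [0,1), [1,2), ..., [4,5), [5,Inf) because right = FALSE.
grain <- cut(x, breaks = c(0:5, Inf), right = FALSE, labels = c(0:4, "5+"))
table(grain)
```

Running this, unique(y) should still contain "10", "11" and "12" as separate values, while table(grain) puts everything from 5 upwards into the 5+ level.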
Source: https://stackoverflow.com/questions/29235377/changing-continuous-ranges-to-categorical-in-r