Question
I have a set of data in which I need to code values of certain variables (numeric) into 3 classes.
My data set is similar to this but has 60 more variables:
anim <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
wt <- c(181,179,180.5,201,201.5,245,246.4,189.3,301,354,369,205,199,394,231.3)
data <- data.frame(anim,wt)
> data
anim wt
1 1 181.0
2 2 179.0
3 3 180.5
4 4 201.0
5 5 201.5
6 6 245.0
7 7 246.4
8 8 189.3
9 9 301.0
10 10 354.0
11 11 369.0
12 12 205.0
13 13 199.0
14 14 394.0
15 15 231.3
I need to code values of the variable "wt" up into 3 classes: (wt >= 179 & wt < 200) = 1; (wt >= 200 & wt < 300) = 2; (wt > 300) = 3
which should give me this
> data2
anim wt SWT
1 1 181.0 1
2 2 179.0 1
3 3 180.5 1
4 4 201.0 2
5 5 201.5 2
6 6 245.0 2
7 7 246.4 2
8 8 189.3 1
9 9 301.0 3
10 10 354.0 3
11 11 369.0 3
12 12 205.0 2
13 13 199.0 1
14 14 394.0 3
15 15 231.3 2
Answer 1:
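The cut method as outlined by @Greg is probably what you want here. One thing to note is that cut returns a factor by default, which you can suppress by supplying labels = FALSE to return the integer values. Pass right = FALSE as well so that the intervals are closed on the left, matching conditions such as wt >= 200 & wt < 300:
cut(data$wt, c(178, 200, 300, Inf), labels = FALSE, right = FALSE)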
Alternatively, if your cutting does not lend itself to natural breaks, you can use ifelse(). You can nest the ifelse calls, much as you would nest IF in Excel. I use with() to cut down on the typing needed:
data$group2 <- with(data, ifelse(wt >= 179 & wt < 200, 1,
                          ifelse(wt >= 200 & wt < 300, 2, 3)))
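As a small illustration of why the right argument matters, here is a sketch with made-up boundary values (the vector x is hypothetical and not part of the question's data). It only uses base R's cut:

```r
# Hypothetical values sitting on or near the class boundaries
x <- c(179, 199.9, 200, 299.9, 300, 394)

# Default right = TRUE: intervals are (178,200], (200,300], (300,Inf]
closed_right <- cut(x, c(178, 200, 300, Inf), labels = FALSE)

# right = FALSE: intervals are [178,200), [200,300), [300,Inf)
closed_left <- cut(x, c(178, 200, 300, Inf), labels = FALSE, right = FALSE)

# Side-by-side view: only the values exactly on a break (200 and 300) differ
print(data.frame(x, closed_right, closed_left))
```

With the question's sample data the two calls agree, because no weight is exactly 200 or 300.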
Answer 2:
You can try cut:
anim <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
wt <-c(181,179,180.5,201,201.5,245,246.4,
189.3,301,354,369,205,199,394,231.3)
data <- data.frame(anim,wt)
EDIT: fixed group - right = FALSE, got rid of split example.
group = cut(data$wt, c(178, 200, 300, Inf), right=FALSE)
data$swt = as.numeric(group)
data
anim wt swt
1 1 181.0 1
2 2 179.0 1
3 3 180.5 1
4 4 201.0 2
5 5 201.5 2
6 6 245.0 2
7 7 246.4 2
8 8 189.3 1
9 9 301.0 3
10 10 354.0 3
11 11 369.0 3
12 12 205.0 2
13 13 199.0 1
14 14 394.0 3
15 15 231.3 2
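Since the question mentions about 60 more variables, the same cut call can be looped over several columns. This is only a sketch: the column ht, the vector vars, and the assumption that every variable shares the same breaks are all hypothetical and would need adjusting to the real data.

```r
# Small stand-in data set; `ht` is a made-up second variable
data <- data.frame(
  anim = 1:5,
  wt   = c(181, 201, 301, 199, 245),
  ht   = c(185, 250, 320, 179, 299)
)

vars   <- c("wt", "ht")           # columns to classify (assumed)
breaks <- c(179, 200, 300, Inf)   # assumed to apply to every column

for (v in vars) {
  # New column name: "S" + upper-cased variable name, e.g. wt -> SWT
  new_name <- paste0("S", toupper(v))
  data[[new_name]] <- cut(data[[v]], breaks, labels = FALSE, right = FALSE)
}

print(data)
```

If the variables need different breaks, one option would be a named list of break vectors indexed by v inside the loop.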
Answer 3:
I think Greg's answers cover "standard operating procedure", but I find many uses for the findInterval function as well. It naturally returns a number that identifies the interval in the second argument.
data$int <- findInterval(data$wt, c(179, 200, 300, Inf))
data
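To make the behaviour of findInterval concrete, here is a short sketch on hypothetical values (not the question's data). Each interval is closed on the left, and a value below the first break is returned as 0 rather than NA:

```r
# Hypothetical weights, including boundary values and one below the first break
wt <- c(181, 179, 200, 300, 394, 150)

# Intervals: [179,200) -> 1, [200,300) -> 2, [300,Inf) -> 3, below 179 -> 0
int <- findInterval(wt, c(179, 200, 300, Inf))
print(int)
# 1 1 2 3 3 0
```

Note that a weight of exactly 300, which the question's rules leave unassigned, lands in class 3 here.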
Answer 4:
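Just to show an alternative method (similar to recode in SPSS) from package car. Note that recode ranges include both endpoints, so a wt of exactly 200 would be coded 1 and a wt of exactly 300 would be coded 3 here:
> library(car)
> data$SWT <- with(data, recode(wt, "lo:200=1; 300:hi=3; else=2"))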
> data
anim wt SWT
1 1 181.0 1
2 2 179.0 1
3 3 180.5 1
4 4 201.0 2
5 5 201.5 2
6 6 245.0 2
7 7 246.4 2
8 8 189.3 1
9 9 301.0 3
10 10 354.0 3
11 11 369.0 3
12 12 205.0 2
13 13 199.0 1
14 14 394.0 3
15 15 231.3 2
Answer 5:
Just for completeness and info, the classInt package (on CRAN) is another handy way to classify numbers into classes.
Source: https://stackoverflow.com/questions/6024792/coding-variable-values-into-classes-using-r