Question
I would really appreciate your help with this question. I have the following dataset and would like to create a new variable containing the standardized values (z-scores) for each level of a given factor variable.
x <- data.frame(gender = c("boy","boy","boy","girl","girl","girl"),
values=c(1,2,3,6,7,8))
x
gender values
1 boy 1
2 boy 2
3 boy 3
4 girl 6
5 girl 7
6 girl 8
My aim is to create one new variable which will contain the z-values calculated separately for each factor level (for boys and for girls).
One more question: I mainly want a variable with the z-values, but would the approach be similar if I wanted to apply another function, for example to calculate quantiles per factor level?
Thank you for your help!
Answer 1:
You can use scale with ave and transform:
> transform(x, z_score=ave(values, gender, FUN=scale))
gender values z_score
1 boy 1 -1
2 boy 2 0
3 boy 3 1
4 girl 6 -1
5 girl 7 0
6 girl 8 1
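To make explicit what scale is doing inside ave, here is a minimal sketch that spells out the z-score formula by hand. The helper name z_score is hypothetical (it is not part of base R), and the sketch assumes every group has at least two distinct values so that sd() is nonzero.

```r
# Same example data as in the question
x <- data.frame(gender = c("boy", "boy", "boy", "girl", "girl", "girl"),
                values = c(1, 2, 3, 6, 7, 8))

# Hypothetical helper: z-score of a numeric vector,
# using the sample standard deviation (as scale() does by default)
z_score <- function(v) (v - mean(v)) / sd(v)

# ave() applies the function within each gender and returns
# a vector aligned with the original rows
x$z_score <- ave(x$values, x$gender, FUN = z_score)
x
```

Because ave returns one value per original row, the result can be assigned directly as a new column, and any other function that returns a vector of the same length as its input could be swapped in for z_score.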
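aggregate is also useful, although it returns one row per group (with the scaled values in a matrix column) rather than one value per original row: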
> aggregate(values ~ gender, scale, data=x)
There are also many other ways, using ddply from plyr, tapply, or data.table. Take a look at this post.
Answer 2:
The question of how to create z-scores has already been answered.
Here's a way to calculate quantiles for each factor level:
with(x, tapply(values, gender, FUN = quantile))
# $boy
# 0% 25% 50% 75% 100%
# 1.0 1.5 2.0 2.5 3.0
#
# $girl
# 0% 25% 50% 75% 100%
# 6.0 6.5 7.0 7.5 8.0
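If what you want is a per-row variable rather than a per-group summary, one possible reading of "distribution in quantiles" is a within-group percentile rank. The following is only a sketch under that assumption: pct_rank is a hypothetical helper name, defined here as the share of values in the same group that are less than or equal to the given value.

```r
# Same example data as in the question
x <- data.frame(gender = c("boy", "boy", "boy", "girl", "girl", "girl"),
                values = c(1, 2, 3, 6, 7, 8))

# Hypothetical helper: empirical CDF of the group, evaluated at each value
pct_rank <- function(v) ecdf(v)(v)

# ave() keeps the result aligned with the original rows
x$pct_rank <- ave(x$values, x$gender, FUN = pct_rank)
x
```

With three values per group, each group gets the percentile ranks 1/3, 2/3 and 1. Other definitions of a percentile rank would give different numbers, so adjust the helper to whichever convention you need.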
Source: https://stackoverflow.com/questions/20745120/how-to-scale-a-variable-by-group