问题
I have a dataframe similar to this one
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543,3524, 353, 3432, 4542, 6343, 4534 )
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)
Now I would like to scale the values in p1 and p2 depending on their ID. So not the whole column would be scaled like when using the tapply() function, but rather scaling is done once for all values for ID 1, then for all values for ID 2 etc. Same for scaling of p2. The new dataframe should consist of the scaled values.
I already tried
df_scaled <- ddply(my.df, my.df$ID, scale(my.df$p1))
but get the error message
.fun is not a function.
Thanks for your help!
回答1:
dplyr
makes this easy:
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543,3524, 353, 3432, 4542, 6343, 4534 )
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)
library(dplyr)
df_scaled <- my.df %>% group_by(ID) %>% mutate(p1 = scale(p1), p2=scale(p2))
Note that there is a bug in the stable version of dplyr
when working with scale; you might need to update to the dev version (see comments).
来源:https://stackoverflow.com/questions/41761018/scale-all-values-depending-on-group