I need to reshape data.frame in R in one step. In short, change of values of objects (x1 to x6) is visible row by row (from 1990 to 1995):
> tab1[1:10, ] # raw data see plot for tab1
id value year
1 x1 7 1990
2 x1 10 1991
3 x1 11 1992
4 x1 7 1993
5 x1 3 1994
6 x1 1 1995
7 x2 6 1990
8 x2 7 1991
9 x2 9 1992
10 x2 5 1993
I am able to do reshaping step by step, does anybody know how do it in one step?
Original data Table 1 - see that minimal value from all timeseries is "0"
Step1: Table 2 - rescale each timeseries that each would have minimal value equal "0". All times fall down on x-axes.
Step2:
Table 3 - apply diff()
function on each timeline.
Step3:
Table 4 - apply sort()
function on each timeseries.
I hope the pictures are clear enough for understanding each step.
So final table looks like this:
> tab4[1:10, ]
id value time
1 x1 -4 1
2 x1 -4 2
3 x1 -2 3
4 x1 1 4
5 x1 3 5
6 x2 -4 1
7 x2 -3 2
8 x2 1 3
9 x2 1 4
10 x2 2 5
# Source data:
tab1 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
value = c(7,10,11,7,3,1,6,7,9,5,2,3,11,9,7,9,1,
0,1,2,2,4,7,4,2,3,1,6,4,2,3,5,4,3,5,6),
year = rep(c(1990:1995), times = 6))
tab2 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
value = c(6,9,10,6,2,0,4,5,7,3,0,1,11,9,7,9,1,0,
0,1,1,3,6,3,1,2,0,5,3,1,0,2,1,0,2,3),
year = rep(c(1990:1995), times = 6))
tab3 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
value = c(3,1,-4,-4,-2,1,2,-4,-3,1,-2,-2,2,-8,-1,
1,0,2,3,-3,1,-2,5,-2,-2,2,-1,-1,2,1),
time = rep(c(1:5), times = 6))
tab4 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
value = c(-4,-4,-2,1,3,-4,-3,1,1,2,-8,-2,-2,-1,2,
-3,0,1,2,3,-2,-2,-2,1,5,-1,-1,1,2,2),
time = rep(c(1:5), times = 6))
It sounds like you want to apply a set of functions to each group of a grouping variable. There are many ways to do this in R (from base R by
and tapply
to add-on packages like plyr
, data.table
, and dplyr
). I've been learning how to use package dplyr
, and came up with the following solution.
require(dplyr)
tab4 = tab1 %>%
group_by(id) %>% # group by id
mutate(value = value - min(value), value = value - lag(value)) %>% # group min to 0, difference lag 1
na.omit %>% # remove NA caused by lag 1 differencing
arrange(id, value) %>% # order by value within each id
mutate(time = 1:length(value)) %>% # Make a time variable from 1 to 5 based on current order
select(-year) # remove year column to match final OP output
Using data.table
, this is simply:
require(data.table) ## 1.9.2
ans <- setDT(tab1)[, list(value=diff(value)), by=id] ## aggregation
setkey(ans, id,value)[, time := seq_len(.N), by=id] ## order + add 'time' column
Note that your 'step 1' is unnecessary as your second step is calculating difference and it wouldn't have any effect (and is therefore skipped here).
来源:https://stackoverflow.com/questions/24761835/how-to-apply-a-set-of-functions-to-each-group-of-a-grouping-variable-in-r-data-f