I am having trouble figuring out the most elegant and flexible way to switch data from long format to wide format when I have more than one measure variable I want to bring along.
Note (Sept 2019): within tidyr, the gather() + spread() approach (described in this answer) has more or less been replaced by the pivot_wider() approach (described in this newer tidyr answer). For current information about the transition, see the pivoting vignette.
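For comparison, here is a minimal sketch of the newer pivot_wider() route. It assumes tidyr >= 1.0.0 (where pivot_wider() was introduced) and a hypothetical `my.df` reconstructed from the printed output further down (IDs A, B, C measured at TIME 1 to 5, with measure variables X and Y). Because pivot_wider() accepts several columns in `values_from`, no separate lengthening step is needed.

```r
library(magrittr)  # provides the %>% pipe

# Hypothetical example data, reconstructed from the printed output in this answer.
my.df <- data.frame(
  ID   = rep(c("A", "B", "C"), times = 5),
  TIME = rep(1:5, each = 3),
  X    = 1:15,
  Y    = 16:30,
  stringsAsFactors = FALSE
)

# With more than one values_from column, the new column names default to
# "<value column>_<TIME>" (names_sep = "_"), e.g. X_1, ..., X_5, Y_1, ..., Y_5.
wide <- my.df %>%
  tidyr::pivot_wider(names_from = TIME, values_from = c(X, Y))

print(wide)
```

The result is a tibble with one row per ID, in the same layout as the spread() result shown below.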
Here's a solution with the tidyr package, which has essentially replaced reshape and reshape2. As with those two packages, the strategy is to make the dataset longer first, and then wider.
library(magrittr); requireNamespace("tidyr"); requireNamespace("dplyr")
my.df %>%
tidyr::gather(key=variable, value=value, c(X, Y)) %>% # Make it even longer.
dplyr::mutate( # Create the spread key.
time_by_variable = paste0(variable, "_", TIME)
) %>%
dplyr::select(ID, time_by_variable, value) %>% # Retain these three.
tidyr::spread(key=time_by_variable, value=value) # Spread/widen.
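The question's data isn't reproduced here, so the following self-contained sketch builds a hypothetical `my.df` (inferred from the intermediate and final tables printed below: three IDs, five TIME points, measure variables X and Y) and runs the same gather/mutate/select/spread pipeline on it.

```r
library(magrittr)  # provides the %>% pipe

# Hypothetical example data, inferred from the printed tables in this answer.
my.df <- data.frame(
  ID   = rep(c("A", "B", "C"), times = 5),
  TIME = rep(1:5, each = 3),
  X    = 1:15,
  Y    = 16:30,
  stringsAsFactors = FALSE
)

wide <- my.df %>%
  tidyr::gather(key = variable, value = value, c(X, Y)) %>%     # 15 rows -> 30 rows
  dplyr::mutate(time_by_variable = paste0(variable, "_", TIME)) %>%  # e.g. "X_1"
  dplyr::select(ID, time_by_variable, value) %>%                 # drop variable and TIME
  tidyr::spread(key = time_by_variable, value = value)           # one column per key

print(wide)
```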
After the tidyr::gather() call, the intermediate dataset is:
ID TIME variable value
1 A 1 X 1
2 B 1 X 2
3 C 1 X 3
...
28 A 5 Y 28
29 B 5 Y 29
30 C 5 Y 30
The eventual result is:
ID X_1 X_2 X_3 X_4 X_5 Y_1 Y_2 Y_3 Y_4 Y_5
1 A 1 4 7 10 13 16 19 22 25 28
2 B 2 5 8 11 14 17 20 23 26 29
3 C 3 6 9 12 15 18 21 24 27 30
tidyr::unite() is an alternative, suggested by @JWilliman. It is functionally equivalent to the dplyr::mutate() and dplyr::select() combination above when the remove parameter is TRUE (the default).
If you're not accustomed to this type of manipulation, tidyr::unite() may be a small obstacle because it's one more function you have to learn and remember. However, its benefits include (a) more concise code (i.e., four lines are replaced by one) and (b) fewer places to repeat variable names (i.e., you don't have to repeat or modify variables in the dplyr::select() clause).
my.df %>%
tidyr::gather(key=variable, value=value, c(X, Y)) %>% # Make it even longer.
tidyr::unite("time_by_variable", variable, TIME, remove=TRUE) %>% # Create the spread key `time_by_variable` while simultaneously dropping `variable` and `TIME`.
tidyr::spread(key=time_by_variable, value=value) # Spread/widen.
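To check the claimed equivalence, a small sketch like the one below runs both variants on the same hypothetical `my.df` (reconstructed from the printed tables above) and compares them column by column. unite() joins with `sep = "_"` by default, so it produces the same keys ("X_1", ..., "Y_5") as the paste0() call.

```r
library(magrittr)  # provides the %>% pipe

# Hypothetical example data, reconstructed from the printed tables above.
my.df <- data.frame(
  ID   = rep(c("A", "B", "C"), times = 5),
  TIME = rep(1:5, each = 3),
  X    = 1:15,
  Y    = 16:30,
  stringsAsFactors = FALSE
)

long <- my.df %>% tidyr::gather(key = variable, value = value, c(X, Y))

# Variant 1: build the key with mutate(), then keep only the needed columns.
via_mutate <- long %>%
  dplyr::mutate(time_by_variable = paste0(variable, "_", TIME)) %>%
  dplyr::select(ID, time_by_variable, value) %>%
  tidyr::spread(key = time_by_variable, value = value)

# Variant 2: unite() builds the key and drops the source columns in one step.
via_unite <- long %>%
  tidyr::unite("time_by_variable", variable, TIME, remove = TRUE) %>%
  tidyr::spread(key = time_by_variable, value = value)

# Compare column names and then each column's contents.
same_names   <- identical(names(via_mutate), names(via_unite))
same_columns <- all(mapply(identical, via_mutate, via_unite))
print(same_names && same_columns)
```

Comparing columns individually (rather than calling identical() on the whole data frames) sidesteps any incidental differences in attributes such as row names.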