R: Converting wide format to long format with multiple 3 time period variables [duplicate]

时光取名叫无心

3 Answers

鱼传尺愫

2021-01-25 05:51
If your goal is to convert the three colors to long format, this can be done with the base R reshape function:
```
reshape(sample.df, idvar = "Subject", varying = 2:length(sample.df), sep = "", direction = "long")
#     Subject time BlueTime RedTime GreenTime
# 1.1       1    1        2       2         2
# 2.1       2    1        5       5         5
# 3.1       3    1        6       6         6
# 1.2       1    2        4       4         4
# 2.2       2    2        6       6         6
# 3.2       3    2        7       7         7
# 1.3       1    3        1       1         1
# 2.3       2    3        2       2         2
# 3.3       3    3        3       3         3
```
The time variable captures the 1, 2, 3 at the end of the wide variable names. The varying argument tells reshape which columns should be converted to long format. The sep argument tells reshape how the time suffix is attached to each name: with sep = "", it looks for numbers appended directly to the stem with no separating character (BlueTime1 splits into BlueTime and 1). Finally, direction = "long" requests a wide-to-long conversion.

I always specify the id variable (idvar), even when it is not strictly necessary, for future reference.
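For readers who prefer not to rely on reshape guessing the stems and times from the names, here is a minimal sketch that spells out varying, v.names, timevar and times explicitly. The sample.df below is a hypothetical reconstruction inferred from the printed output (the question's actual data may differ), and stems is an illustrative helper vector.

```r
# Hypothetical reconstruction of the question's data, inferred from the printed output
sample.df <- data.frame(
  Subject    = 1:3,
  BlueTime1  = c(2, 5, 6), BlueTime2  = c(4, 6, 7), BlueTime3  = c(1, 2, 3),
  RedTime1   = c(2, 5, 6), RedTime2   = c(4, 6, 7), RedTime3   = c(1, 2, 3),
  GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3)
)

# One character vector of wide columns per long variable, listed in time order
stems   <- c("BlueTime", "RedTime", "GreenTime")
varying <- lapply(stems, function(s) paste0(s, 1:3))

long <- reshape(
  sample.df,
  idvar     = "Subject",
  varying   = varying,   # element i holds the wide columns that feed stems[i]
  v.names   = stems,     # names of the resulting long columns
  timevar   = "time",
  times     = 1:3,
  direction = "long"
)
long
```

Passing varying as a list (one element per long column) makes the grouping explicit, so it does not depend on the naming convention that sep = "" relies on.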

If the names in your data.frame don't actually contain numbers for the time variable, a fairly simple solution is to rename the variables so that they do. For example, the following replaces "_Pre" at the end of any variable name with "1":
```
# Replace a trailing "_Pre" with "1" in every column name (e.g. BlueTime_Pre -> BlueTime1)
names(df) <- sub("_Pre$", "1", names(df))
```
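Extending that idea to all three suffixes, a minimal sketch might map _Pre, _Post and _Final to 1, 2 and 3 and then call reshape as before. The data frame and its ColorTime_Pre style names are assumptions based on the naming described in the comments, and suffix_to_time is a hypothetical lookup vector.

```r
# Hypothetical wide data using the _Pre/_Post/_Final naming
df <- data.frame(
  Subject         = 1:3,
  BlueTime_Pre    = c(2, 5, 6), BlueTime_Post  = c(4, 6, 7), BlueTime_Final  = c(1, 2, 3),
  RedTime_Pre     = c(2, 5, 6), RedTime_Post   = c(4, 6, 7), RedTime_Final   = c(1, 2, 3),
  GreenTime_Pre   = c(2, 5, 6), GreenTime_Post = c(4, 6, 7), GreenTime_Final = c(1, 2, 3)
)

# Map each suffix to a numeric time code (hypothetical lookup)
suffix_to_time <- c(Pre = "1", Post = "2", Final = "3")
for (s in names(suffix_to_time)) {
  names(df) <- sub(paste0("_", s, "$"), suffix_to_time[[s]], names(df))
}
# Names are now BlueTime1, BlueTime2, BlueTime3, RedTime1, ...

long <- reshape(df, idvar = "Subject", varying = 2:ncol(df), sep = "",
                direction = "long")
long
```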

春和景丽

2021-01-25 06:00

We can use melt from data.table, which can take multiple sets of measure columns as regex patterns:

```
library(data.table)
melt(setDT(sample.df), measure.vars = patterns("^Blue", "^Red", "^Green"),
     value.name = c("BlueTime", "RedTime", "GreenTime"), variable.name = "time")
#   Subject time BlueTime RedTime GreenTime
#1:       1    1        2       2         2
#2:       2    1        5       5         5
#3:       3    1        6       6         6
#4:       1    2        4       4         4
#5:       2    2        6       6         6
#6:       3    2        7       7         7
#7:       1    3        1       1         1
#8:       2    3        2       2         2
#9:       3    3        3       3         3
```

Or, as @StevenBeaupré mentioned in the comments, if there are many patterns, one option is to build the patterns argument from the dataset's column names by stripping out the digits:

```
melt(setDT(sample.df),
     measure.vars = patterns(as.list(unique(sub("\\d+", "", names(sample.df)[-1])))),
     value.name = c("BlueTime", "RedTime", "GreenTime"), variable.name = "time")
```

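Building on that comment, here is a sketch that derives both the patterns and value.name from the column names, so neither has to be typed by hand, and then converts the time column (a factor of group positions when several patterns are used) to integer. It assumes the wide columns are named stem-plus-digits, as in the sample.df below, which is reconstructed from the printed output rather than taken from the question; stems is an illustrative helper.

```r
library(data.table)

# Hypothetical reconstruction of the question's data
sample.df <- data.frame(
  Subject    = 1:3,
  BlueTime1  = c(2, 5, 6), BlueTime2  = c(4, 6, 7), BlueTime3  = c(1, 2, 3),
  RedTime1   = c(2, 5, 6), RedTime2   = c(4, 6, 7), RedTime3   = c(1, 2, 3),
  GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3)
)
dt <- as.data.table(sample.df)

# Stems such as "BlueTime", in order of first appearance
stems <- unique(sub("\\d+$", "", names(dt)[-1]))

long <- melt(dt,
             id.vars       = "Subject",
             measure.vars  = patterns(paste0("^", stems, "\\d+$")),  # one pattern per stem
             value.name    = stems,
             variable.name = "time")

# With several patterns, 'time' is a factor with levels "1", "2", "3"
long[, time := as.integer(as.character(time))]
long
```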
猫巷女王i

2021-01-25 06:12

The idea here is to gather() all the time variables (every variable except Subject), use separate() on key to split each name into a label and a time, and then spread() label and value to obtain the desired output.

```
library(dplyr)
library(tidyr)

sample.df %>%
  gather(key, value, -Subject) %>%
  separate(key, into = c("label", "time"), "(?<=[a-z])(?=[0-9])") %>%
  spread(label, value)
```

Which gives:

```
#  Subject time BlueTime GreenTime RedTime
#1       1    1        2         2       2
#2       1    2        4         4       4
#3       1    3        1         1       1
#4       2    1        5         5       5
#5       2    2        6         6       6
#6       2    3        2         2       2
#7       3    1        6         6       6
#8       3    2        7         7       7
#9       3    3        3         3       3
```
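To see what the first two steps contribute, here is a sketch that stops after gather() and separate() and inspects the intermediate long table. It uses a hypothetical reconstruction of sample.df inferred from the outputs, not the question's actual data: every wide cell becomes one row carrying a label, a time and a value, and spread() then turns each distinct label back into a column.

```r
library(tidyr)

# Hypothetical reconstruction of the question's data
sample.df <- data.frame(
  Subject    = 1:3,
  BlueTime1  = c(2, 5, 6), BlueTime2  = c(4, 6, 7), BlueTime3  = c(1, 2, 3),
  RedTime1   = c(2, 5, 6), RedTime2   = c(4, 6, 7), RedTime3   = c(1, 2, 3),
  GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3)
)

# Step 1: one row per (Subject, wide column) pair, 3 x 9 = 27 rows
step1 <- gather(sample.df, key, value, -Subject)

# Step 2: split e.g. "BlueTime1" into label = "BlueTime" and time = "1"
step2 <- separate(step1, key, into = c("label", "time"), sep = "(?<=[a-z])(?=[0-9])")

head(step2, 4)
```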

Note

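Here we use the regex in separate() from this answer by @RichardScriven to split the column at the boundary between a lowercase letter and the first digit: the lookbehind (?<=[a-z]) and the lookahead (?=[0-9]) match that position without consuming any characters.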
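Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As an alternative not taken from the original answer, the ".value" sentinel in names_to lets pivot_longer() split the names and widen the stems in a single call. This sketch again uses a hypothetical reconstruction of sample.df; names_pattern captures the non-digit stem and the trailing digits.

```r
library(tidyr)

# Hypothetical reconstruction of the question's data
sample.df <- data.frame(
  Subject    = 1:3,
  BlueTime1  = c(2, 5, 6), BlueTime2  = c(4, 6, 7), BlueTime3  = c(1, 2, 3),
  RedTime1   = c(2, 5, 6), RedTime2   = c(4, 6, 7), RedTime3   = c(1, 2, 3),
  GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3)
)

long <- pivot_longer(
  sample.df,
  cols          = -Subject,
  names_to      = c(".value", "time"),  # 1st group names the output column, 2nd goes to 'time'
  names_pattern = "(\\D+)(\\d+)"         # non-digit stem, then the trailing digits
)
long$time <- as.integer(long$time)  # captured groups come back as character
long
```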

Edit

I understand from your comments that your column names are actually of the form ColorTime_Pre, ColorTime_Post and ColorTime_Final. If that is the case, you don't need to specify a regex in separate(), because the default sep = "[^[:alnum:]]+" matches your _ and splits key into label and time accordingly:

```
sample.df %>%
  gather(key, value, -Subject) %>%
  separate(key, into = c("label", "time")) %>%
  spread(label, value)
```

Will give:

```
#  Subject  time BlueTime GreenTime RedTime
#1       1 Final        1         1       1
#2       1  Post        4         4       4
#3       1   Pre        2         2       2
#4       2 Final        2         2       2
#5       2  Post        6         6       6
#6       2   Pre        5         5       5
#7       3 Final        3         3       3
#8       3  Post        7         7       7
#9       3   Pre        6         6       6
```
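Note that time is a character column here, so it sorts alphabetically (Final, Post, Pre). If chronological order matters, one minimal option is to turn time into a factor with explicit levels and reorder the rows. The data frame below and the level order Pre, Post, Final are assumptions based on the ColorTime_Pre / _Post / _Final naming described above.

```r
library(tidyr)  # tidyr also re-exports the %>% pipe

# Hypothetical data using the ColorTime_Pre / _Post / _Final naming
df <- data.frame(
  Subject         = 1:3,
  BlueTime_Pre    = c(2, 5, 6), BlueTime_Post  = c(4, 6, 7), BlueTime_Final  = c(1, 2, 3),
  RedTime_Pre     = c(2, 5, 6), RedTime_Post   = c(4, 6, 7), RedTime_Final   = c(1, 2, 3),
  GreenTime_Pre   = c(2, 5, 6), GreenTime_Post = c(4, 6, 7), GreenTime_Final = c(1, 2, 3)
)

out <- df %>%
  gather(key, value, -Subject) %>%
  separate(key, into = c("label", "time")) %>%
  spread(label, value)

# Replace the alphabetical order with the chronological one
out$time <- factor(out$time, levels = c("Pre", "Post", "Final"))
out <- out[order(out$Subject, out$time), ]
out
```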
